Miguel A. Cabrera Minagorri

2023-11-21

Navigating computer vision development

Giving computers the ability to interpret and respond to visual data is shaping up to be the next industrial revolution. This is, in large part, thanks to advances in computer vision models that are able to recognize patterns in images.

However, the journey from conceptualizing a computer vision application to deploying it in the real world is full of challenges. A growing number of open-source computer vision models greatly simplify the process, but the journey still extends well beyond model creation and training.

What a computer vision application requires

At a minimum, deploying a real-world computer vision application requires you to create multimedia pipelines. These pipelines are in charge of ingesting the video streams, which includes demuxing, decoding, and processing the frames. Then, in some cases, you also need to re-encode and mux the processed frames to generate an output video.
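
To make this concrete, here is a minimal sketch of such a pipeline using OpenCV, which hides the demuxing and decoding inside VideoCapture. The stream URI and the process_frame function are placeholders, and real pipelines typically involve GStreamer or FFmpeg directly:

    import cv2

    def process_frame(frame):
        # Hypothetical placeholder: pre-processing, inference and post-processing go here.
        return frame

    # Ingest the input stream. Demuxing and decoding happen inside VideoCapture.
    capture = cv2.VideoCapture("rtsp://example.com/stream")  # placeholder URI; a file path or camera index also works
    fps = capture.get(cv2.CAP_PROP_FPS) or 30.0  # some streams report 0; fall back to an assumed rate
    width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))

    # Produce the output video: VideoWriter re-encodes and muxes the processed frames.
    writer = cv2.VideoWriter("output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))

    while True:
        ok, frame = capture.read()
        if not ok:
            break  # end of stream (or a disconnection)
        writer.write(process_frame(frame))

    capture.release()
    writer.release()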

The computer vision model itself is used during the processing step: it takes some input data derived from an image and produces some output data. Before passing the input data to the model, it is common to pre-process the images. Typical pre-processing includes normalizing pixel values, resizing, changing the color space, etc. Luckily, there are already great libraries that provide algorithms and functions for the most common pre-processing steps. Finally, you need to post-process the model output. Post-processing is where you take actions based on the patterns the model identified, and what to do there is defined by your particular use case and the model's output format.

In many real-world applications, this structure (pre-process, process, post-process) is chained several times. That chain allows you to pass the image data through several different models, each able to identify different patterns.
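
As an illustration, a typical pre-processing function built with OpenCV and NumPy might look like the following sketch. The 640x640 input size, the normalization to [0, 1], and the NCHW layout are assumptions that depend on the specific model:

    import cv2
    import numpy as np

    def preprocess(frame, input_size=(640, 640)):
        # Resize to the model's expected input resolution (640x640 is an assumption).
        resized = cv2.resize(frame, input_size)
        # OpenCV decodes frames as BGR; most models expect RGB.
        rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)
        # Normalize pixel values from [0, 255] to [0, 1].
        normalized = rgb.astype(np.float32) / 255.0
        # Reorder to NCHW (batch, channels, height, width), a common input layout.
        return np.transpose(normalized, (2, 0, 1))[np.newaxis, ...]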

Everything described above must happen in just a few milliseconds to run in real time without delay. For example, with a 60 FPS input stream, you have about 16 milliseconds (1/60 of a second) to process each frame. This makes for a challenging task, and to accomplish it we need parallel processing. In general, you will process more than one frame simultaneously, which requires careful thought about how all of the above is designed.
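
As a rough sketch, keeping several frames in flight with a thread pool could look like this. The read_frames generator is hypothetical, back-pressure and result ordering are glossed over, and threads only help because most inference runtimes release Python's GIL during compute:

    from concurrent.futures import ThreadPoolExecutor

    FRAME_BUDGET_MS = 1000 / 60  # ~16.7 ms per frame at 60 FPS

    def process_frame(frame):
        ...  # pre-process, inference, post-process (hypothetical placeholder)

    # Keep a few frames in flight so one slow frame does not stall ingestion.
    with ThreadPoolExecutor(max_workers=4) as executor:
        # read_frames() is a hypothetical generator yielding decoded frames.
        futures = [executor.submit(process_frame, frame) for frame in read_frames()]
        results = [f.result() for f in futures]  # collecting in order preserves frame order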

Finally, you also need to manage the streams themselves. A stream can stop and restart, disconnect, and so on, and in many cases you need to process several streams at the same time. This not only makes it harder to maintain real-time processing but also adds other problems to your application: handling stream disconnections and restarts, multiple possible sources, multiple possible outputs, etc.
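
As a sketch of what this implies, here is a minimal reconnection loop. The handle_frame callback is a hypothetical placeholder, and a real implementation would be considerably more careful:

    import time
    import cv2

    def run_stream(uri, handle_frame):
        # Keep re-opening the source so disconnections and restarts are survived.
        # A real implementation would distinguish end-of-file from errors and
        # use exponential back-off instead of a fixed sleep.
        while True:
            capture = cv2.VideoCapture(uri)
            if not capture.isOpened():
                time.sleep(2)  # source not reachable yet; retry
                continue
            while True:
                ok, frame = capture.read()
                if not ok:
                    break  # disconnection (or end of stream): reconnect
                handle_frame(frame)
            capture.release()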

Traditional approaches

There have traditionally been two main approaches to developing a computer vision application:

  • Building from scratch: this means carefully designing every step described above, from the multimedia pipelines to code parallelization and stream management. It is a time-consuming and error-prone task that also requires a significant ongoing investment in maintenance.
  • Cloud APIs: with cloud-based APIs you send images to a remote endpoint that executes the inference in the cloud. The advantage is that you don't need to own the hardware; however, this kind of solution is not valid for all applications. The main problem is latency: network delay is added on top of every step mentioned above, limiting the performance of your application (see the back-of-the-envelope sketch after this list). You also still need to write a fair amount of code to manage the streams and to parallelize the pre-processing, the post-processing, and the API calls themselves. Finally, the devices must always be connected to the internet, and they consume considerable bandwidth.
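
To make the latency problem concrete, here is a back-of-the-envelope budget. The costs are illustrative assumptions, not measurements:

    FRAME_BUDGET_MS = 1000 / 60   # ~16.7 ms available per frame at 60 FPS

    # Illustrative, assumed costs for one cloud round trip (not measurements):
    encode_and_upload_ms = 10     # JPEG-encode the frame and upload it
    network_round_trip_ms = 50    # latency to the remote endpoint and back
    remote_inference_ms = 20      # inference on the provider's hardware

    total_ms = encode_and_upload_ms + network_round_trip_ms + remote_inference_ms
    print(f"{total_ms} ms per frame vs a {FRAME_BUDGET_MS:.1f} ms budget")
    # 80 ms per frame vs a 16.7 ms budget -> real time at 60 FPS is out of reach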

Pipeless alternative

Recently, a new alternative called Pipeless has appeared. Pipeless is an open-source framework focused on providing a great development experience and out-of-the-box performance. It offers really simple stream management, allowing you to add, edit, and remove streams on the fly, as well as to process multiple streams at once. Furthermore, it lets you deploy your applications either to the cloud or directly to embedded and edge devices.

From the user's point of view, you just provide the functions specific to your use case that receive frames, and Pipeless takes care of everything else. For example, you can provide a 10-line function that draws bounding boxes on the frame from the model output data, or a pre-processing function that converts the frame into the model's input format. You can then take those functions and deploy them with Pipeless anywhere.
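
As an illustration of that idea, a post-processing function drawing bounding boxes could look roughly like this. The hook signature and the frame fields used below are assumptions for the sketch; check the Pipeless documentation for the exact interface:

    import cv2

    # Sketch of a Pipeless-style post-processing hook. The signature and the
    # frame fields ("inference_output", "modified") are assumptions for
    # illustration; see the Pipeless docs for the real interface.
    def hook(frame_data, context):
        frame = frame_data["modified"]
        for box in frame_data["inference_output"]:
            # Assumed box format: [x1, y1, x2, y2, ...]
            x1, y1, x2, y2 = (int(v) for v in box[:4])
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        frame_data["modified"] = frame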

Finally, Pipeless also allows scheduling the model inference on CPU or GPU, with support for OpenVINO, CUDA, TensorRT, and CoreML, among others.
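
Those backend names are the ones ONNX Runtime exposes as "execution providers". As a rough illustration of what such scheduling can map to underneath, this is how a provider is selected with the onnxruntime Python API (the model path is a placeholder):

    import onnxruntime as ort

    # OpenVINO, CUDA, TensorRT and CoreML all exist as ONNX Runtime "execution
    # providers". The session tries the listed providers in order and falls
    # back to the CPU if the preferred ones are unavailable.
    session = ort.InferenceSession(
        "model.onnx",  # placeholder model path
        providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    print(session.get_providers())  # shows which providers were actually loaded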

If you are interested in Pipeless, consider starring the GitHub repository and joining the community group to participate and help improve it.
