Miguel A. Cabrera Minagorri

2023-10-07

Deploying an object detection application to the cloud using Kubernetes and Helm

When talking about deploying computer vision applications we have 3 main options:

  • Edge deployments: in simple terms, an edge deployment performs all the video processing directly on the device that contains the camera. Deploying to the edge has many advantages, such as very low latency and high privacy. However, it also has drawbacks, like management complexity and the limited resources of the device.
  • Cloud deployments: a cloud deployment means sending the video stream from the device containing the camera to the cloud, where it is processed. In the cloud we usually have far more resources to process the streams; however, sending the stream and receiving the results introduces some latency, and the device needs a stable internet connection.
  • Hybrid deployments: hybrid deployments try to combine the best of both worlds. For example, if the application does not require very low latency, you could do some light processing on the edge and then send the stream to the cloud for more exhaustive processing. However, this approach is also more complex than following either of the two above.

Deciding between these approaches strongly depends on the use case, but we will cover that in a different post. Today, we are going to go deeper into how to deploy your application to the cloud, specifically using Kubernetes and Helm.

The application we will deploy is the one from this step-by-step guide. To refresh your memory, this application performs basic object detection on any input stream. It uses a YOLOv8 model loaded into the ONNX Runtime.

The application was created with Pipeless. In case you don’t know, Pipeless is an open-source framework that allows you to create and deploy computer vision applications in just minutes.

You can learn more about Pipeless in the documentation and about how it integrates the ONNX Runtime in this previous post.

Deploying the application

Getting a Kubernetes Cluster

Before deploying our application with Helm, we need a Kubernetes cluster. There are many options: you can use Minikube or K3s to run one locally, or create a managed cluster on AWS, Azure or Google Cloud.

An option that we find fairly simple is to create one in AWS using the eksctl CLI. For example, you can run:

eksctl create cluster --name my-cluster --fargate

There are so many ways of deploying a cluster that we will leave this step up to you.
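If you would rather test everything locally first, Minikube is a quick option. The resource sizes below are only a suggestion; adjust them to your machine:

```shell
# Start a local single-node cluster (resource sizes are just a suggestion)
minikube start --cpus 4 --memory 8192

# Verify the cluster is reachable
kubectl get nodes
```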

Deploying the application with Helm

Once you have a Kubernetes cluster you are ready to deploy the application. For the deployment, we are going to use Helm, which is known as the package manager for Kubernetes.

Ensure you have the Helm CLI installed.
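If you don't have it yet, the official install script is one quick route (shown here as one option; the Helm docs also cover package-manager installs):

```shell
# Install Helm 3 via the official script if it is not already present
if ! command -v helm >/dev/null 2>&1; then
  curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
fi

# Confirm the installation
helm version --short
```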

Pipeless provides the Pipeless Helm Chart, which contains all the automation to load and run the application out of the box. The advantage of using Helm is that we can deploy as many Pipeless applications as we want, even several instances of the same application, and they won’t conflict with each other.
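Because each Helm release is isolated by its release name, you could, for instance, run two independent instances of the chart side by side. The release names below are arbitrary examples, and the required --set options (covered later in this post) are omitted for brevity:

```shell
# Two independent releases of the same chart; Helm keeps their resources separate
# (add the required --set options for each application)
helm install detector-a .
helm install detector-b .

# List both releases
helm list
```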

To make it even easier, the Pipeless Helm Chart also contains an RTMP server by default. This RTMP server allows you to inject video and to see the output in streaming. Once you install the Helm chart, the installation output will show you the exact commands you need to run for sending and visualizing the streams.

We have not yet published the Pipeless Helm Chart to a registry, so simply clone the Pipeless repository and move to the package/helm-chart directory:

git clone https://github.com/pipeless-ai/pipeless
cd pipeless/package/helm-chart

The Pipeless Helm chart requires a few inputs:

  • A URL to the application code repository. After installing the Helm chart, the first step that each worker will run is to download your application code and load it into Pipeless. In this particular example, the application code is at the main Pipeless repository, so we provide the git URL plus a subPath indicating the directory where the application is located within the repository.
  • A URI to the ONNX model file. Pipeless also downloads your model file on the fly and loads it into the ONNX Runtime before starting. In this particular example, the ONNX model file is contained within the application directory itself, so we provide a URI pointing to the local file within the container. You will see in the installation command that it starts with file://; however, it could be any URI, including an HTTP URL.
  • Optionally, the number of workers and the plugins execution order. To speed up processing, we will deploy more than one worker by specifying the number of replicas. Also, if you remember the previous posts, the application we are deploying uses the drawing plugin to draw the bounding boxes, so we also have to include that plugin in the plugins execution order.

Let's deploy it! Execute the following command, which contains the options described above:

helm install pipeless . --set worker.application.git_repo="https://github.com/pipeless-ai/pipeless.git",worker.application.subPath="examples/onnx-yolo",worker.plugins.order="draw",worker.inference.model_uri="file:///app/yolov8n.onnx" --set worker.replicaCount=4
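If the --set list gets unwieldy, the same options can live in a values file. The keys below simply mirror the --set flags from the command above (treat this as a sketch; check the chart for its full schema):

```shell
# Write the same options to a values file (keys mirror the --set flags above)
cat > values.yaml <<'EOF'
worker:
  replicaCount: 4
  application:
    git_repo: "https://github.com/pipeless-ai/pipeless.git"
    subPath: "examples/onnx-yolo"
  plugins:
    order: "draw"
  inference:
    model_uri: "file:///app/yolov8n.onnx"
EOF

# Install using the values file instead of --set flags
helm install pipeless . -f values.yaml
```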

And you should be able to see something similar to the following, indicating that your deployment is ready to be used:

NAME: pipeless
LAST DEPLOYED: Fri Oct  6 18:41:56 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Pipeless was deployed!

You can now start processing video.

1. Send an input video stream from your webcam via the RTMP proxy using the following commands:

** Please ensure an external IP is associated to the pipeless-proxy service before proceeding **
** Watch the status using: kubectl get svc --namespace default -w  pipeless-proxy **

  export SERVICE_IP=$(kubectl get svc --namespace default pipeless-proxy --template "{{ range (index .status.loadBalancer.ingress 0) }}{{ . }}{{ end }}")
  export URL="rtmp://${SERVICE_IP}:1935/pipeless/input"
  echo "RTMP server input URL: rtmp://$SERVICE_IP:1935/pipeless/input"

  ffmpeg -re -i /dev/video0 -c:v libx264 -preset ultrafast -tune zerolatency -c:a aac -f flv "$URL"

  Feel free to change /dev/video0 to a video file path.

2. Read the output from the RTMP proxy with the following command:

   mpv "rtmp://localhost:1935/pipeless/output" --no-cache --untimed --no-demuxer-thread --video-sync=audio --vd-lavc-threads=1

   Feel free to use any other media player, like VLC. You can even configure the deployment to skip the RTMP server and either disable the output video or send it to an external endpoint.

Now, simply copy and execute the commands shown in your terminal to send video to your deployment and see the output.

Slow output stream

If the output stream is not fluid, you can fix it in two different ways:

  1. Allocate more resources to each pod. Simply edit the worker.resources.requests parameter value.
  2. Increase the number of workers by setting the worker.replicaCount parameter to a higher value. The more workers you deploy, the higher the framerate you will reach, since processing is distributed across them.
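Both tweaks can be applied to a running release with helm upgrade; the --reuse-values flag keeps the options you set at install time:

```shell
# Scale to 8 workers without repeating the original --set options
helm upgrade pipeless . --reuse-values --set worker.replicaCount=8
```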

Conclusions

As you can see, deploying a Pipeless application to the cloud using Kubernetes and Helm is really simple. Once you have a Kubernetes cluster and Helm installed, it takes just a single command to deploy your computer vision applications to the cloud.

Also, Kubernetes lets you easily scale workers up and down, and it is fault tolerant: if a worker dies for any reason, the frames are redistributed among the remaining ones, so your stream never cuts out.

Our mission at Pipeless is to support developers building a new generation of vision powered applications. If you would like to support what we do, consider starring and sharing our repository. You can also join the community on GitHub discussions or in the new Discord server.
