Adrien Biarnes
1 min read · Apr 30, 2024


Honestly, Vertex AI endpoints suck because of this.

The thing is, when you deploy your code onto an on-demand compute resource, the delay is inevitable because the cloud has to spin up that resource for you. Whatever the cloud or service, on-demand compute takes time to become available.

The solution for zero delay is to use a dedicated resource that is already running. It makes complete sense if your API must be running 24/7.

In my case, for a lot of other reasons, I had to switch to Azure, and I am now using Azure Machine Learning. It also provides endpoints, but they are much more flexible. With Azure ML endpoints, you can choose which type of compute you want for your endpoints. You can deploy on serverless (i.e., on-demand) compute like with Vertex, but you can also deploy on a dedicated compute instance that you own 24/7, or on a dedicated cluster. I now deploy on the Kubernetes cluster managed by my DevOps team. I still benefit from the endpoint and deployment abstraction of the Python SDK (and all the goodies like traffic splitting, etc.), but the deployment is instantaneous.
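To give an idea of what that looks like, here is a minimal sketch with the azure-ai-ml SDK (v2). The subscription details, compute name, model, environment, and scoring script below are all hypothetical placeholders; treat this as an outline of the pattern, not a drop-in script.

```python
# Sketch of deploying an endpoint onto an attached Kubernetes compute with
# Azure ML SDK v2. All resource names here are hypothetical placeholders.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    KubernetesOnlineEndpoint,
    KubernetesOnlineDeployment,
    CodeConfiguration,
)
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# The endpoint is bound to a Kubernetes compute target already attached to
# the workspace, so deployments land on nodes that are already running
# instead of waiting for on-demand provisioning.
endpoint = KubernetesOnlineEndpoint(
    name="my-endpoint",          # placeholder
    compute="my-k8s-compute",    # placeholder: attached Kubernetes compute
    auth_mode="key",
)
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = KubernetesOnlineDeployment(
    name="blue",
    endpoint_name=endpoint.name,
    model="azureml:my-model:1",        # placeholder: registered model
    environment="azureml:my-env:1",    # placeholder: registered environment
    code_configuration=CodeConfiguration(
        code="./src", scoring_script="score.py"  # placeholder scoring code
    ),
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Traffic splitting: route 100% of requests to the "blue" deployment.
# A second deployment (e.g. "green") could take a share of this dict.
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```

Since the cluster is already up, creating or updating a deployment is just scheduling containers on existing nodes, which is where the "instantaneous" feel comes from.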
