Adrien Biarnes
1 min read · Apr 30, 2024


Honestly, Vertex AI endpoints suck because of this.

The thing is, when you deploy your code onto an on-demand compute resource, the delay is inevitable because the cloud has to spin up that resource for you. Whatever the cloud or service, on-demand compute takes time to become available.

The solution for zero delay is to use a dedicated resource that is already running. It makes complete sense if your API must be running 24/7.

In my case, for a lot of other reasons, I had to switch to Azure, and I am now using Azure Machine Learning. It also provides endpoints, but they are much more flexible. With Azure ML endpoints, you can choose which type of compute you want for your endpoints. You can deploy on serverless (i.e., on-demand) compute like with Vertex, but you can also deploy on a dedicated compute instance that you own 24/7, or on a dedicated cluster. I now deploy on the Kubernetes cluster managed by my DevOps team. I still benefit from the endpoint and deployment abstraction of the Python SDK (and all the goodies like traffic splitting, etc.), but the deployment is instantaneous.
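To give an idea of what that looks like, here is a minimal sketch with the azure-ai-ml SDK (v2). The subscription details, compute name, model, environment, and scoring script below are all hypothetical placeholders; treat this as an outline of the pattern, not a drop-in script.

```python
# Sketch of deploying an endpoint onto an attached Kubernetes compute with
# Azure ML SDK v2. All resource names here are hypothetical placeholders.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    KubernetesOnlineEndpoint,
    KubernetesOnlineDeployment,
    CodeConfiguration,
)
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# The endpoint is bound to a Kubernetes compute target already attached to
# the workspace, so deployments land on nodes that are already running
# instead of waiting for on-demand provisioning.
endpoint = KubernetesOnlineEndpoint(
    name="my-endpoint",          # placeholder
    compute="my-k8s-compute",    # placeholder: attached Kubernetes compute
    auth_mode="key",
)
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = KubernetesOnlineDeployment(
    name="blue",
    endpoint_name=endpoint.name,
    model="azureml:my-model:1",        # placeholder: registered model
    environment="azureml:my-env:1",    # placeholder: registered environment
    code_configuration=CodeConfiguration(
        code="./src", scoring_script="score.py"  # placeholder scoring code
    ),
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Traffic splitting: route 100% of requests to the "blue" deployment.
# A second deployment (e.g. "green") could take a share of this dict.
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```

Since the cluster is already up, creating or updating a deployment is just scheduling containers on existing nodes, which is where the "instantaneous" feel comes from.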
