Deployment Configuration for Models

The Model Manager lets you deploy highly available, scalable models and supports more sophisticated workflows. The Deployment section on the Settings tab of a model lets you configure the two important aspects of a deployment: the compute resources that your model needs to scale, and the number of routes you want to expose for your workflow.

Scaling your Model 

You have two levers to scale your model.

1. Horizontal Scale - You can select the number of model instances that you want running at any given time. The default selection is 2 instances, which is the minimum needed for a highly available model. We currently support up to 32 instances.

2. Vertical Scale - You can choose how big each instance should be, that is, how much memory and CPU each instance needs. The available sizes are set system-wide, so if you need a different size, please contact your system administrator.

When you change either of these selections, the running model version(s) are restarted with the new settings.  

GPUs are not currently supported.  
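
The horizontal scaling limits described above can be sketched as a small check. This is only an illustration and not part of the Model Manager: the function and constant names are hypothetical, and it assumes that a single instance is allowed (the text only says that 2 is the default and the minimum for high availability).

```python
# Limits taken from the text above; all names here are illustrative only.
MIN_INSTANCES = 1      # assumption: a single (non-HA) instance is permitted
HA_MIN_INSTANCES = 2   # default selection and minimum for high availability
MAX_INSTANCES = 32     # current upper limit


def is_highly_available(instance_count: int = HA_MIN_INSTANCES) -> bool:
    """Return True if the count gives high availability.

    Raises ValueError if the count is outside the supported range.
    """
    if not MIN_INSTANCES <= instance_count <= MAX_INSTANCES:
        raise ValueError(
            f"instance count must be between {MIN_INSTANCES} and {MAX_INSTANCES}"
        )
    return instance_count >= HA_MIN_INSTANCES
```

For example, is_highly_available(1) returns False, while is_highly_available(4) returns True.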

Routing your Model

We support two routing modes. 

1. Basic - In this mode, only one route is exposed, and it always points to the latest successfully deployed model version. When you deploy a new version, the old version is shut down and replaced with the new one while availability is maintained. The route has the following signature:

a. Latest - /models/<modelId>/latest/model

2. Advanced - In this mode, you can have two running versions: a promoted one and a latest one. This supports a workflow where your clients always point to the promoted version while you test the latest version for some time, then promote it without any downtime. The routes have the following signatures:

a. Latest - /models/<modelId>/latest/model

b. Promoted or Production - /models/<modelId>/labels/prod/model
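
As a rough illustration of the two routing modes, the sketch below builds the route paths listed above for a given model. The helper name and its parameters are hypothetical; only the path patterns come from this article, and the host name that precedes them depends on your installation.

```python
from typing import Dict


def model_routes(model_id: str, advanced: bool = False) -> Dict[str, str]:
    """Build the route paths exposed for a model (hypothetical helper)."""
    # Both modes expose the route to the latest deployed version.
    routes = {"latest": f"/models/{model_id}/latest/model"}
    if advanced:
        # Advanced mode adds the route to the promoted (production) version.
        routes["prod"] = f"/models/{model_id}/labels/prod/model"
    return routes
```

With a placeholder model ID of "abc123", model_routes("abc123") yields only the latest route, while model_routes("abc123", advanced=True) also includes /models/abc123/labels/prod/model.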
