Domino model APIs are scalable, highly available REST services. The Deployment tab of the model settings page lets you configure three things for your model:
- The compute resources available to your model hosts
- The number of model hosts serving your model
- The number of routes (or versions) you want to expose
Scaling your model
There are two dimensions on which to scale your model.
- Horizontal scale
You can select the number of model hosts that you want running at any given time. Domino automatically load-balances requests to the model endpoint across these hosts. Running at least 2 instances makes the model highly available, and 2 is the default selection. Domino supports up to 32 instances per model.
- Vertical scale
You can choose a hardware tier, which determines the amount of RAM and CPU resources available to each model host.
When you change either of these selections, your model will be restarted with the new settings.
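The scaling rules above can be sketched as a small validation helper. This is an illustrative model only, not Domino's API: the `DeploymentSettings` class, the `needs_restart` and `round_robin` helpers, the host names, and the `"small"` tier are all hypothetical. It also assumes that a single instance is allowed (just not highly available) and that requests rotate round-robin, neither of which is stated above.

```python
from dataclasses import dataclass
from itertools import cycle

MIN_HA_INSTANCES = 2  # from the text: minimum for high availability, and the default
MAX_INSTANCES = 32    # from the text: upper limit per model


@dataclass(frozen=True)
class DeploymentSettings:
    """Hypothetical container for the two scaling choices."""
    instances: int = MIN_HA_INSTANCES  # horizontal scale
    hardware_tier: str = "small"       # vertical scale; placeholder tier name

    def __post_init__(self):
        # Assumption: one instance is permitted, just not highly available.
        if not 1 <= self.instances <= MAX_INSTANCES:
            raise ValueError(f"instances must be between 1 and {MAX_INSTANCES}")

    @property
    def highly_available(self) -> bool:
        return self.instances >= MIN_HA_INSTANCES


def needs_restart(old: DeploymentSettings, new: DeploymentSettings) -> bool:
    """Changing either selection restarts the model with the new settings."""
    return old != new


def round_robin(settings: DeploymentSettings):
    """Assumed balancing strategy: rotate through hosts in order."""
    hosts = [f"host-{i}" for i in range(settings.instances)]
    return cycle(hosts)
```

For example, `DeploymentSettings(instances=33)` raises `ValueError`, and `needs_restart(DeploymentSettings(), DeploymentSettings(instances=4))` returns `True` because the instance count changed.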
Routing your model
Domino supports two routing modes.
- Basic mode
In this mode, only one route is exposed, and it always points to the latest successfully deployed model version. When you deploy a new version, the old one is shut down and replaced with the new one while availability is maintained. The route has the following signature:
- Advanced mode
In this mode, you can have two running versions: a promoted version and a latest version. This supports a workflow in which your clients always point to the promoted version while you test against the latest. When the latest version is ready for production, you can switch it to be the promoted version with no downtime. The routes have the following signature:
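Independently of the exact route signatures, the behavior of the two routing modes can be modeled as a toy in-memory router. This is a sketch under stated assumptions, not Domino's implementation: the `ModelRouter` class, its method names, and the route labels `"latest"` and `"promoted"` are hypothetical, and it assumes that in advanced mode nothing is promoted until you promote explicitly.

```python
class ModelRouter:
    """Toy model of basic vs. advanced routing (hypothetical names throughout)."""

    def __init__(self, mode="basic"):
        if mode not in ("basic", "advanced"):
            raise ValueError(f"unknown routing mode: {mode}")
        self.mode = mode
        self.latest = None    # most recent successfully deployed version
        self.promoted = None  # only meaningful in advanced mode

    def deploy(self, version):
        """Record a successful deployment; it becomes the latest version."""
        self.latest = version

    def promote_latest(self):
        """Advanced mode: make the latest version the promoted one."""
        if self.mode != "advanced":
            raise RuntimeError("promotion only applies in advanced mode")
        if self.latest is None:
            raise RuntimeError("no version has been deployed yet")
        self.promoted = self.latest

    def resolve(self, route="latest"):
        """Return the version that a given route currently serves."""
        if self.mode == "basic":
            # Basic mode: the single route always points at the latest version.
            return self.latest
        if route == "promoted":
            return self.promoted
        if route == "latest":
            return self.latest
        raise KeyError(route)


# Advanced-mode workflow: clients stay on the promoted version while
# a newer version is tested on the latest route.
router = ModelRouter(mode="advanced")
router.deploy(1)
router.promote_latest()
router.deploy(2)
print(router.resolve("promoted"), router.resolve("latest"))
```

Calling `router.promote_latest()` again at this point would switch the promoted route to the newer version, which mirrors the promotion step described above.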