Creating a new executor AMI

WARNING

This is an advanced procedure. If done improperly, it could leave your deployment in an inoperable state. This document is intended for experienced administrators only. Please reach out to support@dominodatalab.com if you have questions.

This procedure only covers deployments using elastic compute resources in Amazon Web Services (AWS).

Overview

Domino’s cluster of executors dynamically scales up and down based on the current demand for compute resources, which is driven by the number of active runs. When Domino needs to add a new machine to the cluster, it uses a hardware tier definition linked to an Amazon Machine Image (AMI) to define the starting state of the machine.

Each run or workspace that a user starts will run in a Docker container with an associated Docker image on a machine in the executor cluster. Domino pulls the required Docker image from an internal or external Docker registry.

To minimize the time it takes to pull Docker images onto a new machine, we suggest that you cache your base image and your most commonly used environment images on the executor template, and create a new AMI for future executors. This way, the Docker layers do not need to be downloaded from the registry onto each new executor; they are available as soon as the machine is spun up. See Run States to learn more about the life cycle of a run.

This article describes the process and best practices for creating a new AMI. The process involves use of the executor template, which is an idle executor machine that is not used for runs, but exists only to be a fresh template. You will need access to the AWS console for the account where your deployment is running to find this machine and perform the necessary steps.

Procedure

Log in to the Executor Template

  1. Log in to the AWS console where your Domino deployment is installed, open the EC2 service, click Instances in the sidebar, and find the executor template instance. This instance should be tagged with a name that includes the string executor-template. Start the template machine if it is stopped.

  2. Using its IP or AWS DNS address, SSH into the executor template machine using your deployment’s private key. Example:

    ssh ec2-xx-xx-xxx-xxx.us-west-2.compute.amazonaws.com -i ~/my-private-key

    This key should be supplied by Domino engineers following your Domino installation. If you do not have the key, reach out to support@dominodatalab.com.
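
If you prefer the command line to the EC2 console, the lookup and start in step 1 can be done with the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is installed and configured for the account and region of your deployment; find_executor_template and start_executor_template are hypothetical helper names, not part of Domino or AWS.

```shell
#!/usr/bin/env bash
# Sketch only: find and start the executor template with the AWS CLI.
# Assumes a configured AWS CLI and an instance Name tag that contains
# "executor-template", as described in step 1. Function names are hypothetical.

# Print the instance ID, state, and public DNS name of each matching instance.
find_executor_template() {
  aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=*executor-template*" \
    --query "Reservations[].Instances[].[InstanceId,State.Name,PublicDnsName]" \
    --output text
}

# Start the given instance and block until EC2 reports it as running.
start_executor_template() {
  local instance_id="$1"
  aws ec2 start-instances --instance-ids "$instance_id" > /dev/null || return 1
  aws ec2 wait instance-running --instance-ids "$instance_id"
}
```

For example, run find_executor_template to get the instance ID and DNS name, then run start_executor_template with that ID before you SSH in. If the instance has no public address, connect using its private IP instead.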

Pull your desired Docker images

  1. Run docker images as the root user to see what images are cached.
  2. Run docker pull followed by an image URL to cache the specified image on the executor. If an image was built within Domino, you can find the URL on the Revisions tab for the environment in the Domino UI. Example:

    docker pull quay.io/domino/base:DED_py3.6_R3.4_23052018
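
When you have several images to cache, the two steps above can be combined into a small loop. This is a minimal sketch, assuming you run it as root on the executor template; cache_images is a hypothetical helper name, and the image list is whatever your environments use.

```shell
#!/usr/bin/env bash
# Sketch only: cache several images on the executor template in one pass.
# Run as root on the template machine; pass the image URLs for your environments.

cache_images() {
  local image
  for image in "$@"; do
    echo "Pulling ${image}"
    # Stop at the first failure so a missing image is not silently left out of the AMI.
    docker pull "$image" || { echo "Failed to pull ${image}" >&2; return 1; }
  done
  # Show the local cache so you can confirm what will be baked into the AMI.
  docker images
}
```

For example, cache_images quay.io/domino/base:DED_py3.6_R3.4_23052018 pulls the base image from the example above and then lists the cached images.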

Snap the AMI

  1. Run salt-call state.highstate. This applies all necessary software and system updates.
  2. Select the executor template machine in the AWS EC2 console, then click Actions -> Images -> Create Image. Name the new AMI domino-<deployment-name>-executor-YYYYMMDD-HHMM. Use the default storage volumes, but be sure to check Delete on Termination for all volumes.
  3. From the sidebar, click AMIs. Wait for the new AMI to have a status of available. You may need to refresh to see the table update. Once it’s ready, record the AMI ID. You will need the ID to set up the AMI for use in Domino.
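
Steps 2 and 3 can also be scripted from a workstation with the AWS CLI, after salt-call state.highstate has finished on the template. This is a minimal sketch, not Domino tooling: ami_name and snap_ami are hypothetical helpers, and the sketch assumes the root volume is /dev/sda1 (check the actual device names on your instance) and that any additional volumes get their own mapping entry so that every volume is deleted on termination.

```shell
#!/usr/bin/env bash
# Sketch only: CLI counterpart of Actions -> Images -> Create Image.
# Assumes a configured AWS CLI. ASSUMPTION: the root volume's device name is
# /dev/sda1 unless you pass another; add one mapping per extra volume so every
# volume gets DeleteOnTermination, as step 2 requires.

# Build a name in the domino-<deployment-name>-executor-YYYYMMDD-HHMM format (UTC).
ami_name() {
  printf 'domino-%s-executor-%s\n' "$1" "$(date -u +%Y%m%d-%H%M)"
}

# Create the AMI, wait until it is available, then print its ID.
snap_ami() {
  local instance_id="$1" deployment="$2" root_device="${3:-/dev/sda1}"
  local ami_id
  ami_id=$(aws ec2 create-image \
    --instance-id "$instance_id" \
    --name "$(ami_name "$deployment")" \
    --block-device-mappings "[{\"DeviceName\":\"${root_device}\",\"Ebs\":{\"DeleteOnTermination\":true}}]" \
    --query ImageId --output text) || return 1
  aws ec2 wait image-available --image-ids "$ami_id" || return 1
  echo "$ami_id"
}
```

For example, snap_ami i-0123456789abcdef0 acme prints the new AMI ID once it is available; the instance ID and the deployment name acme are placeholders. By default, create-image reboots the instance before imaging it unless you pass --no-reboot. The CLI waiter gives up after a fixed number of polls, so for a large AMI you may need to rerun aws ec2 wait image-available.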

Test the new AMI

  1. In the Domino application, open an existing unused hardware tier, or create a new hardware tier for testing.
  2. Edit the hardware tier, and set the AMI ID to the one you recorded in the previous section.
  3. Set up a Domino project to use the hardware tier you just edited, and use an environment whose image you cached earlier. When you start a workspace in this project, it should spend some time in a queued state while the new machine starts up, but zero (or minimal) time in a pulling state.

Apply the new AMI to other hardware tiers

NOTE

Be sure to alert users of incoming changes to their hardware tiers, or conduct these steps during a maintenance window.

  1. Make note of the current AMI IDs used by existing hardware tiers. You can use these notes to revert later if needed.

  2. Before updating all hardware tiers, make sure you don’t have any hardware tiers that use special AMIs. For instance, some GPU workloads may use a special hardware tier with a customized AMI running Ubuntu 16.04. Do not change such tiers to use the new AMI.

  3. You can update hardware tiers individually to use the new AMI by editing them in the Domino application and entering the new ID. Alternatively, you can update all hardware tiers to use the new AMI for all new machines by connecting to the Domino central server via SSH, and running the following MongoDB command:

    db.executor_group_configuration.update({}, {$set: {"executorImage": "NEW_AMI_ID"}}, {multi: true})

    Currently running executors will not automatically switch to the new AMI. You can place such machines in Maintenance Mode, which prevents new runs from starting on them, and then manually terminate each machine once its live runs have concluded. They will be replaced by executors created from the new AMI when compute demand triggers a new machine to spin up.
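
Steps 1 and 3 can be combined in a script on the central server: save the current AMI IDs, then switch tiers to the new AMI. This is a minimal sketch under several assumptions: the legacy mongo shell is available and connects without extra flags, and you supply the Domino database name yourself because this article does not state it. backup_tier_amis and set_tier_amis are hypothetical helpers. Unlike the bulk command above, set_tier_amis filters on the old AMI ID, so tiers that use special AMIs (step 2) are not touched.

```shell
#!/usr/bin/env bash
# Sketch only: back up hardware tier AMI IDs, then switch tiers to a new AMI.
# Run on the Domino central server. ASSUMPTIONS: the legacy mongo shell is on
# PATH and needs no extra connection or auth flags, and the Domino database
# name (not given in this article) is passed as the first argument.

# Print the _id and executorImage of every hardware tier configuration.
backup_tier_amis() {
  local db="$1"
  mongo "$db" --quiet --eval \
    'printjson(db.executor_group_configuration.find({}, {executorImage: 1}).toArray())'
}

# Variant of the bulk update that only touches tiers still on old_ami,
# so tiers with special AMIs are left unchanged.
set_tier_amis() {
  local db="$1" old_ami="$2" new_ami="$3" id
  for id in "$old_ami" "$new_ami"; do
    # Refuse anything that does not look like an AMI ID before touching the database.
    [[ "$id" =~ ^ami-[0-9a-f]+$ ]] || { echo "Not an AMI ID: ${id}" >&2; return 1; }
  done
  mongo "$db" --quiet --eval \
    "db.executor_group_configuration.update({executorImage: \"${old_ami}\"}, {\$set: {executorImage: \"${new_ami}\"}}, {multi: true})"
}
```

For example, redirect the output of backup_tier_amis to a file before running set_tier_amis with your database name and your old and new AMI IDs. If your standard tiers currently use more than one AMI, run set_tier_amis once per old ID.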

FAQ

How often should I snap a new AMI?

We recommend that administrators review their AMIs and compute environments quarterly, or whenever you notice that users have custom compute environments that take a long time to pull when starting runs. You can refactor those environments by moving their common custom instructions into a base image. You can then add this new base image to your AMI, so those common instructions are already cached.

Which Docker images should I add to the AMI?

Docker images are built in layers, where each layer is the state generated by an instruction (line) in the Dockerfile. For example, consider two images with layers ABC and ABCDE, respectively. These images share their first three layers. If the image with layers ABC is already cached on a machine, then only layers D and E need to be downloaded when you want to use the image with layers ABCDE. We recommend that you build most of your environments on top of a small number (fewer than five) of base images, and that you add those images to your AMI. There’s no hard limit on the number of images you can cache, but adding more images requires more disk space on executors.
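
To check how much of one image is already covered by another, you can compare their layer lists. This is a minimal sketch, assuming both images are already pulled locally; shared_prefix_layers is a hypothetical helper built on docker image inspect, which reports an image's layer digests in order from the base layer up.

```shell
#!/usr/bin/env bash
# Sketch only: count the leading layers two locally cached images have in common.
# docker image inspect lists layer digests in .RootFS.Layers from the base up;
# both images must already be pulled on the machine where you run this.

layer_digests() {
  docker image inspect --format '{{join .RootFS.Layers "\n"}}' "$1"
}

shared_prefix_layers() {
  local -a a b
  local n=0
  mapfile -t a < <(layer_digests "$1")
  mapfile -t b < <(layer_digests "$2")
  # Walk both layer lists from the base until they diverge.
  while (( n < ${#a[@]} && n < ${#b[@]} )) && [[ "${a[n]}" == "${b[n]}" ]]; do
    n=$((n + 1))
  done
  echo "$n"
}
```

For an environment built on a cached base image, the result should equal the base image's layer count, meaning only the remaining layers have to be pulled. Running docker system df shows how much disk space the cached images currently take up.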

Should I remove old images from the AMI?

This is not required. You may want to keep them to maintain backwards compatibility, or you may choose this as an opportunity to encourage users to start working from the latest image. The only consequence of removing an older image from the AMI is longer pulling times for users who start runs with environments that depend on that image.
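
If you do decide to drop an image, remove it from the executor template before you snap the next AMI. This is a minimal sketch, assuming you run it as root on the template; drop_cached_images is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: drop images you no longer want baked into the AMI.
# Run as root on the executor template before snapping the AMI.

drop_cached_images() {
  local image
  for image in "$@"; do
    # docker image rm also deletes layers that no remaining image uses.
    docker image rm "$image" || return 1
  done
  # Report how much disk the remaining images occupy.
  docker system df
}
```

docker image rm fails if a container, even a stopped one, still uses the image; remove that container first rather than forcing the removal.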
