
Compute Environment Management

Each run (whether a batch script or an interactive session) executes in an isolated Docker container. When Domino starts your run, it creates a container based on the "Environment" associated with your project. An Environment is a Domino abstraction on top of a Docker image that provides some additional flexibility. This article describes how to work with Environments, focusing on the most common use case: installing custom software packages.

If you only need to install libraries or packages, following the language-specific instructions is often sufficient.

However, there are some situations where you may instead wish to modify a custom environment. Here are a few examples:

  • You need to install something other than a Python, R, or Octave package.
  • You use a library that takes a long time to install at runtime, which you’d prefer to cache instead (here's a quick guide to this).
  • You are managing an organization and want a standard default environment for your team across all projects.

Managing Environments

If you reach out to us, we can modify your custom environment for you. However, some users may wish to write their own environments, if only to close the loop and not need to contact us for every change. If this applies to you, read on.

By default, your user account will not have permissions to modify your environment, so first you'll need to contact us and request that we change this.

After that's taken care of, navigate to your project settings, and select "Add or edit environments":

Clicking the button will bring you to a page with a list of existing environments, as well as fields for creating a new one:

Name

Gives your environment a name so you can identify it on the settings page.

Enable VPN Networking

For help setting up VPN networking, please contact us.

Custom Container Override

If we've created a custom Docker image for you, you can specify it here according to the name we provide. If you are using an on-premises or VPC deployment of Domino, you can build and host your own Docker images in a private repository and point to them with this field. For more information on managing your own private images, see this guide.

Pre-setup script, Post-setup script

Here you can enter lines of bash code that will be executed before the start of your run. This code can access your project files, so you can include a script file in your project and call it here.

The pre-setup script is run before the Python packages in your project’s requirements.txt are installed, and the post-setup script is run after. These scripts are executed at runtime and are not cached.
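As a minimal sketch of calling a project script from the pre-setup field: this assumes the setup script runs from the project root, and scripts/install_extras.sh is a hypothetical file committed to the project. The helper skips quietly when the file is absent, so the same environment can be used by projects that do not include it.

```shell
# Assumption: the working directory is the project root.
# scripts/install_extras.sh is a hypothetical file in the project.
run_if_present() {
  # Run the given script with bash only when it exists; otherwise report and continue.
  if [ -f "$1" ]; then
    bash "$1"
  else
    echo "skipping $1 (not found)"
  fi
}

run_if_present ./scripts/install_extras.sh
```

Because these scripts are not cached, anything the called script installs is repeated on every run, so it is best kept short.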

This is one place you can set environment variables. In many cases you may prefer to set environment variables in your project settings. Variables set in that way can be imported to other projects. However, setting them in the pre-setup script can be useful if you are managing an environment used by many projects and an installed package depends on them. To do so, add them to the file /home/ubuntu/.domino-defaults like so:
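A minimal sketch of such a line in the pre-setup script, assuming that .domino-defaults is read as a shell file of export statements (the variable name and value are hypothetical placeholders):

```shell
# Assumption: entries in .domino-defaults are shell export lines.
# MY_LIBRARY_HOME and /opt/my-library are hypothetical placeholders.
echo 'export MY_LIBRARY_HOME=/opt/my-library' >> /home/ubuntu/.domino-defaults
```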


Additionally, if you have set project or user environment variables, you can use them in the pre- or post-setup script, provided the variable name begins with the prefix "DRT_".
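For illustration, a setup script might read such a variable as follows. DRT_SPARK_VERSION is a hypothetical variable name, and the fallback value is only an example; the sketch assumes the variable is exposed to the script under the same name it has in the settings.

```shell
# DRT_SPARK_VERSION is a hypothetical variable defined in the project or user settings.
# Fall back to a default so the script still works when the variable is not set.
SPARK_VERSION="${DRT_SPARK_VERSION:-1.5.1}"
echo "Setting up for Spark ${SPARK_VERSION}"
```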

Docker Arguments

Here you can specify arguments that will be passed to the underlying docker run command. Arguments must be separated by newlines (not spaces). In almost all cases, you shouldn't need to modify this. Please contact us if you need further information.
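As an illustration of the newline-separated format, the field might contain the following. Both flags are standard docker run options, but whether your Domino deployment accepts them is an assumption here, and the values are hypothetical:

```
--shm-size=2g
--add-host=db.internal:10.0.0.5
```

Each line holds one complete argument, written in the --flag=value form so that no argument contains a space.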

Raw Dockerfile

You may wish to install packages directly into your environment. This is useful if your package can't be installed using the language-specific instructions, or if runtime installation takes a long time: an installation into the environment is cached, so you won't have to wait for it on every run.

The Domino platform uses Docker containers to manage isolated environments. If you already have a Docker image you'd like to use (one that we've created for you, a public one, or a private one you've hosted somewhere), you can specify it in the preceding "Custom Container Override" section. Alternatively, you can use this "Raw Dockerfile" section to define a new image. The Dockerfile contains the instructions that will be used to build your environment. A general reference for writing Dockerfiles, as well as a best-practices guide, may be found on the Docker website:

http://docs.docker.com/engine/reference/builder/
http://docs.docker.com/engine/articles/dockerfile_best-practices/

In most cases, your Dockerfile will include a FROM statement, some RUN statements, and perhaps some ENV statements.

FROM

Every Dockerfile requires a FROM statement at the beginning. This specifies the Docker image on which your custom environment will be built. You can use either our base image (currently quay.io/domino/base:2016-12-07_1239, or quay.io/domino/python3:2016-12-07_1239 for Python 3-based environments) or one we've provided for you (see the "Custom Container Override" section above). For example:

FROM quay.io/domino/base:2016-12-07_1239

RUN

Each RUN command executes a shell command while the image is being built, e.g.

RUN wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz
RUN tar xvzf spark-1.5.1-bin-hadoop2.6.tgz
RUN mv spark-1.5.1-bin-hadoop2.6 /opt
RUN rm spark-1.5.1-bin-hadoop2.6.tgz

ENV

ENV commands set bash environment variables. They will be accessible from runs that use this environment. For example,

ENV SPARK_HOME /opt/spark-1.5.1-bin-hadoop2.6
ENV PYTHONPATH $SPARK_HOME/python/:$PYTHONPATH
ENV PYTHONPATH $SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH
ENV PATH $SPARK_HOME/bin:$PATH

Caveats

  • Docker optimizes its build process by keeping track of commands it has run and aggressively caching the results. This means that if it sees the same set of commands as a previous build, it will assume it can use the cached version. A single new command will invalidate the caching of all subsequent commands.
  • Also, there is a limit to the number of layers (that is, commands) a Docker image can have. Currently, this limit is 127. Keep in mind that the image upon which you are building may have already used many layers. If you hit the limit, one workaround is to combine several commands into one via "&&", e.g.
    RUN \
    wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz && \
    tar xvzf spark-1.5.1-bin-hadoop2.6.tgz && \
    mv spark-1.5.1-bin-hadoop2.6 /opt && \
    rm spark-1.5.1-bin-hadoop2.6.tgz
  • If you are installing multiple Python packages via pip, in most cases it's better to use a single pip install command. This ensures that dependencies and package versions are properly resolved. Otherwise, if you install via separate commands, you may inadvertently override a package with the wrong version, due to a dependency specified by a later installation. For example:
    USER root
    RUN pip install luigi nolearn lasagne
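Putting the caveats together, the sketch below assembles the statements shown in this article into one Dockerfile, ordered with caching in mind. The ordering is a suggestion rather than a Domino requirement: the slow, rarely changing Spark download comes first so that its cached layer is reused, and the pip line, which is more likely to be edited, comes last.

```dockerfile
# Base image named in this article; substitute the image provided for your deployment.
FROM quay.io/domino/base:2016-12-07_1239

# Slow, rarely changing step first, combined with && so it produces a single layer.
RUN wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz && \
    tar xvzf spark-1.5.1-bin-hadoop2.6.tgz && \
    mv spark-1.5.1-bin-hadoop2.6 /opt && \
    rm spark-1.5.1-bin-hadoop2.6.tgz

ENV SPARK_HOME /opt/spark-1.5.1-bin-hadoop2.6
ENV PATH $SPARK_HOME/bin:$PATH

# Frequently edited step last: changing this line invalidates only this layer.
USER root
RUN pip install luigi nolearn lasagne
```

With this ordering, adding a package to the pip line rebuilds only the final step, while the Spark layer is served from the cache.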


Rebuild Environment on Next Run

Checking this box causes the environment to be rebuilt from the Raw Dockerfile at the start of the next run. It is always checked for the default environment. Docker will use caching wherever possible to speed up the build.

Sharing Environments

Once you’ve created a custom environment, others might want to use it for their own projects. Your new environment can be shared and re-used in any project that requires the specific set of conditions that environment provides. Domino allows environment sharing in the form of global environments (available in VPC or on-premises deployments), or through organizations. Essentially, project owners who are members of an organization can use all of that organization’s compute environments for their own projects.

For example, if a user is a member of a corporate organization, any environments that organization's admins have created will be available to the user for their own projects. Domino does not constrain users to membership in just one organization. If a user is a member of two organizations (say, a corporate organization with production environments and an R&D organization with more cutting-edge environments), that user has access to the compute environments of both organizations, as well as any global environments and environments the user already owns directly.

What happens if a user is removed from an organization while several of their projects still rely on that organization's environments? In that case, the now-unavailable environments are reset to the user's default environment. Domino sends a notification to any affected users whenever this happens.
