
Compute Environment Management

This article covers our technology preview of V2 Environment Management. Click here if you're using V1 Environments or a version of Domino earlier than 2.1.

 

Overview

Compute environment management is the practice of creating new Domino compute environments and/or editing existing compute environments to meet your specific language and package needs. This work is typically done by an administrator or advanced Domino user. 

Here are some examples of where you will want to create or modify an environment: 

  • You need to install a Python, R, Octave, or other package.
  • You use a library that takes a long time to install, and you’d prefer to cache that package into the environment so that it's always immediately available (here's a quick guide to this).
  • You are managing an organization and want to create a standard default environment for your team across all projects.

Domino leverages Docker for compute environments. An Environment is a Domino abstraction on top of a Docker image that provides additional flexibility and versioning. When Domino starts your run (of a batch script or a workspace session), it creates a Docker container based on the "Environment" associated with your project. Each run runs in an isolated Docker container.  

Managing Environments

The environment your project will use is set from the project's Settings page.

Use the dropdown to select the environment that runs in this project will use.

Note: if you do not see the Manage Environments buttons highlighted in the screenshot above, then this feature is not enabled for you. Please speak with your system administrator or contact us at support@dominodatalab.com to enable the ability to manage your environments. 

Clicking the "Manage Environments" button will bring you to a page with a list of existing environments you have access to with an option to create a new environment.

This list includes your Domino deployment's global default, environments used by projects on which you are a collaborator, and environments owned by organizations of which you are a member.

 

You can create a new environment using the "Create Environment" link in the top right corner:

You'll be asked to name your new environment and define its visibility. Administrators will see a third option to have the new environment be available globally (to all users of the deployment):

After creating your environment, you will be taken to the environment detail page which has three buttons (Edit Definition, Duplicate Environment, and Archive Environment) and several tabs - Overview, Revisions, Projects, Data Sets and Models (if enabled).

Environment Actions

  • Edit Definition will take you to a page where you can edit all of your environment's attributes (see below)
  • Duplicate Environment will clone your environment
  • Archive Environment will hide your environment and not allow users to choose it from the project settings dropdown. Projects already using this environment will continue to work until a new project environment is set. 

Overview Tab

The Overview tab shows all metadata about your environment, including the following attributes. Click "Edit Environment" in the top right to enter edit mode and make changes to your environment. After each save, your environment's revision number is incremented by one, and your Domino deployment rebuilds the environment and pushes it to the local Docker registry.

Please see the "Compute Environment Attributes" section below for a description of each field.

Revisions

The Revisions tab lists every revision of your compute environment along with its build status, timestamp, and Docker image URI (so you can use it as a base image, for example). Click the gear icon to reveal additional options, including viewing build logs, canceling a build, or setting a revision as Active.

You'll notice in the screenshot above that revision ID #1 has finished building and is the "Active" revision for this environment whereas the latest, ID #2, is in the process of building and cannot yet be used by a project.

 

Projects, Data Sets, Models

Once the environment has been assigned to a project, data set or model, you will be able to see a list of those entities on their tab. This is useful for seeing who you need to contact if you update or want to archive an environment, for example. 

  

Compute Environment Attributes

  • Base Environment: Gives you a choice between basing your compute environment on your deployment default or on a custom Dockerfile URI. This defines the "FROM" line in the Dockerfile Domino constructs for you.
  • Dockerfile Instructions: Enter your Dockerfile layers here. Docker's official site has a handy guide here. You can also read our primer on Dockerfiles below.
  • Pluggable Notebooks/Workspace Sessions: Define which interactive tools should be available in a project using this environment. See this for more details. 
  • Scripts: Here you can input lines of bash code which will be executed at the specified step in your experiment's lifecycle. These commands are run as root and are executed at runtime (they are not cached).
    • Pre-setup scripts are run before the Python packages in your project’s requirements.txt are installed
    • Post-setup scripts are run after the requirements.txt installation process
    • Pre-run scripts are run right after post-setup scripts
    • Post-run scripts are run at the beginning of the "Stopping" run state
  • Enable VPN Networking: Admins can enable VPN Networking. For help setting it up, please contact us.
  • Inherited Docker Arguments: Here, you can see which Docker Arguments have been inherited from the base compute environment (coming soon)
  • Docker Arguments: Here, admins can specify arguments that will be passed to the underlying docker run command. Arguments must be separated by newlines (not spaces). In almost all cases, you shouldn't need to modify this.
  • Username: Admins can specify a non-default username for your environment here
  • Environment variables: You can set environment variables at the Compute Environment level
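
As a rough illustration of the Scripts field, the sketch below shows what a short pre-run script might look like. The require_command helper is hypothetical (it is not part of Domino), and this article does not say how Domino handles a script that exits with an error, so treat the failure path as an assumption to verify on your deployment.

```shell
#!/bin/bash
# Hypothetical pre-run script: check that a command the run depends on
# is available before the run starts. Nothing here is a Domino API.

require_command() {
  # command -v succeeds if the shell can resolve "$1" as a command.
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "pre-run check failed: '$1' not found on PATH" >&2
    return 1
  fi
  echo "pre-run check passed: '$1' found"
}

# Example check; replace "sh" with a tool your runs actually need.
require_command sh
```

Because these scripts are executed on every run rather than cached, keeping them short is a reasonable default; heavier installation work may fit better in the "Dockerfile Instructions" field.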

Raw Dockerfiles

You may wish to install packages directly to your environment. This can come in handy if your package can’t be installed using the language-specific instructions, or if runtime installation takes a while - an installation to the environment will be cached, so you won’t have to wait for it every time. The Domino platform uses Docker containers to manage isolated environments. If you already have a Docker image you'd like to use (either one that we've created for you, a public one, or a private one you've hosted somewhere), you can specify it in the preceding "Base Environment" field. If you don't set this, we will use the default environment as your base image.


For more on Dockerfile syntax and best practices, see Docker's documentation:

http://docs.docker.com/engine/reference/builder/
http://docs.docker.com/engine/articles/dockerfile_best-practices/

Note that Domino takes care of the FROM line for you. Do not start your "Dockerfile Instructions" with a FROM line.

 

The two most common Dockerfile instructions you'll use are "RUN" and "ENV":

RUN commands execute lines of bash, for example:

RUN wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz
RUN tar xvzf spark-1.5.1-bin-hadoop2.6.tgz
RUN mv spark-1.5.1-bin-hadoop2.6 /opt
RUN rm spark-1.5.1-bin-hadoop2.6.tgz

ENV commands set bash environment variables. They will be accessible from runs that use this environment. For example:

ENV SPARK_HOME /opt/spark-1.5.1-bin-hadoop2.6

If you set environment variables as part of the "Environment variables" section of your environment definition, you only need to specify the name of the environment variable, not the value. For example:

ENV SPARK_HOME
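
To illustrate how a run might consume a variable set this way, here is a minimal sketch of a shell helper that builds the path to spark-submit from SPARK_HOME. The spark_submit_path function is a hypothetical name, and the sketch assumes SPARK_HOME points at an unpacked Spark distribution such as the one in the example above.

```shell
#!/bin/bash
# Hypothetical helper for a run script. Assumes the environment sets
# SPARK_HOME to the root of an unpacked Spark distribution.

spark_submit_path() {
  # Fail with a readable message if the variable is missing or empty.
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set; check the environment definition" >&2
    return 1
  fi
  # Spark ships its launcher scripts in the bin/ subdirectory.
  printf '%s/bin/spark-submit\n' "$SPARK_HOME"
}
```

In a run script you could then call, for example, "$(spark_submit_path)" --version to confirm that the variable resolves to a working installation.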

Caveats

  • Docker optimizes its build process by keeping track of commands it has run and aggressively caching the results. This means that if it sees the same set of commands as a previous build, it will assume it can use the cached version. A single new command will invalidate the caching of all subsequent commands.
  • Also, there is a limit to the number of layers (that is, commands) a Docker image can have. Currently, this limit is 127. Keep in mind that the image upon which you are building may have already used many layers. If you hit the limit, one way to work around it is to combine several commands into one via "&&", for example:
    RUN \
    wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz && \
    tar xvzf spark-1.5.1-bin-hadoop2.6.tgz && \
    mv spark-1.5.1-bin-hadoop2.6 /opt && \
    rm spark-1.5.1-bin-hadoop2.6.tgz
  • If you are installing multiple Python packages via pip, it's almost always best to use a single pip install command. This ensures that dependencies and package versions are properly resolved. Otherwise, if you install via separate commands, you may inadvertently override a package with the wrong version, due to a dependency specified by a later installation. For example:
    RUN pip install luigi nolearn lasagne
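
If you maintain a list of commands and want to collapse them into one layer, a small helper can generate the combined instruction for you to paste into the "Dockerfile Instructions" field. The sketch below is one way to do that in plain shell; combine_run is a hypothetical name, not a Domino or Docker tool.

```shell
#!/bin/sh
# Hypothetical helper: print one Dockerfile RUN instruction that chains
# every argument (one shell command per argument) with "&&".

combine_run() {
  # Nothing to combine without at least one command.
  [ "$#" -gt 0 ] || return 1
  printf 'RUN %s' "$1"
  shift
  for cmd in "$@"; do
    # End the previous line with "&& \" and indent the next command.
    printf ' && \\\n    %s' "$cmd"
  done
  printf '\n'
}

# Only prints the instruction text; none of these commands are executed.
combine_run \
  "wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz" \
  "tar xvzf spark-1.5.1-bin-hadoop2.6.tgz" \
  "mv spark-1.5.1-bin-hadoop2.6 /opt" \
  "rm spark-1.5.1-bin-hadoop2.6.tgz"
```

The same helper covers the pip case: passing a single string such as "pip install luigi nolearn lasagne" yields one RUN line.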

 
