How to avoid re-installing packages in each run

If your code depends on a library that takes a long time to install, you can avoid repeating that wait on every run by caching the installation in a custom environment. A full guide to environment management is available here, but since this is a common question, a dedicated tutorial is provided below.

To quickly create an environment that caches your installation:

  1. From the project settings page, locate the "Compute Environments" section and click the "Manage Environments" button. If you cannot see this button, please contact support@dominodatalab.com or use the in-app Intercom messenger to request access.
  2. Create a new environment and add the following to the Dockerfile section:

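    FROM <base image>

    # Installation commands need root privileges:
    USER root

    # To install a Python library:
    RUN pip install <Python library>

    # To install an R package (naming a CRAN mirror avoids an error if none is configured):
    RUN R -e 'install.packages("<R library>", repos="https://cloud.r-project.org")'

    # You can also install system dependencies with bash commands.
    # Refresh the package index first so apt-get can find the package:
    RUN apt-get update && apt-get install -y <system library>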

    Substitute the appropriate base image and libraries, and add any other installation commands you need to run. If you are using the public cloud deployment of Domino, you can use quay.io/domino/base:2016-12-07_1239 as the base image (the most up-to-date version at the time of this writing).

  3. After saving your environment, go back to your project settings page and select the new environment from the "Compute Environments" dropdown. Start a run to verify that the installation works.
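
One way to do the verification in the last step is to start a run with a short script that checks whether the libraries are visible to Python. The following is a minimal sketch, not an official Domino utility: the function name `missing_modules` and the `REQUIRED_MODULES` list are hypothetical, and the standard-library names in the list are placeholders for the top-level import names of whatever you installed in the Dockerfile section.

```python
import importlib.util

# Hypothetical list: replace these placeholders with the top-level import
# names of the libraries installed in the environment's Dockerfile section.
REQUIRED_MODULES = ["json", "os"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        # A non-zero exit makes the failure obvious in the run's output.
        raise SystemExit("Not installed in this environment: " + ", ".join(missing))
    print("All required modules are available.")
```

If the script exits with an error, the run is most likely not using the new environment, or the installation command in the Dockerfile section did not succeed.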

Notes:

  • Code in the Dockerfile section executes once per machine and is then cached for subsequent runs on that machine.
  • For the public cloud and VPC deployments of Domino, machine assignment is random and the cluster scales dynamically, so you will occasionally see the installation run again instead of using the cache. This becomes less likely as you perform more runs and your environment is cached on more machines.
  • Switching to a hardware tier you've never used with this environment sends your run to a new machine, so the installation will need to run again.
  • The FROM statement specifies the base image upon which your environment is built. If you already have a custom environment with a Dockerfile definition, you can simply append the new RUN statement(s) to the end of that Dockerfile.
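
To get a feel for why the cache is used more often as you perform more runs, here is a toy model. It is a deliberate simplification and not a description of Domino's scheduler: it assumes a fixed pool of machines within one hardware tier and uniformly random assignment, whereas the real cluster scales dynamically. The function name `cache_hit_probability` and the pool size of 10 are hypothetical.

```python
def cache_hit_probability(run_number, pool_size):
    """Probability that run `run_number` (1-based) lands on a machine that an
    earlier run already used, assuming uniform random assignment over a fixed
    pool of `pool_size` machines (a simplifying assumption)."""
    if run_number < 1 or pool_size < 1:
        raise ValueError("run_number and pool_size must both be at least 1")
    # A given machine was missed by all earlier runs with probability
    # (1 - 1/pool_size) ** (run_number - 1); a cache hit is the complement.
    return 1.0 - (1.0 - 1.0 / pool_size) ** (run_number - 1)


if __name__ == "__main__":
    # Hypothetical pool of 10 machines in the chosen hardware tier.
    for n in (1, 2, 5, 10, 20, 50):
        print(f"run {n:>2}: chance of hitting the cache = {cache_hit_probability(n, 10):.2f}")
```

Under these assumptions the first run never hits the cache, and the chance of a hit rises toward 1 as runs accumulate, which matches the behaviour described in the notes above.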