Data in Domino

Overview


This article describes how Domino stores and handles data that users upload, import, or create in Domino. There are two systems that store data in Domino:

  • Domino project files
  • Domino Datasets

Additionally, Domino supports connecting to many external data stores. Users can import data from external stores into Domino, export data from Domino to external stores, or run code in Domino that reads from and writes to external stores without saving data in Domino itself.

 

 

About Domino project files


 

How is the data in project files stored?


Work in Domino happens in projects. Every Domino project has a corresponding collection of project files. While at rest, project files are stored in a durable object storage system, referred to as the Domino Blob Store. This can be a cloud service like Amazon S3, or it can be an on-premises Network Attached Storage (NAS) system.

When a user starts a Run in Domino, the files from that user’s project are fetched from the Blob Store and loaded into the Run, in the working directory of the Domino service filesystem. When the Run finishes, or when the user initiates a manual sync in an interactive Workspace session, any changes to the contents of the working directory are written back to Domino as a new revision of the project files. Domino’s versioning system tracks file-level changes and can show detailed file differences between revisions.
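To illustrate what tracking file-level changes between revisions involves, here is a minimal Python sketch. It is not Domino's implementation: the function names `snapshot_directory` and `diff_revisions` are hypothetical, and the sketch assumes a revision can be summarized as a map from relative file path to content hash.

```python
import hashlib
from pathlib import Path


def snapshot_directory(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    base = Path(root)
    return {
        path.relative_to(base).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(base.rglob("*"))
        if path.is_file()
    }


def diff_revisions(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify files as added, removed, or modified between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

Comparing a snapshot taken when a Run starts with one taken when it finishes yields the set of files a new revision would need to record.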

Domino also provides several quick ways to initiate a file sync. Each of the following events can trigger a file sync and the subsequent creation of a new revision of a project’s files:

  • User uploads files from the Domino web application upload interface
  • User authors or edits a file in the Domino web application file editor
  • User syncs their local files to Domino from the Domino Command Line Interface
  • User uploads files to Domino via the Domino API
  • User executes code in a Domino Job that writes files to the working directory
  • User writes files to the working directory during an interactive Workspace session, and then initiates a manual sync or chooses to commit those files when the session finishes
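As a minimal sketch of the Job case in the list above: a script only needs to write its outputs under the working directory for them to be included in the sync when the Run finishes. The helper name `write_results` and the file name `results.json` are hypothetical, and the sketch assumes the Job's current directory is the project working directory.

```python
import json
from pathlib import Path
from typing import Optional


def write_results(metrics: dict, directory: Optional[Path] = None,
                  filename: str = "results.json") -> Path:
    """Write metrics as JSON under the given directory (default: current directory)."""
    target_dir = directory if directory is not None else Path.cwd()
    target_dir.mkdir(parents=True, exist_ok=True)
    out_path = target_dir / filename
    out_path.write_text(json.dumps(metrics, indent=2, sort_keys=True))
    return out_path
```

Calling `write_results({"accuracy": 0.93})` at the end of a Job script would leave `results.json` in the working directory.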

All revisions of project files that Domino creates are retained indefinitely, because project files are a component of the Domino Reproducibility Engine. Users can always return to and work with past revisions of project files.

Note

While users generally cannot permanently delete data from Domino project files, administrators can delete specific files by directly editing the contents of the Blob Store. This is an invasive process and is not recommended for day-to-day activity.

 

Who can access the data in project files?


Users are automatically granted the Owner role on projects they create, and can read and write files in those projects. Owners can add collaborators to their projects with the following additional roles and associated file permissions.

  • Contributor

    Can read and write project files.

  • Results Consumer

    Can read project files.

  • Launcher User

    Cannot access project files.

  • Project Importer

    Can access files made available for export.

The permissions available to each role are described in more detail in Sharing and collaboration.

Users can also inherit roles from membership in Domino Organizations. Learn more in the Organizations overview.

Domino users with administrative roles are granted additional access to project files across the Domino deployment they administer. Learn more in Admin roles.

 

 

 

About Domino Datasets


 

How is the data in Domino Datasets stored?


When users have large quantities of data, including collections of many files and large individual files, Domino recommends storing the data in a Domino Dataset. Datasets are collections of Snapshots, where each Snapshot is an immutable image of a filesystem directory from the time when the Snapshot was created.

These directories are stored in a network filesystem like Amazon EFS or a local NFS, and can be attached to Domino Runs for read-only use without transferring their contents into the Domino service filesystem. This allows users to quickly start working on big data in Domino.

Each Snapshot of a Domino Dataset is an independent state, and its membership in a Dataset is an organizational convenience for working on, sharing, and permissioning related data. Domino supports running scheduled Jobs that create Snapshots, enabling users to write or import data into a Dataset as part of an ongoing pipeline.
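The relationship between a Dataset and its Snapshots can be modeled roughly as follows. This is an illustrative in-memory sketch, not how Domino stores Datasets: the class names, the `write_snapshot` method, and the representation of files as a path-to-bytes mapping are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class Snapshot:
    """An immutable image of a directory at the time it was created."""
    created_at: datetime
    files: Mapping[str, bytes]  # read-only view: relative path -> contents


@dataclass
class Dataset:
    """A named collection of independent Snapshots."""
    name: str
    snapshots: list = field(default_factory=list)

    def write_snapshot(self, files: dict) -> Snapshot:
        # Copy the input so later changes to it cannot alter the Snapshot,
        # then wrap the copy in a read-only proxy.
        frozen_files = MappingProxyType(dict(files))
        snapshot = Snapshot(datetime.now(timezone.utc), frozen_files)
        self.snapshots.append(snapshot)
        return snapshot
```

In this model, a scheduled Job would call `write_snapshot` on each run, adding a new independent state to the Dataset without modifying earlier ones.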

Unlike project files, Dataset Snapshots can be permanently deleted by Domino system administrators. Snapshot deletion is a two-step process designed to prevent accidental data loss: users mark Snapshots they believe can be deleted, and admins then confirm the deletion if appropriate. This permanent deletion capability makes Datasets the right choice for storing data in Domino that has regulatory requirements for expiration.
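The two-step deletion flow described above can be sketched as a small state machine. The class, method, and state names here are hypothetical and do not correspond to a Domino API; the sketch only shows why a separate mark step and admin-confirmed delete step guard against accidental loss.

```python
from enum import Enum


class SnapshotState(Enum):
    ACTIVE = "active"
    MARKED_FOR_DELETION = "marked_for_deletion"
    DELETED = "deleted"


class SnapshotRecord:
    """Tracks the deletion state of one Snapshot (hypothetical model)."""

    def __init__(self, snapshot_id: str) -> None:
        self.snapshot_id = snapshot_id
        self.state = SnapshotState.ACTIVE

    def mark_for_deletion(self) -> None:
        """Step 1: a user flags the Snapshot; no data is removed yet."""
        if self.state is not SnapshotState.ACTIVE:
            raise ValueError(f"cannot mark a Snapshot in state {self.state.value}")
        self.state = SnapshotState.MARKED_FOR_DELETION

    def confirm_deletion(self, is_admin: bool) -> None:
        """Step 2: an administrator confirms, and only then is the Snapshot deleted."""
        if not is_admin:
            raise PermissionError("only administrators can confirm deletion")
        if self.state is not SnapshotState.MARKED_FOR_DELETION:
            raise ValueError("Snapshot must be marked for deletion first")
        self.state = SnapshotState.DELETED
```

Neither step alone removes data: an unmarked Snapshot cannot be deleted, and a marked one remains until an administrator confirms.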

 

Who can access the data in Domino Datasets?


Datasets in Domino belong to projects, and access to a Dataset is determined by the role a user has been granted on the containing project. Owners can mount Snapshots from Datasets in the project for read access, write new Snapshots, and add collaborators with the following roles.

  • Contributor

    Can mount Datasets for read access and write new Snapshots.

  • Results Consumer

    Cannot read from Datasets or write new Snapshots.

  • Launcher User

    Cannot read from Datasets or write new Snapshots.

  • Project Importer

    Can mount Datasets for read access.


The permissions available to each role are described in more detail in Sharing and collaboration.
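For quick reference, the data-access rules this article lists for project files and for Datasets can be summarized as two lookup tables. This is only a restatement of those role lists in Python: the action labels (`read`, `write`, `read_exported`, `mount_read`, `write_snapshot`) and the `is_allowed` helper are names invented for this sketch, and collaborator management and admin roles are left out.

```python
# Actions each role may take on project files, per the role list in this article.
PROJECT_FILE_PERMISSIONS = {
    "Owner": frozenset({"read", "write"}),
    "Contributor": frozenset({"read", "write"}),
    "Results Consumer": frozenset({"read"}),
    "Launcher User": frozenset(),
    "Project Importer": frozenset({"read_exported"}),  # only files made available for export
}

# Actions each role may take on Datasets in the project.
DATASET_PERMISSIONS = {
    "Owner": frozenset({"mount_read", "write_snapshot"}),
    "Contributor": frozenset({"mount_read", "write_snapshot"}),
    "Results Consumer": frozenset(),
    "Launcher User": frozenset(),
    "Project Importer": frozenset({"mount_read"}),
}


def is_allowed(role: str, action: str, table: dict) -> bool:
    """Return True if the role may perform the action; unknown roles get no access."""
    return action in table.get(role, frozenset())
```

Note the difference between the two tables: a Results Consumer can read project files but cannot read from Datasets.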

Users can also inherit roles from membership in Domino Organizations. Learn more in the Organizations overview.

Domino users with administrative roles are granted additional access to Datasets across the Domino deployment they administer. Learn more in Admin roles.

 

 

 

Integrating Domino with other data stores and databases


Domino can be configured to connect to external data stores and databases. This process involves loading the required client software and drivers for the external service into a Domino environment, and loading any credentials or connection details into Domino environment variables. Users can then interact with the external service in their Runs.

Users can import data from the external service into their project files by writing the data to the working directory of the Domino service filesystem, and they can write data from the external service to Dataset Snapshots. Alternatively, it is possible to construct workflows in Domino that save no data to Domino itself, but instead pull data from an external service, do work on the data, then push the results to an external service.
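The pull, process, push pattern described above might look like the following sketch. It uses Python's built-in `sqlite3` module as a stand-in for an external database so the example is self-contained; in practice the client library would be whichever driver is installed in the Domino environment. The environment variable name `EXAMPLE_DB_PATH`, the table names, and both function names are assumptions made for illustration.

```python
import os
import sqlite3


def connect_from_env(var: str = "EXAMPLE_DB_PATH") -> sqlite3.Connection:
    """Open a connection using a location read from an environment variable."""
    return sqlite3.connect(os.environ[var])


def pull_process_push(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Read rows from the source, transform them in memory, write them to the target.

    Nothing is written to the local filesystem, so no data is saved
    in project files or Datasets.
    """
    rows = source.execute("SELECT id, value FROM measurements").fetchall()
    doubled = [(row_id, value * 2) for row_id, value in rows]  # placeholder transformation
    target.execute(
        "CREATE TABLE IF NOT EXISTS results (id INTEGER PRIMARY KEY, value REAL)"
    )
    target.executemany("INSERT OR REPLACE INTO results VALUES (?, ?)", doubled)
    target.commit()
    return len(doubled)
```

Keeping connection details in environment variables rather than in code means the same script can run unchanged in any project that defines them.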

Learn more in the Data sources overview and read our detailed Data source connection guides.

 

 

 

Tracking and auditing data interactions in Domino


Domino system administrators can set up audit logs for user activity in the platform. These logs record events whenever users:

  • Create files
  • Edit files
  • Upload files
  • View files
  • Sync file changes from a Run
  • Mount Dataset Snapshots
  • Write Dataset Snapshots

This list is not exhaustive, and will expand as Domino adds new features and capabilities.

Domino administrators can contact support@dominodatalab.com for assistance enabling, accessing, and processing these logs.
