Connecting to S3 from Domino

Overview


This article describes how to connect to Amazon Simple Storage Service (S3) from Domino.

S3 is a cloud object store available as a service from AWS.

Options for connecting to S3 from Domino


This article will discuss four ways of connecting to S3:

  • Getting a file from an S3-hosted public path
  • The AWS CLI
  • Python and boto3
  • R and aws.s3

Getting a file from an S3-hosted public path


If you have files in S3 that are set to allow public read access, you can fetch those files with Wget from the OS shell of a Domino executor, the same way you would for any other resource on the public Internet. The request for those files will look similar to this:

wget https://s3-<region>.amazonaws.com/<bucket-name>/<filename>

This method is very simple, but doesn't allow for any authentication or authorization, and should not be used with sensitive data.
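
If you prefer to fetch a public object from code rather than the shell, the same request works from Python's standard library. Below is a minimal sketch using the same URL placeholders as the wget command above; substitute your own region, bucket name, and filename.

import urllib.request

# Mirror of the wget request above. Substitute your own region, bucket
# name, and filename; the object must allow public read access.
url = 'https://s3-<region>.amazonaws.com/<bucket-name>/<filename>'
urllib.request.urlretrieve(url, '<filename>')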

AWS CLI


A more secure method of reading S3 from the OS shell of a Domino executor is the AWS CLI. Setting it up on your executor is a two-step process: install it in your environment, then provide it with your credentials.

 

Environment setup

The AWS CLI is available as a Python package from pip. The Dockerfile instruction below installs the CLI and automatically adds it to your system PATH.

This instruction assumes you already have pip installed.

RUN pip install awscli --upgrade

For a basic introduction to modifying Domino environments, watch this tutorial video.

 

Credential setup

In order to connect to the S3 buckets your AWS account has access to, you'll need to provide your AWS Access Key and AWS Secret Key to the AWS CLI. By default, AWS utilities will look for these in your environment variables.

You should set the following as Domino environment variables on your user account:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY

Read Environment variables for secure credential storage to learn more about Domino environment variables.

 

Usage

Once your Domino environment and credentials are set up correctly, you can fetch the contents of an S3 bucket to your current directory by running:

aws s3 sync s3://<bucket-name> .

Read the official AWS CLI documentation on S3 for more commands and options.

Python and boto3


The best available library for interacting with AWS services from Python is boto3, the official AWS SDK for Python maintained by Amazon.

 

Environment setup

If you're using one of the Domino standard environments, boto3 will already be installed. If you want to add boto3 to an environment, use the following Dockerfile instruction.

This instruction assumes you already have pip installed.

RUN pip install boto3

For a basic introduction to modifying Domino environments, watch this tutorial video.

 

Credential setup

In order for boto3 to connect to the S3 buckets your AWS account has access to, you'll need to provide it with your AWS Access Key and AWS Secret Key. Just like the AWS CLI, boto3 will look for these in your environment variables.

You should set the following as Domino environment variables on your user account:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY

Read Environment variables for secure credential storage to learn more about Domino environment variables. 
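
If you want to confirm those variables are actually visible to your code before creating a client, a quick optional check like the sketch below can help. This is an extra sanity check, not part of boto3's setup itself.

import os

# optional sanity check: both variables must be set for boto3 to find them
for var in ('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'):
    if var not in os.environ:
        print(var + ' is not set')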

 

Usage

There are many methods for interacting with S3 from boto3, all detailed in the official documentation. Below is a simple example for downloading a file where:

  • you have set up the correct environment variables with credentials for your AWS account
  • your account has access to an S3 bucket named my_bucket
  • the bucket contains an object named some_data.csv
import boto3
import io
import pandas as pd

# create a new S3 client
client = boto3.client('s3')

# download some_data.csv from my_bucket and write to ./some_data.csv locally
# (download_file returns None, so there is no need to assign its result)
client.download_file('my_bucket', 'some_data.csv', './some_data.csv')

Note that this code does not provide credentials as arguments to the client constructor, since it assumes credentials will be in the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
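
boto3 also accepts credentials as explicit arguments to the client constructor, as in the sketch below with placeholder values. In Domino, environment variables are usually the better choice, since they keep secrets out of your code.

# alternative: pass credentials explicitly instead of relying on
# environment variables (placeholders shown; substitute your own values)
client = boto3.client(
    's3',
    aws_access_key_id='<your-access-key-id>',
    aws_secret_access_key='<your-secret-access-key>'
)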

After running the above code, you would expect a local copy of some_data.csv to now exist in the same directory as your Python script or notebook. You could follow this up by loading the data into a pandas dataframe.

df = pd.read_csv('some_data.csv')
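
Alternatively, you can skip the intermediate file and read the object straight into a dataframe in memory, which is what the io import in the example above is for. A minimal sketch, reusing the client and imports from the download example:

# read some_data.csv from my_bucket directly into a dataframe, without
# writing a local copy first
obj = client.get_object(Bucket='my_bucket', Key='some_data.csv')
df = pd.read_csv(io.BytesIO(obj['Body'].read()))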

Check out part 1 of the First steps in Domino tutorial for a more detailed example of working with CSV data in Python.

R and aws.s3


The cloudyr project offers a package called aws.s3 for interacting with S3 from R.

 

Environment setup

If you're using one of the Domino standard environments, aws.s3 will already be installed. If you want to add aws.s3 to an environment, use the following Dockerfile instructions.

RUN R -e 'install.packages(c("httr","xml2"), repos="https://cran.r-project.org")'
RUN R -e 'install.packages("aws.s3", repos = c("cloudyr" = "http://cloudyr.github.io/drat"))'

For a basic introduction to modifying Domino environments, watch this tutorial video.

 

Credential setup

In order for aws.s3 to connect to the S3 buckets your AWS account has access to, you'll need to provide it with your AWS Access Key and AWS Secret Key. By default, aws.s3 will look for these in your environment variables.

You should set the following as Domino environment variables on your user account:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY

Read Environment variables for secure credential storage to learn more about Domino environment variables. 

 

Usage

You can find basic instructions on using aws.s3 from the package README. Below is a simple example for downloading a file where:

  • you have set up the correct environment variables with credentials for your AWS account
  • your account has access to an S3 bucket named my_bucket
  • the bucket contains an object named some_data.csv
# load the package
library("aws.s3")

# download some_data.csv from my_bucket and write to ./some_data.csv locally
save_object("some_data.csv", file = "./some_data.csv", bucket = "my_bucket")

After running the above code, you would expect a local copy of some_data.csv to now exist in the same directory as your R script or notebook. You can then read from that local file to work with the data it contains.

myData <- read.csv(file="./some_data.csv", header=TRUE, sep=",")
View(myData)
