3.2.2.1. ADA interface

ADA (Advanced dCache API) is an interface built on the dCache API and the WebDAV protocol. It lets you access and process your data on dCache from any platform, using various authentication methods. The examples below use macaroons as the authentication method.

Macaroons are a token-based authentication method supported by dCache. They can be used to grant access to dCache data in a very granular way, which enables data managers to share their data in dCache autonomously, without having to ask SURF to arrange access.

In addition to ADA, rclone is used for data transfers to and from dCache. rclone is a WebDAV client that by default supports 4 parallel streams of data, and it is installed on Spider.


3.2.2.1.1. Browser view

dCache storage can be viewed either through the ADA tools or in the browser using the web client. The browser view is available only to Data managers and is simply an additional way to explore the storage space.

As a Data manager you have direct credentials on dCache, so you can open the browser view with your SURFcua credentials at the following link:

https://webdav-secure.grid.surfsara.nl/pnfs/grid.sara.nl/data/[PROJECT]/

Note

You may be asked for a browser certificate. If so, select cancel and you will then be asked for your credentials. These are the same credentials that you use to log in to the SURF CUA portal (see Section 2.1).

3.2.2.1.2. Using ADA

ADA is a set of tools created by SURF to simplify your interactions with dCache. It wraps operations that can be performed directly on the dCache REST API, such as listing or deleting files and directories, into one clean package. This saves you the hassle of downloading and troubleshooting multiple packages and dependencies. ADA does not support uploading or downloading data; for that you need rclone. To simplify matters, ADA and rclone can use the same config file. Both are installed on Spider.

This section provides examples and the steps to start using ADA to interact with your dCache storage.

3.2.2.1.2.1. Create a macaroon

  • Requirements: credentials to dCache

    • username/pwd or

    • x509 proxy

  • Spider role: Data manager

  • Action: create a macaroon

  • Output: rclone config file [PROJECT_tokenfile].conf. You can share this file with any member of the project in the next step.

  • Description: the Data manager creates a macaroon for a shared directory (including its sub-directories and files). In the next step they share the macaroon with the project team in a non-public space: either users’ home directories, or the ‘shared’ or ‘data’ project space directories.

  • Example:

get-macaroon \
    --url https://webdav.grid.surfsara.nl:2880/pnfs/grid.sara.nl/data/[PROJECT] \
    --duration P7D \
    --chroot \
    --user [USERNAME] \
    --permissions DOWNLOAD,UPLOAD,DELETE,MANAGE,LIST,READ_METADATA,UPDATE_METADATA \
    --ip [IP RANGE] \
    --output rclone [PROJECT_tokenfile]

You will be asked for your CUA password after submitting this command. This example creates a macaroon that is valid for 7 days for the given URL. The --chroot argument ensures that this URL is treated as the root directory when the macaroon is used later.

The following permissions can be given comma-separated upon creation of the macaroon:

Permission        Function
DOWNLOAD          Read a file
UPLOAD            Write a file
DELETE            Delete a file or directory
MANAGE            Rename or move a file or directory
LIST              List objects in a directory
READ_METADATA     Read file status
UPDATE_METADATA   Stage/unstage a file, change QoS

You can explore the other command-line arguments with get-macaroon --help.
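As an illustration of granular permissions, the sketch below builds a get-macaroon command for a read-only macaroon. It uses only options shown in the example above and leaves out --ip. The function name readonly_macaroon_cmd and the choice of DOWNLOAD,LIST,READ_METADATA as a "read-only" set are assumptions made for this example. The function only prints the command, so that you can review it before running it yourself.

```shell
# Hypothetical helper: print (not run) a get-macaroon command for a
# read-only macaroon. Arguments: project name, user name, output name.
readonly_macaroon_cmd() {
    project=$1
    user=$2
    output=$3
    printf '%s\n' "get-macaroon --url https://webdav.grid.surfsara.nl:2880/pnfs/grid.sara.nl/data/${project} --duration P7D --chroot --user ${user} --permissions DOWNLOAD,LIST,READ_METADATA --output rclone ${output}"
}

# Example: review the printed command first, then run it by hand.
# (myproject, myuser and myproject_readonly are placeholders.)
readonly_macaroon_cmd myproject myuser myproject_readonly
```

Handing out a macaroon without UPLOAD, DELETE and MANAGE limits what a holder of the token can change, which is useful when the config file is shared with a wider group.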

3.2.2.1.2.2. Share macaroons

The config file generated in the previous step can be shared with project members and collaborators so that they can access the data. Because anyone holding this config file can operate directly on the dCache project data, share it with the project team only in a non-public space, for example users’ home directories, or the ‘Shared’ or ‘Data’ project space directories on Spider.

  • Requirements: the rclone config file [PROJECT_tokenfile].conf

  • Spider role: Data manager

  • Actions: share [PROJECT_tokenfile].conf in a project space that can be read by other project users

  • Output: the config file [PROJECT_tokenfile].conf is stored in a shared space

  • Example:

cp [PROJECT_tokenfile].conf /project/[PROJECT]/Data
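Because the config file acts as a credential, it may be worth restricting its file permissions when you copy it. The sketch below copies the file and sets mode 640 (owner read/write, group read, no access for others). The helper name share_tokenfile is hypothetical, and the sketch assumes that the target directory is group-owned by your project group.

```shell
# Hypothetical helper: copy a token file into a shared directory and make it
# readable by the owner and the group only (mode 640), not by everyone.
share_tokenfile() {
    src=$1
    destdir=$2
    cp "$src" "$destdir"/ || return 1
    chmod 640 "$destdir/$(basename "$src")"
}

# Example (paths are placeholders):
# share_tokenfile myproject_tokenfile.conf /project/myproject/Data
```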

3.2.2.1.2.3. Inspect the macaroon

  • Requirements: the rclone config file [PROJECT_tokenfile].conf

  • Spider role: normal user

  • Actions: view macaroon

  • Output: the list of activities and directories that you can use on dCache

  • Example:

# Your macaroon is the value of 'bearer_token'
$ cat [PROJECT_tokenfile].conf
[tokenfile]
type = webdav
bearer_token = MDAxY2xvY2F0aWXXXXXXXXXXXXXXXX
url = https://webdav.grid.surfsara.nl:2880/
vendor = other
user =
password =

#View the macaroon details
$ view-macaroon [PROJECT_tokenfile].conf
location Optional.empty
identifier NDFXzXXX
cid iid:03FXXX//
cid id:39147;35932,30013;[Data Manager Name]
cid before:2020-02-05T11:01:11.577Z
cid home:/[Project folder]
cid root:/[Project folder]
cid activity:DOWNLOAD,UPLOAD,MANAGE,LIST
signature fefef25a4973e59b10ad464054dXXXXXXX
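The "cid before:" line in the output above is the expiry time of the macaroon. The sketch below extracts it and checks whether it lies in the past. The function names are hypothetical, and the sketch assumes that view-macaroon prints the timestamp in the UTC format shown above.

```shell
# Print the expiry timestamp from view-macaroon output read on stdin, i.e. the
# value of the "cid before:" caveat. If there are several, print the earliest
# (timestamps in this format sort chronologically as plain text).
macaroon_expiry() {
    sed -n 's/^cid before://p' | sort | head -n 1
}

# Succeed (exit status 0) if timestamp $1 is earlier than timestamp $2.
# Both are UTC timestamps such as 2020-02-05T11:01:11.577Z; only the first
# 19 characters (whole seconds) are compared, as a 14-digit number.
is_before() {
    a=$(printf '%s' "$1" | cut -c1-19 | tr -cd '0-9')
    b=$(printf '%s' "$2" | cut -c1-19 | tr -cd '0-9')
    [ "$a" -lt "$b" ]
}

# Example (the tokenfile name is a placeholder):
# expiry=$(view-macaroon myproject_tokenfile.conf | macaroon_expiry)
# if is_before "$expiry" "$(date -u +%Y-%m-%dT%H:%M:%S)"; then
#     echo "macaroon expired on $expiry"
# fi
```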

3.2.2.1.2.4. Use the macaroon

This section describes how to work with your files.

  • Requirements: the rclone config file [PROJECT_tokenfile].conf. For ADA this is referred to as tokenfile.

  • Spider role: normal user

Tip

You can set the tokenfile in an environment variable instead of passing it on the command line every time. Enter the command export ada_tokenfile=/path-to-mytoken/[PROJECT_tokenfile].conf and you can then omit the --tokenfile option from all ADA commands.

Tip

You can get extra information about the submitted command and the REST API call details by adding the --debug option to your ADA command.

3.2.2.1.2.4.1. Check your access to the system

--whoami

  • Action: request authentication details

  • Output: information about the token owner and permissions

  • Example:

ada --tokenfile [PROJECT_tokenfile].conf --whoami
{
"status": "AUTHENTICATED",
"uid": 515XX,
"gids": [
    511XX
],
"username": "[Data Manager name]",
"rootDirectory": "/pnfs/grid.sara.nl/data/[Project]/disk",
"homeDirectory": "/"
}
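In a script you may want to test the result of --whoami before doing anything else. The sketch below reads the JSON shown above and prints its "status" field using jq. The function name whoami_status is hypothetical, and the sketch assumes that the command prints only the JSON document on standard output.

```shell
# Read the JSON printed by "ada --whoami" on stdin and print its "status" field.
# Requires jq.
whoami_status() {
    jq -r '.status'
}

# Example (the tokenfile name is a placeholder):
# status=$(ada --tokenfile myproject_tokenfile.conf --whoami | whoami_status)
# [ "$status" = "AUTHENTICATED" ] || echo "token not accepted: $status" >&2
```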

3.2.2.1.2.4.2. Listing files

--list <directory>

--longlist <file|directory>

--longlist --from-file <file-list>

  • Action: list files or directories

  • Output: a list or long list of the files in the directory, within the scope that the macaroon permits

  • Example:

ada --tokenfile [PROJECT_tokenfile].conf --longlist /[DIRECTORY]

Note that because we added the command-line argument --chroot when creating the macaroon, we do not need to specify the full URL of the directory on dCache.

3.2.2.1.2.4.3. Get file or directory details

--stat <file|directory>

  • Action: show all details of a file or directory

  • Output: metadata information

  • Example:

ada --tokenfile [PROJECT_tokenfile].conf --stat /[FILE or DIRECTORY]

3.2.2.1.2.4.4. Create a directory on dCache

--mkdir <directory>

  • Action: create directories

  • Output: new directory created

  • Example:

ada --tokenfile [PROJECT_tokenfile].conf --mkdir /[DIRECTORY]

3.2.2.1.2.4.5. Moving or renaming files

--mv <file|directory> <destination>

  • Action: move a file or directory. This can also be used to rename a file or directory, by moving it within the same parent directory. Specify the full path and name of both the source and the destination.

  • Output: the file or directory is moved to a different dCache location or renamed

  • Example:

ada --tokenfile [PROJECT_tokenfile].conf --mv /[SOURCE] /[DESTINATION]

3.2.2.1.2.4.6. Recursively remove folders

--delete <file|directory> [--recursive [--force]]

  • Action: delete files or directories

  • Output: the file or directory is deleted

  • Recursive deletion: to recursively delete a directory and ALL of its contents, add --recursive. You will need to confirm deletion of each subdir, unless you add --force.

  • Alternative: rclone purge

  • Example:

ada --tokenfile [PROJECT_tokenfile].conf --delete /[FILE or DIRECTORY]
ada --tokenfile [PROJECT_tokenfile].conf --delete /[FILE or DIRECTORY] --recursive
ada --tokenfile [PROJECT_tokenfile].conf --delete /[DIRECTORY] --recursive --force
# alternative
$ rclone --config=[PROJECT_tokenfile].conf purge [PROJECT_tokenfile]:[FILE or DIRECTORY]

3.2.2.1.2.4.7. Checksum

--checksum <file>

--checksum <directory>

--checksum --from-file <file-list>

  • Action: get the checksum of a file, of the files inside a directory, or of a list of files

  • Output: show MD5/Adler32 checksums for files

  • Example:

ada --tokenfile [PROJECT_tokenfile].conf --checksum /[FILE or DIRECTORY]
# create a filelist and get checksums for files in it
ada --tokenfile [PROJECT_tokenfile].conf --list /disk/mydir > files-to-checksum
sed -i -e 's/^/\/disk\/mydir\//' files-to-checksum
ada --tokenfile [PROJECT_tokenfile].conf --checksum --from-file files-to-checksum
#/disk/file1  ADLER32=80690001
#/disk/file2  ADLER32=80690001
#/disk/file3  ADLER32=80690001
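To verify a transfer you can compare the checksum reported by dCache with the checksum of your local copy. The sketch below computes Adler-32 with standard tools and compares it with one line of --checksum output. Both function names are hypothetical, and the sketch assumes that the output line has the form shown above, with the checksum as 8 hexadecimal digits.

```shell
# Compute the Adler-32 checksum of a local file as 8 lowercase hex digits.
# Uses only od and awk; slow for large files, so treat it as a sketch.
adler32() {
    od -An -v -tu1 "$1" | awk '
        BEGIN { a = 1; b = 0 }
        { for (i = 1; i <= NF; i++) { a = (a + $i) % 65521; b = (b + a) % 65521 } }
        END { printf "%04x%04x\n", b, a }'
}

# Succeed if local file $1 matches the ADLER32 value in line $2, where $2 is
# one line of "ada --checksum" output such as "/disk/file1  ADLER32=80690001".
matches_remote() {
    remote=$(printf '%s\n' "$2" | sed -n 's/.*ADLER32=\([0-9A-Fa-f]*\).*/\1/p' | tr 'A-F' 'a-f')
    [ -n "$remote" ] && [ "$(adler32 "$1")" = "$remote" ]
}

# Example (names are placeholders):
# line=$(ada --tokenfile myproject_tokenfile.conf --checksum /disk/mydir/file1)
# matches_remote ./file1 "$line" && echo "checksums match"
```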

3.2.2.1.2.4.8. View your usage

  • Action: get your storage usage with rclone

  • Example:

rclone --config=[PROJECT_tokenfile].conf size [PROJECT_tokenfile]:/

3.2.2.1.2.4.9. Staging

The dCache storage at SURF consists of magnetic tape storage and hard disk storage. If your quota allocation includes tape storage, then the data stored on magnetic tape has to be copied to a hard drive before it can be used. This action is called ‘staging files’ or ‘bringing a file online’. ADA supports bulk staging which significantly improves performance compared to staging files one by one.

The files remain online as long as there is free space on the disk pools. When a pool group is full (maximum of assigned quota on staging area) and free space is needed, dCache will purge the least recently used cached files. The tape replica will remain on tape.

The amount of time that a file is requested to stay on disk is called the pin lifetime. The file will not be purged until the pin lifetime has expired. You can specify the pin lifetime with the --lifetime argument in your staging commands, in seconds, minutes, hours or days. If --lifetime is not given, the default is 7 days.

For each staging request a reference is added to a log file in your home directory, ~/.ada/requests.log, which records the request IDs, target paths and stage request timestamps.

Your macaroon needs to be created with UPDATE_METADATA permissions to allow for staging operations.

--stage <file>

--stage <directory>

--stage --from-file <file-list>

  • Action: stage (restore, bring online) a file from tape, the files in a directory, or a list of files

  • Output: the file or list of files comes online on disk

  • Example:

#list files to get the status
ada --tokenfile [PROJECT_tokenfile].conf --longlist /[PROJECT_tape_dir]
#file1  1186443  2020-02-13 16:27 UTC  tape  NEARLINE
#file2  1635     2018-10-24 15:34 UTC  tape  NEARLINE

#stage a single file
ada --tokenfile [PROJECT_tokenfile].conf --stage /[PROJECT_tape_dir]/file1

#stage a single file with pin lifetime two weeks
ada --tokenfile [PROJECT_tokenfile].conf --stage /[PROJECT_tape_dir]/file1 --lifetime 14D

#stage a directory (optionally recursively with --recursive)
ada --tokenfile [PROJECT_tokenfile].conf --stage /[PROJECT_tape_dir]/dirname/

#stage a list of files
ada --tokenfile [PROJECT_tokenfile].conf --stage --from-file files-to-stage
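The list of files to stage can be derived from the long listing. The sketch below keeps only the entries whose last column is NEARLINE and writes their full paths, one per line. The function name nearline_paths is hypothetical. The sketch assumes the column layout shown in the example above (without the leading # used there to mark output) and file names without whitespace.

```shell
# Read "ada --longlist" output on stdin and print the full path of every
# entry whose last column is NEARLINE (on tape only). $1 is the directory
# that was listed. Assumes names without whitespace.
nearline_paths() {
    awk -v dir="${1%/}" '$NF == "NEARLINE" { print dir "/" $1 }'
}

# Example (names are placeholders):
# ada --tokenfile myproject_tokenfile.conf --longlist /mytapedir \
#     | nearline_paths /mytapedir > files-to-stage
# ada --tokenfile myproject_tokenfile.conf --stage --from-file files-to-stage
```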

3.2.2.1.2.4.10. Unstaging

Your macaroon needs to be created with UPDATE_METADATA permissions to allow for unstaging operations.

For each unstaging request a reference is added to a log file in your home directory, ~/.ada/requests.log, which records the request IDs, target paths and unstage request timestamps.

--unstage <file>

--unstage <directory>

--unstage --from-file <file-list>

  • Action: unstage (release) a file, the files in a directory, or a list of files

  • Output: the file or list of files is unstaged, so dCache may purge its online replica from disk at any time

  • Example:

#unstage a single file
ada --tokenfile [PROJECT_tokenfile].conf --unstage /[PROJECT_tape_dir]/file1

# unstage dir (optionally recursively with --recursive)
ada --tokenfile [PROJECT_tokenfile].conf --unstage /[PROJECT_tape_dir]/dirname/

#unstage a list of files
ada --tokenfile [PROJECT_tokenfile].conf --unstage --from-file files-to-unstage

3.2.2.1.2.5. Transfer Data

To transfer files to and from dCache we use the same [PROJECT_tokenfile].conf together with the rclone client, which performs WebDAV transfers as shown below.

3.2.2.1.2.5.1. Copy data from dCache

rclone --config=[PROJECT_tokenfile].conf copy [PROJECT_tokenfile]:/[SOURCE] ./[DESTINATION] -P

For example, to copy an existing test folder to Spider:

rclone --config=[PROJECT_tokenfile].conf copy [PROJECT_tokenfile]:/tests/ ./tests/ -P

3.2.2.1.2.5.2. Write data to dCache

rclone --config=[PROJECT_tokenfile].conf copy ./[SOURCE]/ [PROJECT_tokenfile]:[DESTINATION] -P

Notes on data transfers:

  • The rclone copy mode copies only new or changed files. The rclone sync (one-way) mode makes the destination identical to the source, deleting files from the destination if necessary, so be careful because this can cause data loss. We suggest that you first test with the --dry-run flag to see exactly what would be copied and deleted.

  • You can increase the number of parallel transfers with the --transfers [Number] option.

  • When copying a small number of files into a large destination, you can add the --no-traverse option to the rclone copy command so that rclone does not list the whole destination directory. This can speed up transfers greatly.

  • If you are certain that none of the destination files exists you can add the --no-check-dest option in the rclone copy command to speed up the transfers.

  • For very large files it is important to set the --timeout option high enough. As a rule of thumb, set it to 10 minutes for every GB of the biggest file in a collection. This may look ridiculously large, but it provides a safe margin against timeout problems.

  • Using --multi-thread-streams 1 increases the performance for large files copied to dCache.

#example command to upload a big file
rclone --timeout=240m --multi-thread-streams 1 --config=[PROJECT_tokenfile].conf copy ./[SOURCE]/ [PROJECT_tokenfile]:[DESTINATION] -P
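The timeout rule of thumb from the notes above can be turned into a small calculation. The sketch below finds the largest file in a local directory and derives a timeout of 10 minutes per started GB. Both function names are hypothetical, and treating 1 GB as 1024^3 bytes is an assumption.

```shell
# Size in bytes of the largest regular file below directory $1 (0 if none).
largest_file_bytes() {
    max=$(find "$1" -type f -exec sh -c 'wc -c < "$1"' _ {} \; | sort -n | tail -n 1)
    echo $(( ${max:-0} ))
}

# Timeout in minutes for a file of $1 bytes: 10 minutes per started GB,
# with a minimum of 10 minutes. 1 GB is taken as 1024^3 bytes (assumption).
timeout_minutes() {
    gib=$((1024 * 1024 * 1024))
    n=$(( ($1 + gib - 1) / gib ))
    if [ "$n" -lt 1 ]; then n=1; fi
    echo $(( n * 10 ))
}

# Example (./mydata is a placeholder):
# minutes=$(timeout_minutes "$(largest_file_bytes ./mydata)")
# rclone --timeout="${minutes}m" --multi-thread-streams 1 \
#     --config=myproject_tokenfile.conf copy ./mydata/ myproject_tokenfile:mydata -P
```

With these assumptions a 24 GB file gives 240 minutes, the value used in the upload example above.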

3.2.2.1.3. Event-driven processing

Events are useful when you want to be notified that something of interest has happened in your dCache project space, for example that new data is available or that files have been staged from tape.

For debugging purposes, additional information is stored in your home directory under ~/.ada:

  • The channel names are stored in ~/.ada/channels/channel-name-XXXXX for reference

  • The files ~/.ada/channels/channel-status-XXXXXX store the ID of the last event, so that when a competing client takes over, it can use this ID to resume with the events that were missed

  • Subscribe to changes in a given directory:

ada --tokenfile [PROJECT_tokenfile].conf --events changes-in-dir /[PROJECT_directory] --recursive

  • Check the available channels listening to events:

ada --tokenfile [PROJECT_tokenfile].conf --channels

  • Report staging events:

When you start this channel, all files in the scope will be listed, including their locality and staging status. This allows your event handler to take actions, like starting jobs to process the files that are online. When all files have been listed, the command will keep listening and reporting all locality and staging changes.

ada --tokenfile [PROJECT_tokenfile].conf --report-staged staging-in-tape-dir /[PROJECT_directory] --recursive
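One possible shape for an event handler is a loop that reads the lines printed by these commands and runs an action for each of them. The sketch below is such a loop. The names for_each_event, log_event and EVENT_LOG are hypothetical, and the layout of the event lines is deliberately not assumed: each line is handed to the handler unchanged.

```shell
# Read event lines from stdin and run the command given as arguments once per
# line, with the line appended as the last argument. The handler should not
# read from stdin itself, because it would consume the following events.
for_each_event() {
    while IFS= read -r line; do
        "$@" "$line"
    done
}

# Example handler: append each event, with a UTC timestamp, to a log file.
log_event() {
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" >> "${EVENT_LOG:-events.log}"
}

# Example (names are placeholders):
# ada --tokenfile myproject_tokenfile.conf --events changes-in-dir /mydir --recursive \
#     | for_each_event log_event
```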

3.2.2.1.4. Authentication

On this page we gave an extended example of using ADA with macaroon authentication. ADA can also be used with other authentication options:

Authentication      ADA commands                 When to use
Macaroon            ada --tokenfile <filename>   You don’t have direct access to dCache, but you have a token from the project Data manager that grants you certain permissions on the data
Username/password   ada --netrc [filename]       You have direct username/password credentials on dCache
X509 certificate    ada --proxy [filename]       You have direct VO membership access on dCache

Here is an example of a .netrc file that you can create in your home directory to use username/password authentication:

$ cat ~/.netrc
machine webdav.grid.surfsara.nl
login [your-ui-username]
password [your-ui-password]
machine dcacheview.grid.surfsara.nl
login [your-ui-username]
password [your-ui-password]
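Since a .netrc file holds your password in plain text, it is prudent to make it readable by you alone. The sketch below writes the two entries shown above and sets mode 600. The function name write_netrc is hypothetical. The password is read from standard input rather than passed as an argument.

```shell
# Hypothetical helper: write a netrc file ($1) for user $2, reading the
# password from stdin. The body runs in a subshell, so the umask change does
# not affect the calling shell. The result is readable by the owner only.
write_netrc() (
    umask 077
    IFS= read -r password || exit 1
    {
        printf 'machine webdav.grid.surfsara.nl\nlogin %s\npassword %s\n' "$2" "$password"
        printf 'machine dcacheview.grid.surfsara.nl\nlogin %s\npassword %s\n' "$2" "$password"
    } > "$1"
    chmod 600 "$1"
)

# Example: type the password, then press Enter (it is echoed on the terminal).
# write_netrc ~/.netrc myusername
```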

3.2.2.1.5. Run ADA anywhere

On this page we gave an extended example of using ADA on Spider. ADA is portable and can be used on any platform. On the Spider UIs ADA is already installed. If you want to interact with the dCache API and transfer files from your own machine, you need to install the following prerequisites:

  • jq: the only dependency for executing ada commands

  • rclone: the client to perform transfers (MacOS: brew install rclone)

If, as a Data manager, you wish to create macaroons from any platform, e.g. your local machine, you also need to install the get-macaroon and view-macaroon scripts:

  • wget https://raw.githubusercontent.com/sara-nl/GridScripts/master/get-macaroon

  • wget https://raw.githubusercontent.com/sara-nl/GridScripts/master/view-macaroon

  • And their dependencies: pymacaroons, python3-html2text
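Before running ADA on your own machine you can check that the prerequisites are on your PATH. The sketch below prints the name of every listed command that cannot be found. The function name missing_tools is hypothetical.

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: jq and rclone are the prerequisites named above.
# missing=$(missing_tools jq rclone)
# [ -z "$missing" ] || echo "please install: $missing" >&2
```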

3.2.2.1.6. ADA configuration files

The user-specific configuration files are written to ~/.ada/:

  1. The URL used to query the API is stored in /etc/ada.conf (system default) or ~/.ada/ada.conf (user-specific, optional).

  2. The bearer token information derived from a tokenfile is stored in ~/.ada/headers/. The authorization_header is created for security: the token is read from a hidden file in the user’s home directory instead of being passed as a command-line argument, where it would be visible in ‘ps’ output.

  3. Event information, such as the last event ID, is stored in ~/.ada/channels/.