3.2.2.1. ADA interface
Our ADA (Advanced dCache API) interface is based on the dCache API and the WebDAV protocol. It lets you access and process your data on dCache from any platform and with various authentication methods. In the example below, we use macaroons as the authentication method.
Macaroons are a token-based authentication method supported by dCache. They can be used to grant access to dCache data in a very granular way, which enables data managers to share their data in dCache autonomously, without having to ask SURF to arrange access.
In addition to ADA, rclone is used for data transfers from/to dCache. rclone is a WebDAV client that uses 4 parallel data streams by default, and it is installed on Spider.
3.2.2.1.1. Browser view
dCache storage can be viewed either through the ADA tools or through the browser using the web client. The browser view is available only to Data managers and is just an additional way to explore the storage space.
As a Data manager you have direct credentials on dCache, and you can access the browser view with your SURF CUA credentials at the following link:
https://webdav-secure.grid.surfsara.nl/pnfs/grid.sara.nl/data/[PROJECT]/
Note
You may be asked for a browser certificate. Just select cancel and you will be asked for your credentials. These are the same credentials used for logging in to the SURF CUA portal (see Section 2.1).
3.2.2.1.2. Using ADA
ADA is a wrapper of tools created by SURF to simplify your interactions with dCache. It wraps operations that can be performed directly on the dCache REST API, such as listing or deleting files and directories, into one clean package, saving you the hassle of downloading and troubleshooting multiple packages and dependencies. ADA does not support uploading and downloading data; for this you need to use rclone. To simplify matters, ADA and rclone can use the same config file. Both ADA and rclone are installed on Spider.
This section provides examples and the steps to start using ADA to interact with your dCache storage.
3.2.2.1.2.1. Create a macaroon
Requirements: credentials to dCache, either username/password or an X509 proxy
Spider role: Data manager
Action: create a macaroon
Output: rclone config file [PROJECT_tokenfile].conf. You can share this file with any member of the project in the next step.
Description: the Data manager creates a macaroon for a shared directory (including its subdirectories and files). In the next step they share the macaroon with the project team in a non-public space: either the users’ home directories, or the ‘shared’ or ‘data’ project space directories.
Example:
get-macaroon \
--url https://webdav.grid.surfsara.nl:2880/pnfs/grid.sara.nl/data/[PROJECT] \
--duration P7D \
--chroot \
--user [USERNAME] \
--permissions DOWNLOAD,UPLOAD,DELETE,MANAGE,LIST,READ_METADATA,UPDATE_METADATA \
--ip [IP RANGE] \
--output rclone [PROJECT_tokenfile]
You will be asked for your CUA password after submitting this command. This example creates a macaroon that is valid for 7 days for the given URL. The --chroot argument ensures that the URL is taken as the root directory when the macaroon is used later.
The following permissions can be given comma-separated upon creation of the macaroon:
Permission | Function |
---|---|
DOWNLOAD | Read a file |
UPLOAD | Write a file |
DELETE | Delete a file or directory |
MANAGE | Rename or move a file or directory |
LIST | List objects in a directory |
READ_METADATA | Read file status |
UPDATE_METADATA | Stage/unstage a file, change QoS |
You can explore the other command-line arguments with get-macaroon --help.
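The permissions in the table can be combined to hand out a token that only allows reading. The following is a minimal sketch, not part of the SURF tooling: the helper name readonly_macaroon_cmd and its three arguments are hypothetical, and it only prints the command so that you can review it before running it. It reuses the flags from the example above and leaves out --ip, on the assumption that the IP restriction is optional.

```shell
#!/bin/sh
# Print (do not run) a get-macaroon command for a read-only, chrooted token.
# $1: project directory name, $2: user name, $3: output name.
# All three are placeholders supplied by the caller (hypothetical helper).
readonly_macaroon_cmd() {
    project=$1
    user=$2
    output=$3
    # DOWNLOAD, LIST and READ_METADATA allow reading, listing and stat only.
    printf '%s\n' "get-macaroon --url https://webdav.grid.surfsara.nl:2880/pnfs/grid.sara.nl/data/${project} --duration P7D --chroot --user ${user} --permissions DOWNLOAD,LIST,READ_METADATA --output rclone ${output}"
}
```

Calling readonly_macaroon_cmd [PROJECT] [USERNAME] [PROJECT_tokenfile] prints a single line that can be pasted into the shell once it looks right.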
3.2.2.1.2.3. Inspect the macaroon
Requirements: the rclone config file [PROJECT_tokenfile].conf
Spider role: normal user
Actions: view macaroon
Output: the list of activities and directories that you can use on dCache
Example:
# Your macaroon is the value of 'bearer_token'
$ cat [PROJECT_tokenfile].conf
[tokenfile]
type = webdav
bearer_token = MDAxY2xvY2F0aWXXXXXXXXXXXXXXXX
url = https://webdav.grid.surfsara.nl:2880/
vendor = other
user =
password =
#View the macaroon details
$ view-macaroon [PROJECT_tokenfile].conf
location Optional.empty
identifier NDFXzXXX
cid iid:03FXXX//
cid id:39147;35932,30013;[Data Manager Name]
cid before:2020-02-05T11:01:11.577Z
cid home:/[Project folder]
cid root:/[Project folder]
cid activity:DOWNLOAD,UPLOAD,MANAGE,LIST
signature fefef25a4973e59b10ad464054dXXXXXXX
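The "cid before:" line in the output above is the moment the macaroon expires. As a rough sketch, assuming the view-macaroon output keeps the layout shown above, the two hypothetical helpers below extract that timestamp and compare it with a reference time. The comparison is a plain string comparison, which is only valid when both timestamps are UTC and use the same ISO 8601 layout.

```shell
#!/bin/sh
# Read view-macaroon output on stdin and print the timestamp of the
# "cid before:" caveat, i.e. the moment the macaroon stops being valid.
macaroon_expiry() {
    awk '$1 == "cid" && $2 ~ /^before:/ { sub(/^before:/, "", $2); print $2; exit }'
}

# Exit 0 if the expiry timestamp $1 lies before the reference time $2.
# Both must be UTC timestamps in the same ISO 8601 layout, so that string
# order equals chronological order.
macaroon_expired() {
    awk -v expiry="$1" -v now="$2" 'BEGIN { exit !(expiry < now) }'
}

# Possible usage (not executed here):
#   expiry=$(view-macaroon [PROJECT_tokenfile].conf | macaroon_expiry)
#   macaroon_expired "$expiry" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" && echo "expired"
```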
3.2.2.1.2.4. Use the macaroon
This section describes how to work with your files.
Requirements: the rclone config file [PROJECT_tokenfile].conf. For ADA this is referred to as tokenfile.
Spider role: normal user
Tip
You can use an environment variable to set the tokenfile, rather than having to pass it on the command line every time. Enter the command:
$ export ada_tokenfile=/path-to-mytoken/[PROJECT_tokenfile].conf
and then you can omit the option --tokenfile from all of the ADA commands.
Tip
You can get extra information about the submitted command and the REST API call details by using the --debug option in your ADA command.
3.2.2.1.2.4.1. Check your access to the system
--whoami
Action: request authentication details
Output: information about the token owner and permissions
Example:
ada --tokenfile [PROJECT_tokenfile].conf --whoami
{
"status": "AUTHENTICATED",
"uid": 515XX,
"gids": [
511XX
],
"username": "[Data Manager name]",
"rootDirectory": "/pnfs/grid.sara.nl/data/[Project]/disk",
"homeDirectory": "/"
}
3.2.2.1.2.4.2. Listing files
--list <directory>
--longlist <file|directory>
--longlist --from-file <file-list>
Action: list files or directories
Output: list or long-list of the files from the directory that the macaroon allows permission for
Example:
ada --tokenfile [PROJECT_tokenfile].conf --longlist /[DIRECTORY]
Note that because we added the command-line argument --chroot when creating the macaroon, we do not need to specify the full URL of the directory on dCache.
3.2.2.1.2.4.3. Get file or directory details
--stat <file|directory>
Action: show all details of a file or directory
Output: metadata information
Example:
ada --tokenfile [PROJECT_tokenfile].conf --stat /[FILE or DIRECTORY]
3.2.2.1.2.4.4. Create a directory on dCache
--mkdir <directory>
Action: create directories
Output: new directory created
Example:
ada --tokenfile [PROJECT_tokenfile].conf --mkdir /[DIRECTORY]
3.2.2.1.2.4.5. Moving or renaming files
--mv <file|directory> <destination>
Action: move a file or directory. This can also be used to rename a file or directory, by moving it within the same parent directory. Specify the path and name of both the source and the destination.
Output: the file or directory is moved to a different dCache location or renamed
Example:
ada --tokenfile [PROJECT_tokenfile].conf --mv /[SOURCE] /[DESTINATION]
3.2.2.1.2.4.6. Recursively remove folders
--delete <file|directory> [--recursive [--force]]
Action: delete files or directories
Output: the file or directory is deleted
Recursive deletion: to recursively delete a directory and ALL of its contents, add --recursive. You will need to confirm deletion of each subdirectory, unless you add --force.
Alternative: rclone purge
Example:
ada --tokenfile [PROJECT_tokenfile].conf --delete /[FILE or DIRECTORY]
ada --tokenfile [PROJECT_tokenfile].conf --delete /[FILE or DIRECTORY] --recursive
ada --tokenfile [PROJECT_tokenfile].conf --delete /[DIRECTORY] --recursive --force
# alternative
$ rclone --config=[PROJECT_tokenfile].conf purge [PROJECT_tokenfile]:[FILE or DIRECTORY]
3.2.2.1.2.4.7. Checksum
--checksum <file>
--checksum <directory>
--checksum --from-file <file-list>
Action: get the checksum of a file, of the files inside a directory, or of a list of files
Output: show MD5/Adler32 checksums for files
Example:
ada --tokenfile [PROJECT_tokenfile].conf --checksum /[FILE or DIRECTORY]
# create a filelist and get checksums for files in it
ada --tokenfile [PROJECT_tokenfile].conf --list /disk/mydir > files-to-checksum
sed -i -e 's/^/\/disk\/mydir\//' files-to-checksum
ada --tokenfile [PROJECT_tokenfile].conf --checksum --from-file files-to-checksum
#/disk/file1 ADLER32=80690001
#/disk/file2 ADLER32=80690001
#/disk/file3 ADLER32=80690001
3.2.2.1.2.4.8. View your usage
Action: get your storage usage with rclone
Example:
rclone --config=[PROJECT_tokenfile].conf size [PROJECT_tokenfile]:/
3.2.2.1.2.4.9. Staging
The dCache storage at SURF consists of magnetic tape storage and hard disk storage. If your quota allocation includes tape storage, then the data stored on magnetic tape has to be copied to a hard drive before it can be used. This action is called ‘staging files’ or ‘bringing a file online’. ADA supports bulk staging which significantly improves performance compared to staging files one by one.
The files remain online as long as there is free space on the disk pools. When a pool group is full (maximum of assigned quota on staging area) and free space is needed, dCache will purge the least recently used cached files. The tape replica will remain on tape.
The amount of time that a file is requested to stay on disk is called pin lifetime. The file will not be purged until the pin lifetime has expired. You can specify the pin lifetime with the argument --lifetime in your staging commands. The pin lifetime can be set in SECONDS, MINUTES, HOURS or DAYS. If --lifetime is not given, the default is 7 days.
For each staging request a reference is added to a log file in your home directory. The log file can be found in ~/.ada/requests.log and it saves the request IDs, target paths and stage request timestamps.
Your macaroon needs to be created with UPDATE_METADATA permissions to allow for staging operations.
--stage <file>
--stage <directory>
--stage --from-file <file-list>
Action: stage (restore, bring online) a file from tape, the files in a directory, or a list of files
Output: the file or list of files comes online on disk
Example:
#list files to get the status
ada --tokenfile [PROJECT_tokenfile].conf --longlist /[PROJECT_tape_dir]
#file1 1186443 2020-02-13 16:27 UTC tape NEARLINE
#file2 1635 2018-10-24 15:34 UTC tape NEARLINE
#stage a single file
ada --tokenfile [PROJECT_tokenfile].conf --stage /[PROJECT_tape_dir]/file1
#stage a single file with pin lifetime two weeks
ada --tokenfile [PROJECT_tokenfile].conf --stage /[PROJECT_tape_dir]/file1 --lifetime 14D
#stage a directory (optionally recursively with --recursive)
ada --tokenfile [PROJECT_tokenfile].conf --stage /[PROJECT_tape_dir]/dirname/
#stage a list of files
ada --tokenfile [PROJECT_tokenfile].conf --stage --from-file files-to-stage
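The file list for bulk staging can be derived from a long listing by keeping only the entries that are on tape. This is a sketch under assumptions: the listing has the column layout of the sample above (bare file name first, locality last), NEARLINE in the last column means the file is not online, and file names contain no spaces. The helper name nearline_files is hypothetical.

```shell
#!/bin/sh
# Read --longlist output on stdin and print the full path of every entry
# whose last column is exactly NEARLINE (on tape only).
# $1 is the directory that was listed; it is prepended because the listing
# is assumed to show bare file names.
nearline_files() {
    dir=${1%/}   # drop one trailing slash, if any
    awk -v dir="$dir" '$NF == "NEARLINE" { print dir "/" $1 }'
}

# Possible usage (not executed here):
#   ada --tokenfile [PROJECT_tokenfile].conf --longlist /[PROJECT_tape_dir] \
#       | nearline_files /[PROJECT_tape_dir] > files-to-stage
#   ada --tokenfile [PROJECT_tokenfile].conf --stage --from-file files-to-stage
```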
3.2.2.1.2.4.10. Unstaging
Your macaroon needs to be created with UPDATE_METADATA permissions to allow for unstaging operations.
For each unstaging request a reference is added to a log file in your home directory. The log file can be found in ~/.ada/requests.log and it saves the request IDs, target paths and unstage request timestamps.
--unstage <file>
--unstage <directory>
--unstage --from-file <file-list>
Action: unstage (release) a file, the files in a directory, or a list of files
Output: the file or list of files is unstaged, so dCache may purge its online replica from disk at any time
Example:
#unstage a single file
ada --tokenfile [PROJECT_tokenfile].conf --unstage /[PROJECT_tape_dir]/file1
# unstage dir (optionally recursively with --recursive)
ada --tokenfile [PROJECT_tokenfile].conf --unstage /[PROJECT_tape_dir]/dirname/
#unstage a list of files
ada --tokenfile [PROJECT_tokenfile].conf --unstage --from-file files-to-unstage
3.2.2.1.2.5. Transfer Data
In order to transfer files from/to dCache we use the same [PROJECT_tokenfile].conf and the rclone client to trigger webdav transfers as shown below.
3.2.2.1.2.5.1. Copy data from dCache
rclone --config=[PROJECT_tokenfile].conf copy [PROJECT_tokenfile]:/[SOURCE] ./[DESTINATION] -P
Example, copy an existing test folder to Spider:
rclone --config=[PROJECT_tokenfile].conf copy [PROJECT_tokenfile]:/tests/ ./tests/ -P
3.2.2.1.2.5.2. Write data to dCache
rclone --config=[PROJECT_tokenfile].conf copy ./[SOURCE]/ [PROJECT_tokenfile]:[DESTINATION] -P
Notes on data transfers:
The rclone copy mode will copy only new/changed files. The rclone sync (one way) mode will make the destination identical to the source, so be careful because this can cause data loss. We suggest that you test first with the --dry-run flag to see exactly what would be copied and deleted.
You can increase the number of parallel transfers with the --transfers [Number] option.
When copying a small number of files into a large destination you can add the --no-traverse option to the rclone copy command to control whether rclone lists the destination directory or not. This can speed transfers up greatly.
If you are certain that none of the destination files exists you can add the --no-check-dest option to the rclone copy command to speed up the transfers.
For very large files it is important to set the --timeout option high enough. As a rule of thumb, set it to 10 minutes for every GB of the biggest file in a collection. This may look ridiculously large, but it provides a safe margin to avoid timeout problems.
Using --multi-thread-streams 1 increases the performance for large files copied to dCache.
#example command to upload a big file
rclone --timeout=240m --multi-thread-streams 1 --config=[PROJECT_tokenfile].conf copy ./[SOURCE]/ [PROJECT_tokenfile]:[DESTINATION] -P
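The rule of thumb for --timeout (10 minutes per GB of the biggest file) can be turned into a small calculation. The sketch below makes two assumptions that are not stated in the text: 1 GB is taken as 10^9 bytes, and the size is rounded up to the next whole GB. Both helper names are hypothetical.

```shell
#!/bin/sh
# Size in bytes of the largest regular file below directory $1.
largest_file_bytes() {
    find "$1" -type f -exec sh -c 'wc -c < "$1"' _ {} \; \
        | tr -d ' ' | sort -n | tail -n 1
}

# rclone --timeout value in minutes for a largest file of $1 bytes:
# 10 minutes per started GB (1 GB taken as 10^9 bytes, an assumption).
timeout_minutes() {
    gb=1000000000
    echo $(( ($1 + gb - 1) / gb * 10 ))
}

# Possible usage (not executed here):
#   minutes=$(timeout_minutes "$(largest_file_bytes ./[SOURCE])")
#   rclone --timeout="${minutes}m" --multi-thread-streams 1 \
#       --config=[PROJECT_tokenfile].conf copy ./[SOURCE]/ [PROJECT_tokenfile]:[DESTINATION] -P
```

With these assumptions a 24 GB file gives 240 minutes, which matches the --timeout=240m used in the example command above.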
3.2.2.1.3. Event-driven processing
Events are useful when you want to know that something you’re interested in has happened in your dCache project space, such as when new data is available or when files are staged from tape.
For debugging purposes, additional information is stored in your home directory under ~/.ada:
The channel names are stored in ~/.ada/channels/channel-name-XXXXX for reference
The files ~/.ada/channels/channel-status-XXXXXX store a number with the last event ID, so that when a competing client takes over, it can use this ID to resume with the events that were missed
Subscribe to changes in a given directory:
ada --tokenfile [PROJECT_tokenfile].conf --events changes-in-dir /[PROJECT_directory] --recursive
Check the available channels listening to events:
ada --tokenfile [PROJECT_tokenfile].conf --channels
Report staging events
When you start this channel, all files in the scope will be listed, including their locality and staging status. This allows your event handler to take actions, like starting jobs to process the files that are online. When all files have been listed, the command will keep listening and reporting all locality and staging changes.
ada --tokenfile [PROJECT_tokenfile].conf --report-staged staging-in-tape-dir /[PROJECT_directory] --recursive
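When debugging event channels it can help to see the last event ID recorded for each channel. The following sketch assumes, as described above, that every channel-status file under ~/.ada/channels contains just that number. The helper name channel_status is hypothetical and only reads the files.

```shell
#!/bin/sh
# Print "<file name> <last event ID>" for every channel-status file in
# directory $1 (default: ~/.ada/channels).
channel_status() {
    dir=${1:-$HOME/.ada/channels}
    for f in "$dir"/channel-status-*; do
        [ -f "$f" ] || continue   # no match: the glob pattern stays literal
        printf '%s %s\n' "$(basename "$f")" "$(cat "$f")"
    done
}
```

Running channel_status without arguments inspects the default directory. Pass another path to inspect a copy.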
3.2.2.1.4. Authentication
On this page we gave an extended example of using ADA with macaroon authentication. ADA can also be used with other authentication options.
Authentication | ADA commands | When to use |
---|---|---|
Macaroon | | You don’t have direct access on dCache but you have a token from the project data manager that allows you certain permissions on the data |
Username/password | | You have direct usr/pwd access credentials on dCache |
X509 Certificate | | You have direct VO membership access on dCache |
Here is an example of a .netrc file that you can create in your home directory to use username/password authentication:
$ cat ~/.netrc
machine webdav.grid.surfsara.nl
login [your-ui-username]
password [your-ui-password]
machine dcacheview.grid.surfsara.nl
login [your-ui-username]
password [your-ui-password]
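Because a .netrc file holds a plain-text password, it is sensible to make it readable by its owner only. The sketch below writes the two entries shown above with restrictive permissions. The helper name write_netrc is hypothetical, and it overwrites the target file, so point it at a scratch path first if you already have a ~/.netrc.

```shell
#!/bin/sh
# Write a netrc file at path $1 for user $2 with password $3, readable and
# writable by the owner only. The machine names are the ones shown above.
# The body runs in a subshell so the umask change does not leak out.
write_netrc() (
    umask 077
    {
        printf 'machine webdav.grid.surfsara.nl\nlogin %s\npassword %s\n' "$2" "$3"
        printf 'machine dcacheview.grid.surfsara.nl\nlogin %s\npassword %s\n' "$2" "$3"
    } > "$1"
    chmod 600 "$1"   # also tightens a file that already existed
)
```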
3.2.2.1.5. Run ADA anywhere
On this page we gave an extended example of using ADA on Spider. ADA is portable and can be used on any platform. On the Spider UIs ADA is already installed. If you want to interact with the dCache API and transfer files from your own machine, you need to install the following prerequisites:
jq: the only dependency for executing ada commands
rclone: the client to perform transfers (MacOS: brew install rclone)
As a Data manager, if you wish to create macaroons from any platform, e.g. your local machine, you need to install the get-macaroon and view-macaroon scripts:
wget https://raw.githubusercontent.com/sara-nl/GridScripts/master/get-macaroon
wget https://raw.githubusercontent.com/sara-nl/GridScripts/master/view-macaroon
And their dependencies:
pymacaroons, python3-html2text
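Before running ADA on your own machine, you may want to check which of the command-line tools listed above are available. This is a small sketch with a hypothetical helper name. It only checks commands on PATH and does not verify the Python dependencies.

```shell
#!/bin/sh
# Print each command name from the arguments that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Possible usage (not executed here):
#   missing_tools jq rclone ada get-macaroon view-macaroon
```

An empty result means every tool that was named is available.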
3.2.2.1.6. ADA configuration files
The user specific configuration files are written in ~/.ada/
The URL to query the API is stored in /etc/ada.conf (system default) or ~/.ada/ada.conf (user specific, optional)
The bearer token information derived from a tokenfile is stored in ~/.ada/headers/. The authorization_header file is created for security: it prevents the token from being passed as a command-line argument, where it would be visible in ‘ps’ output. Instead, the token is read from a hidden file in the user’s home directory
The Events information such as the last eventID is stored in ~/.ada/channels/