Ask.Cyberinfrastructure

What are fast ways to make bulk transfers?

COMMENT This question may not have a single answer, but answers that highlight different possible ways to achieve good data transfer rates in different situations could all be useful.

In all cases, good transfer speeds require that both the end-point systems and the intermediate networks are configured appropriately. There is an extensive discussion of end-point configuration for fast data transfers here ( https://fasterdata.es.net/host-tuning/ ).

CURATOR: Chris Hill

One option is bbftp (http://software.in2p3.fr/bbftp/), which is used by a number of NASA locations. bbftp requires software to be installed at both the sending and the receiving location. The software can be installed by an end user. A typical bbftp command is

bbftp -s -E PATH-TO-BBFTPD-ON-REMOTE -R bbftprc -V -p 8 -r 5 -u USER -i LIST-OF-COMMANDS REMOTE-MACHINE

where

  • PATH-TO-BBFTPD-ON-REMOTE: the location on the remote system of the bbftpd server command
  • USER: the user name on the remote system
  • REMOTE-MACHINE: the remote system to transfer files to
  • LIST-OF-COMMANDS: a file with a list of commands to perform a transfer

A typical LIST-OF-COMMANDS file contains entries like the following:

setoption keepaccess
setoption keepmode
setoption nocreatedir
put /nobackupp8/cnhill1/hawaii_npac/0001171008_V_10800.8150.1_1080.3720.90 /nfs/cnhlab003/cnh/llc4320/incoming/hawaii_npac/
setoption keepaccess OK
setoption keepmode OK
setoption nocreatedir OK
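For transfers of many files, writing the LIST-OF-COMMANDS file by hand gets tedious. The following is a minimal Python sketch, not part of bbftp itself, that generates a control file in the same shape as the example above and assembles the matching command line. It uses only the flags and setoption names that appear in this answer. The function names, the parameter names `parallel` and `tries` (our reading of what -p and -r control), and the example paths are hypothetical. It also assumes that file paths contain no whitespace.

```python
import shlex

# Options copied from the example control file above.
DEFAULT_OPTIONS = ("keepaccess", "keepmode", "nocreatedir")


def build_control_file(local_files, remote_dir, options=DEFAULT_OPTIONS):
    """Return control-file text: the setoption lines, then one put per file."""
    remote_dir = remote_dir.rstrip("/") + "/"
    lines = [f"setoption {opt}" for opt in options]
    lines += [f"put {path} {remote_dir}" for path in local_files]
    return "\n".join(lines) + "\n"


def build_bbftp_command(control_file, user, remote_machine, bbftpd_path,
                        rc_file="bbftprc", parallel=8, tries=5):
    """Assemble the argument list using only the flags shown in this answer.

    `parallel` and `tries` are passed to -p and -r; their names reflect an
    assumed meaning, so check the bbftp documentation before changing them.
    """
    return [
        "bbftp", "-s",
        "-E", bbftpd_path,
        "-R", rc_file,
        "-V",
        "-p", str(parallel),
        "-r", str(tries),
        "-u", user,
        "-i", str(control_file),
        remote_machine,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    files = ["/data/run1/part_0001", "/data/run1/part_0002"]
    print(build_control_file(files, "/incoming/run1"), end="")
    cmd = build_bbftp_command("LIST-OF-COMMANDS", "USER", "REMOTE-MACHINE",
                              "PATH-TO-BBFTPD-ON-REMOTE")
    # Print rather than execute, so the command can be inspected first.
    print(shlex.join(cmd))
```

Once the printed command looks right, the text from build_control_file can be written to a file and the argument list passed to subprocess.run to start the transfer.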

GridFTP is another option.

Globus is another option…

It is also possible to script bulk uploads to services such as Dropbox. A set of scripts such as the ones here ( https://github.com/cpausmit/PyCox ) can be used to make large-scale transfers from multiple end-points to achieve good throughput.

Aspera (http://asperasoft.com/software/) is another option. It is a commercial tool, but it is widely used in the bioinformatics community.

Amazon S3 provides another useful solution for intermediate staging of large files. The article here ( https://arxiv.org/abs/1708.00544 ) describes the mechanisms involved and the experience of doing this.