What are fast ways to make bulk transfers?

christophernhill · March 16, 2018, 1:22am

COMMENT This question may not have a single answer, but answers that highlight different possible ways to achieve good data transfer rates in different situations could all be useful.

In all cases good transfer speeds will require that the end-point systems and the intermediate networks are configured appropriately. There is an extensive discussion of end-point configuration for fast data transfers here ( https://fasterdata.es.net/host-tuning/ ).

CURATOR: Chris Hill

christophernhill · March 29, 2018, 3:04pm

One option is bbftp (http://software.in2p3.fr/bbftp/), which is used at a number of NASA locations. bbftp requires software to be installed at both the sending and the receiving location. The software can be installed by an end user. A typical bbftp command is

bbftp -s -E PATH-TO-BBFTPD-ON-REMOTE -R bbftprc -V -p 8 -r 5 -u USER -i LIST-OF-COMMANDS REMOTE-MACHINE

where

PATH-TO-BBFTPD-ON-REMOTE: is the location on the remote system of the bbftpd server command.
USER: is the user name on the remote system.
REMOTE-MACHINE: is the remote system to transfer files to.
LIST-OF-COMMANDS: is a file containing the list of commands that perform the transfer.

A typical LIST-OF-COMMANDS file contains entries like those shown here

setoption keepaccess
setoption keepmode
setoption nocreatedir
put /nobackupp8/cnhill1/hawaii_npac/0001171008_V_10800.8150.1_1080.3720.90 /nfs/cnhlab003/cnh/llc4320/incoming/hawaii_npac/
setoption keepaccess OK
setoption keepmode OK
setoption nocreatedir OK
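
For transfers involving many files, the LIST-OF-COMMANDS file can be generated rather than written by hand. The following is a minimal Python sketch, not part of bbftp itself: the function names `build_command_list` and `write_command_list` are hypothetical, the option names are copied from the example above, and it assumes that paths contain no spaces and that every file goes to the same remote directory.

```python
from pathlib import Path

# Options copied from the example LIST-OF-COMMANDS file in this thread.
DEFAULT_OPTIONS = ("keepaccess", "keepmode", "nocreatedir")


def build_command_list(local_files, remote_dir, options=DEFAULT_OPTIONS):
    """Return the text of a command file: setoption lines, then one put per file.

    Hypothetical helper; assumes paths contain no whitespace.
    """
    # Make sure the destination ends with exactly one slash, as in the example.
    remote_dir = remote_dir.rstrip("/") + "/"
    lines = [f"setoption {opt}" for opt in options]
    lines += [f"put {path} {remote_dir}" for path in local_files]
    return "\n".join(lines) + "\n"


def write_command_list(local_files, remote_dir, out_path):
    """Write the generated command list to out_path (the file passed to bbftp -i)."""
    Path(out_path).write_text(build_command_list(local_files, remote_dir))


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    print(build_command_list(["/data/run1.bin", "/data/run2.bin"], "/remote/incoming"), end="")
```

The resulting file would then be passed to bbftp with the -i option, as in the command shown earlier.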

christophernhill · March 29, 2018, 3:16pm

GridFTP is another option.

christophernhill · March 29, 2018, 3:16pm

Globus is another option…

christophernhill · March 29, 2018, 3:26pm

It is also possible to script bulk uploads to services like Dropbox. A set of scripts such as the ones here ( https://github.com/cpausmit/PyCox ) can be used to run large-scale transfers from multiple end-points in order to achieve good throughput.

christophernhill · March 29, 2018, 4:05pm

Aspera (http://asperasoft.com/software/) is another option. It is a commercial tool, but is widely used in the bioinformatics community.

christophernhill · March 29, 2018, 4:14pm

Amazon S3 provides another useful solution for intermediate staging of large files. The article here ( https://arxiv.org/abs/1708.00544 ) describes the mechanisms involved and the experience of doing this.

rberger · June 28, 2019, 11:15pm

Besides the software solutions to this problem, there are other means to achieve good data transfer rates for bulk transfers. It can be cheaper and faster to use the “FedEx protocol”. Take a bunch of disks and mail them. The latency is not so great, but you can’t beat the bandwidth.
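
To put rough numbers on the bandwidth claim, here is a small back-of-the-envelope sketch in Python. All figures in it are assumptions chosen for illustration (ten 10 TB disks, 48 hours door to door, a 1 Gbit/s network link), and it ignores the time needed to copy data onto and off the disks, which can be substantial.

```python
def effective_bandwidth_gbps(total_bytes, transit_seconds):
    """Average throughput, in gigabits per second, of moving total_bytes in transit_seconds."""
    return total_bytes * 8 / transit_seconds / 1e9


def network_transfer_days(total_bytes, link_gbps):
    """Days needed to send total_bytes over a link sustaining link_gbps gigabits per second."""
    return total_bytes * 8 / (link_gbps * 1e9) / 86400


if __name__ == "__main__":
    # Assumed shipment: 10 disks of 10 TB each (decimal terabytes), 48 hours in transit.
    shipment_bytes = 10 * 10e12
    transit_seconds = 48 * 3600

    shipped = effective_bandwidth_gbps(shipment_bytes, transit_seconds)
    networked = network_transfer_days(shipment_bytes, link_gbps=1.0)

    print(f"Shipping: about {shipped:.2f} Gbit/s averaged over the transit time")
    print(f"Network:  about {networked:.1f} days at a sustained 1 Gbit/s")
```

Under these assumptions the shipment averages roughly 4.6 Gbit/s, while the same 100 TB would take a little over 9 days on a fully utilized 1 Gbit/s link. The comparison shifts as link speed, data volume, and shipping time change.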

tru · July 9, 2019, 3:29pm

http://moo.nac.uci.edu/~hjm/HOWTO_move_data.html maybe?