Transfer of large dataset files

Hello everyone,

I am currently working on a project that involves transferring large files to end users. The files are too big to be sent through our on-premise FTP server, so I am exploring other options for file transfer.

I am wondering if anyone on this forum has experience with transferring large files to end users and could offer some advice on the best practices, tools, or platforms that I can use to ensure a smooth and secure transfer process.

Specifically, I am looking for solutions that can accommodate files that are several gigabytes in size and that can support both one-time and recurring transfers.

I appreciate any insights or suggestions that you can provide, and I look forward to hearing from the experts on this forum.

Thank you in advance for your help!

Best regards,
Shakhizat

Hi Shakhizat,
We use Globus for data transfer between different endpoints, e.g. from a supercomputer to a PC or any other machine. Globus is very widely used, including on the ACCESS machines.
Here is the site:
https://www.globus.org/data-transfer

Best,


I agree with @Xiaoqin_Huang that Globus is a nice, modern solution for large file transfers. (We recently added Globus support to Dataverse.)

Another old school but reliable option is rsync.


Hello @pdurbin, @Xiaoqin_Huang, thank you for your detailed responses. I really appreciate it. To further clarify, we were considering purchasing Dropbox and also using Google Drive simultaneously. However, both platforms have restrictions on file size and upload/transfer limits, which has led us to look into other alternatives. I’ve just looked into Globus, but was unable to find the pricing for a subscription. In your opinion, is Globus a superior option to SaaS-based cloud solutions like Dropbox and Google Drive? Currently, our solution is an on-premise FTP server (vsftpd), which is largely outside our control and restricted by the enterprise firewall.

Globus solves a different problem than Dropbox or Google Drive. Globus provides only the transfer and authentication layer; the storage itself must come from another service provider.
You might look at repurposing that FTP server into a Globus Endpoint.

Dropbox and Google Drive are really more for consumer-sized files and documents. There’s also the risk of changes in terms and conditions (like the recent change from unlimited academic Google Drive storage to hard limits). Dropbox and Google Drive are also not necessarily performant, and a user downloading a large amount of data can even run into throttling.

It’s possible to use Google Drive, Microsoft OneDrive, Google Cloud, AWS S3, and some other storage systems with the addition of Globus Premium Connectors. These are an addition to the base license.
As far as licensing goes, you will have to talk with someone to get details, but a free trial is available.

We have used Globus for files that are multiple terabytes, and it works well at that scale. With Globus Connect Server v5, there is an option to provide HTTPS access for users who do not want to install Globus Connect Personal locally or who do not have access to a Globus Connect-enabled server to transfer the data to.

There are a lot of possibilities to automate workflows using the Globus SDK. I’ve written an application to automate the creation of Globus GuestCollections for one of our large data-producing groups at my institution.
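As a rough sketch of what SDK-driven automation looks like with the Globus Python SDK (globus-sdk): the endpoint UUIDs, paths, label, and token below are all placeholders, and this assumes you have already completed a Globus OAuth2 login flow to obtain a transfer-scoped access token.

```python
# Sketch of submitting a transfer task with the Globus Python SDK.
# All IDs, paths, and the token are placeholders, not real values.
import globus_sdk

TRANSFER_TOKEN = "..."  # obtained via a Globus OAuth2 login flow
SRC_ENDPOINT = "SOURCE-ENDPOINT-UUID"
DST_ENDPOINT = "DEST-ENDPOINT-UUID"

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(TRANSFER_TOKEN)
)

# Describe the transfer: checksum-based sync only re-sends files that
# changed, which is what you want for recurring multi-gigabyte jobs.
task = globus_sdk.TransferData(
    tc,
    SRC_ENDPOINT,
    DST_ENDPOINT,
    label="dataset sync",
    sync_level="checksum",
    verify_checksum=True,
)
task.add_item("/data/dataset.tar", "/incoming/dataset.tar")

result = tc.submit_transfer(task)
print("task id:", result["task_id"])
```

Globus then runs the task asynchronously and retries failures on its own, so the script can submit and exit rather than babysitting the transfer.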


+1 for Chris’s answer.

Hi there,

Here are some potential solutions that you may want to consider for transferring large files securely to end users:

  1. Cloud-based file transfer services: There are many cloud-based file transfer services, such as Dropbox, Google Drive, and OneDrive, that allow users to securely upload and share large files with others. These services typically have features such as password protection, expiration dates, and permissions management to ensure that files are transferred securely and only accessed by authorized individuals.
  2. File transfer protocols: There are several file transfer protocols that you may want to consider, such as SFTP, SCP, and HTTPS. These protocols provide secure and reliable transfer of large files over the internet.
  3. Managed file transfer (MFT) solutions: MFT solutions are designed to provide secure and reliable file transfers, especially for large files. These solutions often include features such as encryption, compression, and automated file transfers.
  4. Content delivery networks (CDNs): CDNs are designed to optimize content delivery for websites and other online applications, but they can also be used for large file transfers. CDNs have a network of servers around the world that can be used to distribute large files quickly and efficiently.
  5. Aspera: Aspera is a high-speed file transfer technology built on its patented FASP protocol, which runs over UDP and can move large files substantially faster than traditional TCP-based methods such as FTP, particularly over long-distance or high-latency links.

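Whichever of the options above you choose, one practice worth building in for multi-gigabyte transfers is end-to-end integrity checking. A minimal sketch with standard Unix tools (file names here are illustrative):

```shell
# Stand-in for a large dataset file (a few MB here; same idea at GB scale)
dd if=/dev/urandom of=dataset.bin bs=1M count=4 2>/dev/null

# Sender records a checksum and publishes it alongside the file
sha256sum dataset.bin > dataset.bin.sha256

# ... ship dataset.bin and dataset.bin.sha256 with any of the tools above ...
cp dataset.bin received.bin            # stand-in for the actual transfer

# Receiver verifies the copy matches what the sender published
awk '{print $1 "  received.bin"}' dataset.bin.sha256 > received.sha256
sha256sum -c received.sha256
```

The final command exits non-zero if the file was corrupted or truncated in transit, which makes it easy to wire into an automated pipeline.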
When selecting a solution, consider factors such as security, ease of use, scalability, and cost. Additionally, it may be helpful to conduct a proof-of-concept or pilot test to ensure that the solution meets your specific requirements.

I hope this helps!