How is storage performance for high I/O HPC jobs affected by running in the cloud?

We have some projects that run high-I/O calculations on our university HPC cluster. We are considering moving them to the cloud, probably AWS. I believe the data will reside in a cloud data center, which suggests that latency and bandwidth could be affected by geographical distance. Beyond that, the properties and capacities of both the client side and the cloud storage end will most likely affect the performance observed by the user running the job.

What are the effects of the level of parallelization of the storage itself? Most likely this depends on the size of the chunks of data (objects) being requested, as well as how often they are requested. The client configuration (processing speed, memory, its own storage properties) must also influence I/O performance in the cloud. Does anyone have some recent numbers (and impressions) they could share?

Thank you!

Most people who mention cloud storage mean bucket (object) storage such as AWS S3. But you generally can’t run calculations directly against bucket storage, so you first need to copy all the data from bucket storage to the block storage (AWS EBS) attached to the instance (AWS EC2) running the calculation.
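As a rough illustration of that staging step, here is a minimal Python sketch. It assumes the `boto3` library and its S3 client method `download_file(Bucket, Key, Filename)`; the helper name `stage_object`, the bucket, the key, and the scratch path are all hypothetical placeholders, not anything from a real setup.

```python
import os
import posixpath


def stage_object(s3_client, bucket, key, dest_dir):
    """Copy one object from bucket storage onto local (block) storage.

    `s3_client` is expected to behave like boto3.client("s3");
    `dest_dir` would typically live on the instance's EBS volume.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # S3 keys use "/" as a separator regardless of the local OS.
    local_path = os.path.join(dest_dir, posixpath.basename(key))
    s3_client.download_file(bucket, key, local_path)
    return local_path


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   path = stage_object(boto3.client("s3"), "my-bucket",
#                       "inputs/run01.dat", "/scratch/job42")
```

The job then reads from the returned local path, so the cost of the copy is paid once up front rather than on every read.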

If you are doing parallel programming, you need to create a shared filesystem for the instances or use a managed one (that was AWS EFS when I last looked into this a long time ago, but now they have FSx for Lustre).

So, if you looked up just the cost of maintaining that persistent parallel file system in the cloud, you would give up (or at least I would) and buy more local HPC storage, or use XSEDE, or use DOE machines.

Amazon, prove me wrong!

You may want to take a look at AWS Lambda functions. For cost-effective usage, see how you can combine ephemeral storage with your Lambda function. This blog will help you start with Lambda and ephemeral storage. Ephemeral storage is really cost-effective; I use it on my Kubernetes cluster with Azure. I don’t have experience with AWS, but the concept is the same.

Happy computing

Regarding recent numbers and impressions, it is difficult to provide a general answer, as performance can vary significantly depending on the specific application and use case. However, there are a number of benchmarking tools and studies that can be used to evaluate the performance of cloud storage systems, and it may be useful to consult these resources when making a decision about whether to move your projects to the cloud.
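If you want a quick first impression before reaching for a full benchmarking tool, a small script can show how request (chunk) size and the number of parallel readers change read throughput on whatever filesystem you point it at. The following is only a minimal sketch using the Python standard library; the function names, file size, chunk sizes, and worker counts are arbitrary assumptions. Note that a freshly written test file is likely served from the OS page cache, so the numbers will overstate what the underlying storage can do.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def read_range(path, offset, length, chunk_size):
    """Read `length` bytes starting at `offset`, at most `chunk_size` per call."""
    done = 0
    # buffering=0 turns off Python's own buffer, so each read() is one OS call.
    with open(path, "rb", buffering=0) as f:
        f.seek(offset)
        while done < length:
            data = f.read(min(chunk_size, length - done))
            if not data:
                break
            done += len(data)
    return done


def measure_read(path, chunk_size, workers):
    """Return (bytes_read, MiB/s) with `workers` threads reading disjoint slices."""
    size = os.path.getsize(path)
    if size == 0:
        return 0, 0.0
    slice_len = -(-size // workers)  # ceiling division
    jobs = [(off, min(slice_len, size - off)) for off in range(0, size, slice_len)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(lambda j: read_range(path, j[0], j[1], chunk_size), jobs))
    elapsed = max(time.perf_counter() - start, 1e-9)
    total = sum(counts)
    return total, total / (1024 * 1024) / elapsed


def make_test_file(size_bytes):
    """Create a temporary file filled with random bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


if __name__ == "__main__":
    path = make_test_file(16 * 1024 * 1024)
    try:
        for chunk in (4 * 1024, 1024 * 1024):
            for workers in (1, 4):
                _, mibs = measure_read(path, chunk, workers)
                print(f"chunk={chunk:>8} B  workers={workers}  {mibs:8.1f} MiB/s")
    finally:
        os.remove(path)
```

Running the same script against a local disk, a network filesystem mount, and a cloud volume gives a like-for-like comparison, though a dedicated I/O benchmark will control for caching and access patterns far better than this does.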

In one of my studies, we are asking what impact storage parallelization and client configuration have on I/O performance in cloud computing, and how these factors can be optimized for different types of applications and workloads. We are approaching this in several ways, such as benchmarking studies, simulations, and experimental investigations.