Dear Cyberinfrastructure community,
As a newbie to HPC storage solutions, I would appreciate your recommendations on how to partition a 126 TB Lustre-based parallel file system.
• Are /home/, /project/, and /scratch/ sufficient for the typical needs of AI/ML workloads?
• We currently store large datasets locally. Where is the best place to keep them? Should we pack them into SquashFS images on the parallel file system? How should we handle datasets whose folders contain millions of files, and is it efficient to store those directly on the Lustre-based parallel file system? (A rough sketch of the SquashFS idea we have in mind follows this list.)
• Can we host the home file system on the parallel file system, or should we use a dedicated file system such as NFS?
• How should we implement purging of the scratch file system? Would a simple cron-based script that cleans out the three folders be enough? (The second sketch below shows what we have in mind.)
• How do people typically implement quota limits for disk space and for the number of files? Is there a way to apply them automatically? (See the quota sketch below.)
• What can the local SSDs of the compute nodes be used for? We intend to use them as a cache via cachefilesd. What do you think of that approach? (Our draft cachefilesd.conf is the last snippet below.)
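To make the SquashFS question concrete, here is a rough sketch of what we are considering (not a tested setup; the paths, the lz4 compression choice, and the pack_dataset helper are just placeholders):

```python
#!/usr/bin/env python3
"""Sketch: pack a many-file dataset into a single SquashFS image.

All paths are hypothetical placeholders; mksquashfs must be installed.
"""
import subprocess
import sys

def pack_dataset(src_dir: str, image_path: str) -> None:
    # One large .sqsh image replaces millions of small files with a
    # single file, which we hope is friendlier to the Lustre metadata
    # servers than the raw directory tree.
    subprocess.run(
        ["mksquashfs", src_dir, image_path, "-comp", "lz4", "-noappend"],
        check=True,
    )

if __name__ == "__main__":
    pack_dataset(sys.argv[1], sys.argv[2])
```

The idea would then be to loop-mount the image read-only on the compute nodes (or bind it into containers); please correct us if this is the wrong approach.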
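For the scratch purge, this is the kind of cron-driven script we had in mind; the purge helper, the three directory names, and the 30-day age limit are all invented examples:

```python
#!/usr/bin/env python3
"""Sketch of a scratch purger to be run from cron.

The directories and the 30-day age limit are hypothetical examples.
"""
import os
import time

PURGE_DIRS = ["/scratch/tmp1", "/scratch/tmp2", "/scratch/tmp3"]  # placeholders
MAX_AGE = 30 * 24 * 3600  # 30 days, in seconds

def purge(root: str, now: float) -> None:
    # Walk bottom-up so that directories emptied by the file pass
    # can be removed in the same run.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if now - os.lstat(path).st_mtime > MAX_AGE:
                    os.remove(path)
            except OSError:
                pass  # file vanished or we lack permission; skip it
        for name in dirnames:
            try:
                os.rmdir(os.path.join(dirpath, name))  # removes only empty dirs
            except OSError:
                pass  # directory not empty; leave it

if __name__ == "__main__":
    now = time.time()
    for root in PURGE_DIRS:
        purge(root, now)
```

We would run this weekly from cron, but we are unsure whether a plain script like this scales to a file system of our size.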
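For quotas, we imagined looping over the user list and shelling out to lfs setquota; the set_quota helper, the user names, the limits, and the mount point below are all invented:

```python
#!/usr/bin/env python3
"""Sketch: apply default Lustre quotas to a list of users via lfs setquota.

User names, limits, and the mount point are hypothetical examples.
"""
import subprocess

FILESYSTEM = "/lustre"       # placeholder mount point
USERS = ["alice", "bob"]     # placeholder user list

def set_quota(user: str) -> None:
    # -b/-B set the soft/hard block (disk space) limits;
    # -i/-I set the soft/hard inode (file count) limits.
    subprocess.run(
        ["lfs", "setquota", "-u", user,
         "-b", "900G", "-B", "1T",
         "-i", "900000", "-I", "1000000",
         FILESYSTEM],
        check=True,
    )

if __name__ == "__main__":
    for user in USERS:
        set_quota(user)
```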
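Finally, for the local SSDs, this is roughly the /etc/cachefilesd.conf we have drafted; the cache directory path and the tag are placeholders:

```
# Draft /etc/cachefilesd.conf (cache directory path is a placeholder).
# The cache would live on the node-local SSD:
dir /local_ssd/fscache
tag nodecache
# Stop writing to the cache when free blocks/files fall below the
# *stop thresholds, cull below *cull, and resume above *run:
brun 10%
bcull 7%
bstop 3%
frun 10%
fcull 7%
fstop 3%
```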
I appreciate any insights or suggestions you can provide. I look forward to hearing from the experts on this forum.
Thank you in advance for your help!
Best regards,
Shakhizat