How to effectively and robustly track researcher metadata for shared directories?

wwarr · May 26, 2023, 6:31pm

At the University of Alabama at Birmingham (UAB), we provide shared directories in our GPFS and S3 storage to facilitate collaboration between researchers. Historically, the use case for these directories has been 99.5% uniform: a Principal Investigator (PI) wants to provide a collaborative space for their lab, and maybe some external collaborators.

We’re starting to see a much broader range of use-case requests, and we want to track metadata about those use cases effectively, including:

- administrative and/or commercial collaborating entities
- the parent entity of the responsible party
- whether the space is for a lab, an institutional administrative division, an NIH P30 core facility, etc.
- which people, if any, are data stewards
- the upper bound of regulatory requirements for data within the space (IRB, electronic health record [EHR]-derived data, FERPA, etc.)
- anything else…?
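One way to keep these fields consistent, wherever they end up stored, is to define them as a typed record first. The following is a minimal Python sketch, not an established schema: the class names, field names, and category values are all hypothetical. Modeling regulatory requirements as flags, so that the "upper bound" is the union of every regime that applies rather than a single ranked level, is an assumption about how those regimes combine.

```python
from dataclasses import dataclass, field
from enum import Enum, Flag, auto


class SpaceType(Enum):
    """Kind of group the shared space serves (hypothetical categories)."""
    LAB = "lab"
    ADMIN_DIVISION = "admin-division"
    CORE_FACILITY = "core-facility"  # e.g., an NIH P30 core
    OTHER = "other"


class Regulation(Flag):
    """Regulatory regimes that may apply to data in the space.

    A Flag lets the upper bound be the union of every regime that
    any data in the space falls under.
    """
    NONE = 0
    IRB = auto()
    EHR = auto()
    FERPA = auto()


@dataclass
class SharedSpaceRecord:
    path: str               # GPFS path or S3 bucket name (hypothetical)
    responsible_party: str  # usually the PI
    parent_entity: str      # e.g., department or school
    space_type: SpaceType
    collaborating_entities: list[str] = field(default_factory=list)
    data_stewards: list[str] = field(default_factory=list)
    regulation: Regulation = Regulation.NONE

    def add_regulation(self, reg: Regulation) -> None:
        # Regimes only accumulate here; removing one should be a deliberate review step.
        self.regulation |= reg

    def requires(self, reg: Regulation) -> bool:
        # True if every regime in `reg` is part of this space's upper bound.
        return reg in self.regulation
```

For example, a lab space holding IRB-approved, EHR-derived data would call `add_regulation` twice, and `requires(Regulation.FERPA)` would then report `False`.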

How do other institutions manage this information robustly? Is it possible to store this data as some sort of “tag” within GPFS? In S3, this could be done at the bucket level.
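As a sketch of the tagging idea, the same flat key/value metadata could be rendered both as an S3 bucket tag set and as Linux extended attributes on a GPFS directory. This assumes the file system permits `user.`-namespace extended attributes on directories (GPFS also offers its own `mmchattr --set-attr` route; check with your storage admins). The `user.share.` prefix and all key names are hypothetical conventions. The limits checked are S3's per-bucket maximums (50 tags, 128-character keys, 256-character values); S3 also restricts which characters tag values may contain, which this sketch does not validate.

```python
import os

S3_MAX_TAGS = 50        # S3 per-bucket tag limit
S3_MAX_KEY_LEN = 128
S3_MAX_VALUE_LEN = 256
XATTR_PREFIX = "user.share."  # hypothetical naming convention


def to_s3_tagset(meta: dict[str, str]) -> dict:
    """Build the Tagging argument expected by boto3's put_bucket_tagging."""
    if len(meta) > S3_MAX_TAGS:
        raise ValueError(f"S3 allows at most {S3_MAX_TAGS} tags per bucket")
    tagset = []
    for key, value in sorted(meta.items()):
        if len(key) > S3_MAX_KEY_LEN or len(value) > S3_MAX_VALUE_LEN:
            raise ValueError(f"tag {key!r} exceeds S3 length limits")
        tagset.append({"Key": key, "Value": value})
    return {"TagSet": tagset}


def to_xattrs(meta: dict[str, str]) -> dict[str, bytes]:
    """Map the same metadata onto user-namespace extended attribute names."""
    return {XATTR_PREFIX + key: value.encode("utf-8") for key, value in meta.items()}


def apply_xattrs(path: str, meta: dict[str, str]) -> None:
    """Write the metadata onto a directory (Linux only; needs user xattr support)."""
    for name, value in to_xattrs(meta).items():
        os.setxattr(path, name, value)
```

Applying the S3 side would then look like `boto3.client("s3").put_bucket_tagging(Bucket="example-bucket", Tagging=to_s3_tagset(meta))`, with a hypothetical bucket name. Note that `put_bucket_tagging` replaces the bucket's entire existing tag set, so existing tags would need to be merged in first.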