How to effectively and robustly track researcher metadata for shared directories?

At the University of Alabama at Birmingham (UAB), we provide shared directories in our GPFS and S3 storage to facilitate collaboration between researchers. Historically, the use case for these directories has been 99.5% uniform: a Principal Investigator (PI) wants to provide a collaborative space for their lab, and possibly for some external collaborators.

We’re starting to see requests covering a substantially broader set of use cases, and we want to track metadata about those use cases effectively. Examples include:

  • administrative and/or commercial collaborative entities
  • parent entity of the responsible party
  • whether it is for a lab, an institutional admin division, an NIH P30 core facility, etc.
  • which people are data stewards, if any
  • the most restrictive regulatory requirements that apply to data within the space (IRB, electronic health record [EHR] derived data, FERPA, etc.)
  • anything else…?

How do other institutions manage this information robustly? Is it possible to store this data as some sort of “tag” within GPFS? In S3, this could be done with tags at the bucket level.
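
To make the question concrete, here is a rough sketch of how the fields listed above might be flattened into the key/value list that S3 bucket tagging takes. Everything in it is an assumption for illustration: the class name, the field names, and the tag keys are hypothetical, and this is not something we have in production.

```python
from dataclasses import dataclass, field


@dataclass
class SharedSpaceMetadata:
    """Hypothetical metadata record for one shared directory or bucket."""

    entity_type: str        # e.g. "lab", "admin-division", "p30-core" (made-up values)
    parent_entity: str      # parent entity of the responsible party
    responsible_party: str  # e.g. the PI or unit head
    data_stewards: list[str] = field(default_factory=list)
    regulatory_ceiling: list[str] = field(default_factory=list)  # e.g. ["IRB", "FERPA"]

    def to_tag_set(self) -> list[dict[str, str]]:
        """Flatten the record into a list of {"Key": ..., "Value": ...} dicts.

        Multi-valued fields are joined with spaces, a conservative choice
        because tag values only accept a limited set of punctuation.
        Empty values are dropped.
        """
        flat = {
            "entity-type": self.entity_type,
            "parent-entity": self.parent_entity,
            "responsible-party": self.responsible_party,
            "data-stewards": " ".join(self.data_stewards),
            "regulatory-ceiling": " ".join(self.regulatory_ceiling),
        }
        return [{"Key": key, "Value": value} for key, value in flat.items() if value]
```

With boto3, the resulting list could then be passed as `Tagging={"TagSet": meta.to_tag_set()}` to `put_bucket_tagging` for a given bucket. What we are missing is an equivalent, equally queryable place to put the same record on the GPFS side.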