What sort of approaches work well for providing encrypted storage for research computing environments?


Research groups often work with data that is sensitive in some way, for example gambling records, data on educational interventions (which concern minors), or medical data. A state agency may, for instance, contract a research group to carry out analyses of a regulated activity (e.g. gambling). Even when de-identified (i.e. each person is assigned a unique random identifier by the regulating agency), such data is sensitive and needs to be protected, both in transit and at rest. Such data can also be very large, in excess of 1 TB, covering millions of subjects and billions of events, so the analysis may not fit on a single machine.

So here are some more concrete questions:

  1. What software do you use to encrypt data? Pros and cons of each package? Availability, cost, ease of use, compatibility with analyzing software?
  2. Are there encryption solutions at the hardware level? Pros and cons of hardware vs. software encryption? Cost, speed, etc.?

Curator: Kristina Plazonic


I highly recommend Jonathan Crabtree’s talk “Impact: Infrastructure for Privacy-Assured Computations”, given on 2019-04-23 as part of the Topics in Research Data Management webinar series hosted by Texas Digital Library. Here’s the YouTube link, which is also posted at


Jon is blogging about Project ImPACT (Infrastructure for Privacy-Assured CompuTations) at

Full disclosure: I’m a developer for Dataverse, which is part of the architecture. I’ll include an image below. From the Dataverse perspective, we are tracking this ongoing work at