How to speed up FreeSurfer's recon-all?

mitchellxh · May 15, 2023, 7:42pm

I work in a neuroimaging lab and we have many T1w MRI scans we’d like to process with the FreeSurfer recon-all processing stream. We also have access to an HPC with both GPU and CPU resources and would like to know how to maximize processing speed. How can I reduce the individual runtime of a recon-all job?

mitchellxh · May 15, 2023, 7:53pm

FreeSurfer’s recon-all processing stream can be parallelized so that the workflows for the two hemispheres run at the same time. This can significantly speed up the runtime for an individual subject.

The flags to add to the usual recon-all command are:

-parallel
-openmp $N_cores

It’s important to set $N_cores to the number of cores allocated to your job. There are typically diminishing returns on requesting more than 8 cores per subject, depending on the size and availability of your HPC. For speed tests of recon-all parallelization, you can reference this table:
https://rcs.bu.edu/examples/imaging/freesurfer/appendix/

An example of the revised command for parallelization may look like:

recon-all -all -s $subject -parallel -openmp 8
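
To keep the -openmp value in step with the job's allocation, the command can be built from the core count instead of hard-coding it. Here is a minimal sketch, assuming a SLURM scheduler (which exports SLURM_CPUS_PER_TASK when --cpus-per-task is set); the function name build_recon_cmd and the subject ID sub-01 are made up for illustration, and the script only prints the command rather than running it:

```shell
#!/usr/bin/env bash
# Sketch: build a recon-all command sized to the cores the job was given.
# Assumption: SLURM exports SLURM_CPUS_PER_TASK; fall back to 1 core otherwise.

build_recon_cmd() {
    local subject="$1"
    # Use an explicit second argument if given, else the scheduler's value, else 1
    local n_cores="${2:-${SLURM_CPUS_PER_TASK:-1}}"
    if [ "$n_cores" -gt 1 ]; then
        echo "recon-all -all -s $subject -parallel -openmp $n_cores"
    else
        # With a single core there is nothing to parallelize
        echo "recon-all -all -s $subject"
    fi
}

# Dry run: print the command for a hypothetical subject on 8 cores
build_recon_cmd "sub-01" 8
```

Once the printed command looks right, it can be pasted into the job script or run directly in place of the echo.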

jfossot · May 26, 2023, 4:15pm

This is a really interesting take on workflow parallelization. The approach taken by some other tools, e.g. Nextflow, is to run in parallel as long as you define processes. How is this example similar to or different from running in parallel on the HPC?

@mitchellxh thanks for bringing this up

mitchellxh · May 27, 2023, 5:16pm

Great question!

Within FreeSurfer, ‘recon-all’ is the command, and the oft-mentioned workflow, for structural reconstruction of brain MRI. This processing stream can only process a single brain at a time. Traditional parallelization on a cluster is therefore a perfect fit: you submit tens to hundreds of ‘recon-all’ jobs at once.
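
As a rough illustration of that cluster-level approach, the sketch below prints one submission line per subject. It assumes a SLURM scheduler (sbatch with --job-name, --cpus-per-task, and --wrap); the subject IDs and the function name print_submissions are placeholders, and other schedulers will need different syntax. Nothing is submitted, since the lines are only echoed:

```shell
#!/usr/bin/env bash
# Dry run: print one sbatch submission per subject instead of submitting.
# Assumptions: SLURM scheduler; subject IDs below are hypothetical placeholders.

print_submissions() {
    local n_cores="$1"; shift      # first argument: cores per job
    local subject
    for subject in "$@"; do        # remaining arguments: subject IDs
        echo "sbatch --job-name=recon_${subject} --cpus-per-task=${n_cores} --wrap=\"recon-all -all -s ${subject} -parallel -openmp ${n_cores}\""
    done
}

print_submissions 8 sub-01 sub-02 sub-03
```

Each printed line is an independent job, so the scheduler can run as many subjects side by side as the cluster has room for.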

A typical ‘recon-all’ job on 1 core takes 5-6 hours. This probably doesn’t matter to a researcher who needs to process tens to hundreds of brains. However, in cases where they need to process a single brain (or have seemingly unlimited resources…), they can speed up the individual runtimes with these flags.

Briefly, because ‘recon-all’ splits the brain into 2 hemispheres for reconstruction, by default it processes the left hemisphere and then the right hemisphere in sequence. The developers recognized this bottleneck and added support for the OpenMP framework. By simply adding the above flags and allocating 4 cores (2 cores per hemisphere), the ‘recon-all’ job will now take 3-4 hours.
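
To see why the per-subject speedup matters less for large batches, here is a back-of-the-envelope calculation using the runtimes above (about 5.5 hours on 1 core, about 3.5 hours on 4 cores). The 100 subjects and 32-core budget are made-up numbers for illustration, and the model is idealized (no queue wait, every job takes the same time):

```shell
#!/usr/bin/env bash
# Rough batch time for a fixed core budget, in whole minutes.
# Jobs run in "waves": each wave fills every slot and lasts one job's runtime.

batch_minutes() {
    local n_subjects="$1" total_cores="$2" cores_per_job="$3" minutes_per_job="$4"
    local slots=$(( total_cores / cores_per_job ))        # jobs that fit at once
    local waves=$(( (n_subjects + slots - 1) / slots ))   # ceiling division
    echo $(( waves * minutes_per_job ))
}

# Hypothetical: 100 subjects on a 32-core budget
batch_minutes 100 32 1 330   # 1 core per job, ~5.5 h each -> 1320 (22 h)
batch_minutes 100 32 4 210   # 4 cores per job, ~3.5 h each -> 2730 (45.5 h)
```

Under these assumptions the single-core jobs finish the whole batch sooner, because the 4-core speedup is well short of 4x. The parallel flags pay off when the number of subjects is small relative to the cores available.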

Hope this adds more context!
-Mitch