Resources for working with Snakemake on HPCs

When I started building a Snakemake workflow for a bioinformatics pipeline, I wasn’t very familiar with Snakemake or how it worked. One resource I found very helpful when first learning it was this video series, which covers many of the basics of how Snakemake works and some good practices for building workflows. The Snakemake documentation is also generally pretty good, and I used it to learn how the package works.

One particular challenge I ran into in my project was running the same job on many different files in parallel. I solved it with checkpoint rules. This section in the Snakemake documentation explains how checkpoint rules evaluate the output of a particular rule once it has run and let you base the next rules on that output.
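To make the idea concrete, here is a minimal plain-Python sketch of the pattern a checkpoint enables. It is not Snakemake code and not my actual workflow: `split_step`, `downstream_targets`, the `chunk_*.txt` naming, and the `processed/` paths are all hypothetical names made up for illustration. The point is only that the list of downstream jobs is built after the upstream step has written its files, not before.

```python
import tempfile
from pathlib import Path


def split_step(out_dir: Path, records: list[str], chunk_size: int) -> None:
    """Hypothetical stand-in for a checkpoint rule.

    It writes a number of chunk files that is not known until it runs.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    for start in range(0, len(records), chunk_size):
        chunk = records[start:start + chunk_size]
        chunk_file = out_dir / f"chunk_{start // chunk_size}.txt"
        chunk_file.write_text("\n".join(chunk))


def downstream_targets(out_dir: Path) -> list[str]:
    """Hypothetical stand-in for the input function evaluated afterwards.

    In a Snakefile this role is played by an input function that calls
    checkpoints.<name>.get(**wildcards) and then inspects the output.
    """
    chunk_ids = sorted(path.stem for path in out_dir.glob("chunk_*.txt"))
    # One downstream job per discovered chunk; these can run in parallel.
    return [f"processed/{chunk_id}.txt" for chunk_id in chunk_ids]


def demo() -> list[str]:
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp) / "chunks"
        split_step(out_dir, ["a", "b", "c", "d", "e"], chunk_size=2)
        return downstream_targets(out_dir)


targets = demo()
print(targets)
```

In a real workflow, Snakemake would schedule one job per entry in that list, and on a cluster each of those jobs can be submitted separately.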

Snakemake is an interesting tool for managing workflows, especially when the tasks can be chained through their input and output files. I also find this tutorial very helpful: Snakemake Tutorial

I would be curious to hear what made you choose to build your workflow in Snakemake over other workflow managers, namely Nextflow. My understanding is that both are very common solutions for building workflows in bioinformatics.

For me, Snakemake was the better choice because many people in my lab have worked with it before. I’ve also done a lot of work with Python, which helped a lot with learning the package. I’ve never worked with Nextflow, so I can’t really offer a good comparison, but I was happy with Snakemake.