How can I quickly run an LLM on the cluster?

CPU

  1. allocate a full node for 4 hours (adjust as you see fit); the remaining steps are the same as in the GPU section below
    srun -p public -n 64 -N 1 --mem=0 -t 4:00:00 --pty /bin/bash
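For reference, the flags in the allocation above mean the following (partition names like `public` are site-specific, so adjust to your cluster):

```shell
# -p public      partition (queue) to submit to
# -n 64          64 tasks (here: CPU cores) ...
# -N 1           ... all placed on a single node
# --mem=0        request all of the node's memory
# -t 4:00:00     4-hour walltime limit
# --pty          attach a pseudo-terminal, i.e. an interactive shell
srun -p public -n 64 -N 1 --mem=0 -t 4:00:00 --pty /bin/bash
```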

GPU

  1. allocate a GPU for 4 hours (adjust as you see fit)
    srun -p gpu-a100 -n 32 -N 1 --mem=128G -t 4:00:00 --pty /bin/bash
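A note on the GPU request: on many Slurm installations the GPU itself must be asked for explicitly with `--gres`; whether the `gpu-a100` partition grants one implicitly is site-specific, so check your cluster's documentation. A hedged variant of the allocation above:

```shell
# --gres=gpu:1 explicitly requests one GPU on the node; harmless
# if the partition already assigns one, required on many sites
srun -p gpu-a100 --gres=gpu:1 -n 32 -N 1 --mem=128G -t 4:00:00 --pty /bin/bash
```

Once inside the allocation, `nvidia-smi` should list the A100 you were given.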

  2. module load apptainer

  3. get the container (only needed the first time)
    apptainer pull docker://ollama/ollama

  4. run the server in the background
    apptainer run ollama_latest.sif &
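Before moving on it helps to confirm the background server actually came up. The ollama server listens on localhost:11434 by default, so a quick check from the same node is:

```shell
# give the server a moment to start, then list locally
# available models; a JSON reply means the server is up
sleep 2
curl -s http://localhost:11434/api/tags
```

If the server's log output clutters your terminal, you can instead start it with `apptainer run ollama_latest.sif > ollama.log 2>&1 &` to keep its output in a file.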

  5. run the model interactively (pick a model from the list in the ollama/ollama repository on GitHub)
    apptainer run ollama_latest.sif run llama3.1:8b "$(cat /PATH/TO/YOUR/FILE.csv) please summarize this data"
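The quoting in step 5 matters: the file contents and the trailing instruction must sit inside the same pair of double quotes so the shell hands ollama a single prompt argument. A quick local sketch, using a hypothetical /tmp/sample.csv in place of your real file:

```shell
# hypothetical sample file standing in for /PATH/TO/YOUR/FILE.csv
printf 'id,value\n1,42\n' > /tmp/sample.csv

# build the entire prompt as ONE quoted string; if the trailing
# instruction sat outside the quotes, the shell would split it
# into separate arguments and the model would not see it as
# part of the prompt
PROMPT="$(cat /tmp/sample.csv) please summarize this data"
echo "$PROMPT"
```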