How can I quickly run an LLM on the cluster?

CPU

  1. allocate a full node for 4 hours (adjust as you see fit); the remaining steps are the same as in the GPU section below
    srun -p public -n 64 -N 1 --mem=0 -t 4:00:00 --pty /bin/bash
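For reference, the flags in the allocation above mean the following (partition names like `public` are site-specific, so adjust to your cluster):

```shell
# -p public      partition (queue) to submit to
# -n 64          64 tasks (here: CPU cores) ...
# -N 1           ... all placed on a single node
# --mem=0        request all of the node's memory
# -t 4:00:00     4-hour walltime limit
# --pty          attach a pseudo-terminal, i.e. an interactive shell
srun -p public -n 64 -N 1 --mem=0 -t 4:00:00 --pty /bin/bash
```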

GPU

  1. allocate a GPU for 4 hours (adjust as you see fit)
    srun -p gpu-a100 -n 32 -N 1 --mem=128G -t 4:00:00 --pty /bin/bash
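A note on the GPU request: on many Slurm installations the GPU itself must be asked for explicitly with `--gres`; whether the `gpu-a100` partition grants one implicitly is site-specific, so check your cluster's documentation. A hedged variant of the allocation above:

```shell
# --gres=gpu:1 explicitly requests one GPU on the node; harmless
# if the partition already assigns one, required on many sites
srun -p gpu-a100 --gres=gpu:1 -n 32 -N 1 --mem=128G -t 4:00:00 --pty /bin/bash
```

Once inside the allocation, `nvidia-smi` should list the A100 you were given.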

  2. module load apptainer

  3. get the container (only needed the first time)
    apptainer pull docker://ollama/ollama

  4. run the server in the background
    apptainer run ollama_latest.sif &
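Before moving on it helps to confirm the background server actually came up. The ollama server listens on localhost:11434 by default, so a quick check from the same node is:

```shell
# give the server a moment to start, then list locally
# available models; a JSON reply means the server is up
sleep 2
curl -s http://localhost:11434/api/tags
```

If the server's log output clutters your terminal, you can instead start it with `apptainer run ollama_latest.sif > ollama.log 2>&1 &` to keep its output in a file.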

  5. run the model interactively (pick a model from the list in the ollama/ollama repository on GitHub)
    apptainer run ollama_latest.sif run llama3.1:8b "$(cat /PATH/TO/YOUR/FILE.csv) please summarize this data"
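The quoting in step 5 matters: the file contents and the trailing instruction must sit inside the same pair of double quotes so the shell hands ollama a single prompt argument. A quick local sketch, using a hypothetical /tmp/sample.csv in place of your real file:

```shell
# hypothetical sample file standing in for /PATH/TO/YOUR/FILE.csv
printf 'id,value\n1,42\n' > /tmp/sample.csv

# build the entire prompt as ONE quoted string; if the trailing
# instruction sat outside the quotes, the shell would split it
# into separate arguments and the model would not see it as
# part of the prompt
PROMPT="$(cat /tmp/sample.csv) please summarize this data"
echo "$PROMPT"
```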