I’ve installed PyTorch in an Anaconda environment but can’t find the GPU. What is going wrong, why, and how can I fix it?
There are two common reasons a GPU can’t be found when using PyTorch.
- Be sure you’ve requested a GPU.
  - On Cheaha you will need to use the flags `--partition=pascalnodes` or `--partition=pascalnodes-medium` and `--gres=gpu:1` for a single GPU.
  - With Open OnDemand Jupyter Notebook, you will need to select the `pascalnodes` or `pascalnodes-medium` partition. At this time exactly one GPU is automatically requested. It is not possible to request more.
- Be sure you’ve installed the GPU flavor of PyTorch. Please see https://docs.rc.uab.edu/cheaha/slurm/gpu/#pytorch-compatibility for more information about how to install PyTorch with GPU compatibility.
Make sure that you have installed PyTorch following the guide on their website (Start Locally | PyTorch). For CUDA-aware PyTorch they recommend using:
conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
which should pull from the pytorch and nvidia channels to get the right packages.
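Once installed, a quick way to confirm you actually got the CUDA build (a minimal sketch; run it inside the same conda environment) is:

```python
import torch

# Version of PyTorch itself
print(torch.__version__)

# CUDA version PyTorch was compiled against; None means a CPU-only build
print(torch.version.cuda)

# True only if a CUDA build is installed AND a GPU is actually visible
print(torch.cuda.is_available())
```

If `torch.version.cuda` is `None`, the CPU-only package was installed and no amount of Slurm flags will make a GPU appear.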
First, I would check that you are on a node with a GPU, assuming it is an Nvidia GPU.
Type the command `nvidia-smi` (check the CUDA version and make sure it is the same as the one you used during the PyTorch installation).
Second, I would check with the `conda list` command that the PyTorch version is not a CPU-only build.
In addition to that, always check which device is being used by the application, whether TensorFlow or PyTorch:
In PyTorch, you can list the number of available GPUs using `torch.cuda.device_count()`.
- If you specify cpu as a device, such as `torch.device("cpu")`, this means all available CPUs/cores and memory will be used in the computation of tensors.
- If you specify cuda as a device, such as `torch.device("cuda")`, it is the same as `torch.device("cuda:0")`, which is the first GPU and its memory.
- Otherwise, specify which GPU you want using `torch.device(f"cuda:{i}")` for the ith device.
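Putting the device checks together, a common pattern (a minimal sketch, not specific to Cheaha) is to fall back to the CPU when no GPU is visible:

```python
import torch

# Count visible GPUs; 0 on a CPU-only node or with a CPU-only build
n_gpus = torch.cuda.device_count()

# Pick the first GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda:0" if n_gpus > 0 else "cpu")

# Tensors must be placed on (or moved to) the chosen device before computing
x = torch.ones(3, 3, device=device)
print(x.device)
```

This way the same script runs both on a GPU node and on a login or CPU node, which makes it easier to tell an environment problem apart from a code problem.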