Ask.Cyberinfrastructure

How can I use passwordless ssh in an HPC environment?

hpc-getting-started
site-specific
ssh

#1

I need to log in to the cluster many times a day and transfer some files using the scp command.
How can I save my password so I do not need to enter it every time I log in to the cluster?
I’ve heard that there is a way to use “passwordless” login.

CURATOR: Katia Oleinik


#3

ANSWER: As Katia said, this will be very site-dependent. And answers might vary
depending on whether you are referring to passwordless ssh between the
login nodes and the compute nodes (on which you have a job running), or to
passwordless ssh between assorted workstations you use and the login nodes of
the HPC cluster.

Most sites probably have some mechanism already set up for the former, as many
schedulers use passwordless ssh to initiate jobs on the compute nodes on your
behalf. Therefore, I am focusing on the latter case.

I would expect that most sites which allow passwordless ssh from arbitrary
workstations to the login nodes use ssh’s built-in public key authentication
method. Basically, you generate an asymmetric key pair and copy the “public”
key to the login nodes of the HPC cluster, where it is “activated” by being
added to your ~/.ssh/authorized_keys file (if the login nodes share a common
home directory, you should only need to copy the public key to a single
login node). The “private” key stays on your workstation. After this, any
user in possession of a copy of the private key can log in to the
HPC cluster as you: PROTECT YOUR PRIVATE KEY!
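One common way to protect it, assuming your workstation runs OpenSSH, is to give the key a passphrase when you generate it and load it into ssh-agent, so you type the passphrase once per terminal session rather than a password on every connection. A minimal sketch:

# Generate an RSA key pair; choose a passphrase when prompted
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa

# Start an agent for this shell and add the key; the passphrase is
# asked for once and then cached by the agent
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_rsa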

There are many websites with detailed instructions on how to do this, e.g.
http://www.ssh.com/ssh/public-key-authentication
https://kb.iu.edu/d/aews

Basically,

  1. Generate the public/private key pair on your workstation with a command like
    “ssh-keygen -t rsa”
  2. Copy the public key to the login node(s), either with a command like
    “ssh-copy-id YOUR_USERNAME@LOGIN_NODE_HOSTNAME” or using scp.
  3. If you used scp in the previous step, you need to manually
    append the contents of the id_rsa.pub public key file to ~/.ssh/authorized_keys
    on the login node (see the sketch after this list).
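As a rough sketch of the scp route in steps 2 and 3 (YOUR_USERNAME and LOGIN_NODE_HOSTNAME are placeholders; your site’s details will differ):

# Copy the public key over to your home directory on the login node
scp ~/.ssh/id_rsa.pub YOUR_USERNAME@LOGIN_NODE_HOSTNAME:~/

# On the login node, append it to authorized_keys and tighten permissions
ssh YOUR_USERNAME@LOGIN_NODE_HOSTNAME \
    "mkdir -p ~/.ssh && cat ~/id_rsa.pub >> ~/.ssh/authorized_keys \
     && chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys \
     && rm ~/id_rsa.pub"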

#2

ANSWER: This would depend on the cluster…
The Shared Computing Cluster at Boston University does not allow for this.
We can give some general guidance about creating public and private keys (there are quite a few webpages out there talking about this), but my guess is that every cluster has its own rules and recommendations.


#4

I can answer the question reflected in the title (which is slightly different from the post itself), but I’d guess a user might look here for both. For easier ssh, here is a trick I use for our Stanford clusters, along with a script that can help you set it up. If you look in the “hosts” folder of the forward tool here, you’ll see simple scripts to set up a configuration.

Where does a configuration go?

You have a folder, .ssh, in your $HOME that acts as a safe-keeping bucket for credentials. This is where you would find public and private keys, along with a file called ~/.ssh/config where you can write named configurations for each resource you connect to.
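If the folder or the config file does not exist yet, you can create them yourself; ssh can be picky about permissions, so the chmod lines matter:

mkdir -p ~/.ssh
touch ~/.ssh/config
chmod 700 ~/.ssh
chmod 600 ~/.ssh/config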

How can I generate one?

For example, let’s look at this script for the Sherlock cluster.

wget https://raw.githubusercontent.com/vsoch/forward/master/hosts/sherlock_ssh.sh

Make it executable

chmod u+x sherlock_ssh.sh

And run it. The one thing it will ask for is your username:

./sherlock_ssh.sh
Sherlock username > pancake
Host sherlock
    User pancake
    Hostname sh-ln05.stanford.edu
    GSSAPIDelegateCredentials yes
    GSSAPIAuthentication yes
    ControlMaster auto
    ControlPersist yes
    ControlPath ~/.ssh/%l%r@%h:%p

You’ll see that it spits out the configuration to the command line after you enter your username. The portion from “Host sherlock” at the top down to the ControlPath line at the bottom is what you would add to ~/.ssh/config. If that file doesn’t exist yet, you can just pipe into it directly.

/bin/bash sherlock_ssh.sh > ~/.ssh/config
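Note that > overwrites the file, so if ~/.ssh/config already has entries you care about, append instead:

/bin/bash sherlock_ssh.sh >> ~/.ssh/config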

What do the components mean?

As mentioned in other threads, the configuration itself is going to vary based on your cluster! For example, Sherlock has (at the time of this writing) 8 login nodes, so the script starts by selecting a random number between 1 and 8, and that becomes your login node. The reason I chose to do this is so that I can issue multiple commands in a row but only need to authenticate once per session from a terminal. Here is what the script looks like:

#!/bin/bash
#
# Sherlock cluster at Stanford
# Prints an ssh configuration for the user, selecting a login node at random
# Sample usage: bash sherlock_ssh.sh
echo
read -p "Sherlock username > "  USERNAME

# Randomly select a login node from 1..8
LOGIN_NODE=$((1 + RANDOM % 8))

echo "Host sherlock
    User ${USERNAME}
    Hostname sh-ln0${LOGIN_NODE}.stanford.edu
    GSSAPIDelegateCredentials yes
    GSSAPIAuthentication yes
    ControlMaster auto
    ControlPersist yes
    ControlPath ~/.ssh/%l%r@%h:%p"

The Control* and GSSAPI* parameters (perhaps someone can add more comment on these) help with keeping the session alive and with authentication, and this would vary based on your cluster. I would also do a good search for ssh configurations and look at all the cool things you can set up!
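As a rough gloss (see man ssh_config for the authoritative descriptions), here is the same block with comments; whether the GSSAPI lines help depends on whether your site uses Kerberos:

Host sherlock
    User pancake
    Hostname sh-ln05.stanford.edu
    # Authenticate with Kerberos (GSSAPI) tickets and forward them to
    # the remote side; only useful if your site runs Kerberos
    GSSAPIDelegateCredentials yes
    GSSAPIAuthentication yes
    # Multiplex connections: the first ssh opens a master connection,
    # later ones reuse it (so you authenticate once), and the master
    # stays open in the background after the first session ends
    ControlMaster auto
    ControlPersist yes
    # Where to keep the socket file for the shared connection
    ControlPath ~/.ssh/%l%r@%h:%p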

How do I use it?

Once it’s set up, the cool part is that you can interact with your resource just via the alias you gave it (the name after Host, here sherlock). So, for example, I can issue commands in a row and only need to authenticate for the first one:

$ ssh sherlock ls
$ ssh sherlock "ls SCRIPT"
bash
brainbehavior
matlab
python
R
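The same alias works for file transfer, which was the original question. Assuming a local file called results.txt (a placeholder name):

# Copy a local file to your home directory on the cluster
scp results.txt sherlock:~/

# And copy it back to the current directory
scp sherlock:results.txt .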

How do I have multiple?

Having multiple just comes down to adding another (named) host in the file! For example:

Host sherlock
    User pancakes
    Hostname sh-ln06.stanford.edu
    GSSAPIDelegateCredentials yes
    GSSAPIAuthentication yes
    ControlMaster auto
    ControlPersist yes
    ControlPath ~/.ssh/%l%r@%h:%p

Host farmshare
    User peas
    Hostname rice.stanford.edu
    GSSAPIDelegateCredentials yes
    GSSAPIAuthentication yes
    ControlMaster auto
    ControlPersist yes
    ControlPath ~/.ssh/%l%r@%h:%p
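Each named host is then used the same way; for example:

ssh sherlock hostname
ssh farmshare hostname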