I’m following up on a question I submitted about a year ago on this very issue, in the hope that someone might have some guidance!
Our group is seeing more users access our cluster via remote VSCode, i.e., hosting the VSCode interface on our login nodes. In theory this is not a problem, and it is fully supported for code editing, job submission to the scheduler, and small tasks (<10 min CPU-time).
However, this is not very useful for users - what they really want is their remote VSCode hosted on a compute node so they can run compute-intensive work interactively. As our cluster is currently configured, requesting an interactive session from the command line in VSCode brings only the terminal into the compute job: that terminal session automatically sshes to the assigned node, with a job ID and the requested resources tied to it, while the actual remote VSCode interface remains hosted on the login node.
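For reference, the kind of interactive request I mean is something like the following (the resource values are placeholders, and the automatic ssh to the assigned node is part of our site configuration):

# Run from the VSCode integrated terminal on the login node.
# Placeholder resources - adjust nodes/CPUs/time to the actual workload.
salloc --nodes=1 --cpus-per-task=4 --time=02:00:00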
To make this slightly more complicated, adding a proxy-jump for the remote VSCode interface is possible, but that creates a new ssh connection to the node. This second ssh connection, the one carrying the interface, is not recognized as part of a job and has no assigned resources, so it gets killed by our process reapers.
The simplest solution to this is VSCode’s Remote Tunnels, which absolutely works. However, we have security concerns about using it, as it relies on an intermediate third-party host (Azure) for the connection.
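For anyone unfamiliar, the tunnel approach is roughly the following sketch (it assumes the standalone code CLI is installed on the cluster; you then attach from your local VSCode or vscode.dev):

# Start an interactive job, then open a tunnel from the compute node.
# Resource values are placeholders; 'code tunnel' will prompt you to
# authenticate on first use.
srun --nodes=1 --time=02:00:00 --pty bash -l
code tunnel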
It’s not perfect (it doesn’t let you sync settings, which some of our researchers view as a requirement), but it has lifted much of the burden off the login nodes.
As for your ssh connections not being recognized as part of a running job: how do you have SSH/Slurm/cgroups configured? We use the pam_slurm_adopt (“PAM adopt”) module to allow researchers to log into nodes where they have running jobs. These new connections get “adopted” into the appropriate cgroup, so they don’t get killed.
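The relevant piece is roughly a line like this in the sshd PAM stack (a minimal sketch only; module options and ordering vary by site, so check the pam_slurm_adopt documentation):

# /etc/pam.d/sshd (excerpt, illustrative)
# Adopt incoming ssh sessions into the user's running job and its cgroup.
account    required    pam_slurm_adopt.so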
Hi @langford - we do run Open OnDemand and have the CodeServer app. I believe the lack of support for GitHub Copilot on the third-party marketplace is a show-stopper for many users.
In addition to that, I’m trying to understand why users are inclined to use their remote VSCode instance rather than launching a 1:1 instance from an interactive compute session (with full Microsoft marketplace support).
I’m not totally sure how our ssh jobs are configured, but the job association is definitely tied to a particular session ID rather than to the user, so any additional ssh attempts by that user are distinct from the job and recognized as such. This is a good question I can take back to my sysadmin for clarification, thanks!
@mitchellxh On our RCI-HPC clusters at the University of Delaware we have experienced a similar situation, and our systems person, Dr. Jeff Frey, wrote a Python application called vscode-shell-proxy to mitigate these issues. See the jtfrey/vscode-shell-proxy repository on GitHub (“An attempt at proxying vscode remote shell backend through cluster login nodes”) for details on how this could be implemented on your clusters. Once installed, each user configures the VSCode Remote-SSH extension to use vscode-shell-proxy as the command used to connect. When configured properly, our tests show that vscode-shell-proxy achieves the goals of
having access to specialized hardware by virtue of the parameters associated with the interactive job.
not consuming significant CPU resources on the login node.
being automatically terminated when the interactive job is completed.
Please see our client documentation on the appropriate use of RCI clusters as a VSCode backend at hpc documentation - software:vscode:vscode. Both of our clusters use Slurm, so looking at the specific documentation for each cluster may be helpful too, especially since vscode-shell-proxy requires clients to run a Stable Build of VSCode on the local machine. DO NOT install the VSCode Insiders Build, as it will not work with the proxy.
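As a rough illustration only (the real setting and install path are covered in the repo README and our docs), the client-side wiring amounts to pointing the Remote-SSH extension at the proxy script instead of the plain ssh binary, e.g. in settings.json:

// Illustrative sketch - the path below is hypothetical; follow the
// vscode-shell-proxy README for the actual configuration.
"remote.SSH.path": "/opt/shared/vscode-shell-proxy/vscode-shell-proxy"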
@mitchellxh I explored this use case recently. Here is what I came up with. Any feedback is welcome.
Hit Ctrl+Shift+P, run “Open SSH Configuration File”, and drop this in there. Edit it as necessary - not only the correct hostname, but also the Slurm command details (such as the time limit).
# Login node - adjust HostName and User for your site.
Host my-hpc
    HostName my-hpc.my-univ.edu
    User YOUR_USERNAME

# Job-backed host: the ProxyCommand allocates a job and tunnels to port 22 of
# the assigned node. Note the escaped \$SLURM_NODELIST - it must be expanded
# on the cluster inside the allocation, not by your local shell.
Host hpc-server-job
    ForwardAgent yes
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
    ProxyCommand ssh my-hpc "/usr/bin/salloc --nodes=1 --time=1-0:00:00 /bin/bash -c 'nc \$SLURM_NODELIST 22'"
    User YOUR_USERNAME
Now, connecting to the hpc-server-job host allocates the compute node and puts the VSCode backend on it. Let me know what you think.
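To sanity-check the config outside of VSCode first, you can test it from a local terminal (the printed hostname should be a compute node, not the login node), then use “Remote-SSH: Connect to Host…” and pick hpc-server-job:

# Quick test from your local machine: this should trigger a salloc on the
# cluster and print the name of the assigned compute node.
ssh hpc-server-job hostname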