I’m following up on a question I submitted about a year ago on this very issue, in the hope that someone might have some guidance!
Our group is seeing more users access our cluster via remote VSCode, i.e., hosting the VSCode interface on our login nodes. In theory this is not a problem, and it is fully supported for code editing, job submission to the scheduler, and small tasks (<10 min CPU-time).
However, this is not very useful for users - what they really want is their remote VSCode hosted on a compute node so they can run compute-intensive work interactively. As our cluster is currently configured, requesting an interactive session from the command line in VSCode brings only the terminal into the compute job: that terminal session automatically sshes to the assigned node, with a job ID and the requested resources tied to it, while the actual remote VSCode interface remains hosted on the login node.
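For reference, the kind of interactive request I mean is something like the following (the resource values are placeholders, and the automatic ssh to the assigned node is part of our site configuration):

# Run from the VSCode integrated terminal on the login node.
# Placeholder resources - adjust nodes/CPUs/time to the actual workload.
salloc --nodes=1 --cpus-per-task=4 --time=02:00:00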
To make this slightly more complicated, adding a proxy-jump for the remote VSCode interface is possible, but that creates a new ssh connection to the node. This second ssh connection, the one carrying the interface, is not recognized as part of a job and has no assigned resources, so it gets killed by our process reapers.
The simplest solution to this is VSCode’s Remote Tunnels, which absolutely works. However, we have security concerns about using it, as it relies on an intermediate third-party host (Azure) for the connection.
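For anyone unfamiliar, the tunnel approach is roughly the following sketch (it assumes the standalone code CLI is installed on the cluster; you then attach from your local VSCode or vscode.dev):

# Start an interactive job, then open a tunnel from the compute node.
# Resource values are placeholders; 'code tunnel' will prompt you to
# authenticate on first use.
srun --nodes=1 --time=02:00:00 --pty bash -l
code tunnel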
It’s not perfect (it doesn’t let you sync settings, which some of our researchers view as a requirement), but it has lifted much of the burden off the login nodes.
As for your ssh connections not being recognized as part of a running job: how do you have SSH/Slurm/cgroups configured? We use the pam_slurm_adopt (“PAM adopt”) module to allow researchers to log into nodes where they have running jobs. These new connections get “adopted” into the appropriate cgroup, so they don’t get killed.
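The relevant piece is roughly a line like this in the sshd PAM stack (a minimal sketch only; module options and ordering vary by site, so check the pam_slurm_adopt documentation):

# /etc/pam.d/sshd (excerpt, illustrative)
# Adopt incoming ssh sessions into the user's running job and its cgroup.
account    required    pam_slurm_adopt.so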
Hi @langford - we do run Open OnDemand and have the CodeServer app. I believe the lack of support for GitHub Copilot on the third-party marketplace is a show-stopper for many users.
In addition to that, I’m trying to understand why users are inclined to use their remote VSCode instance rather than launching a 1:1 instance from an interactive compute session (with full Microsoft marketplace support).
I’m not totally sure how our ssh jobs are configured, but the job association is definitely tied to a particular session ID rather than to the user, so any additional ssh attempts by that user are distinct from the job and recognized as such. This is a good question I can take back to my sysadmin for clarification, thanks!
@mitchellxh On our RCI-HPC clusters at the University of Delaware we have experienced a similar situation, and our systems person, Dr. Jeff Frey, wrote a Python application called vscode-shell-proxy to mitigate these issues. See the jtfrey/vscode-shell-proxy repository on GitHub (“An attempt at proxying vscode remote shell backend through cluster login nodes”) for details on how this could be implemented on your clusters. Once installed, each user configures the VSCode Remote-SSH extension to use vscode-shell-proxy as the command used to connect. When configured properly, our tests show that vscode-shell-proxy achieves the goals of
having access to specialized hardware by virtue of the parameters associated with the interactive job.
not consuming significant CPU resources on the login node.
being automatically terminated when the interactive job is completed.
Please see our client documentation on the appropriate use of RCI clusters as a VSCode backend at hpc documentation - software:vscode:vscode. Both of our clusters use Slurm, so looking at the specific documentation for each cluster may be helpful too, especially since vscode-shell-proxy requires clients to run a Stable Build of VSCode on the local machine. DO NOT install the VSCode Insiders Build, as it will not work with the proxy.
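As a rough illustration only (the real setting and install path are covered in the repo README and our docs), the client-side wiring amounts to pointing the Remote-SSH extension at the proxy script instead of the plain ssh binary, e.g. in settings.json:

// Illustrative sketch - the path below is hypothetical; follow the
// vscode-shell-proxy README for the actual configuration.
"remote.SSH.path": "/opt/shared/vscode-shell-proxy/vscode-shell-proxy"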
@mitchellxh I explored this use case recently. Here is what I came up with. Any feedback is welcome.
Hit Ctrl+Shift+P, run “Open SSH Configuration File”, and drop this in there. Edit it as necessary - not only the correct hostname, but also the Slurm command details (such as the time limit).
# Login node - adjust HostName and User for your site.
Host my-hpc
    HostName my-hpc.my-univ.edu
    User YOUR_USERNAME

# Job-backed host: the ProxyCommand allocates a job and tunnels to port 22 of
# the assigned node. Note the escaped \$SLURM_NODELIST - it must be expanded
# on the cluster inside the allocation, not by your local shell.
Host hpc-server-job
    ForwardAgent yes
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
    ProxyCommand ssh my-hpc "/usr/bin/salloc --nodes=1 --time=1-0:00:00 /bin/bash -c 'nc \$SLURM_NODELIST 22'"
    User YOUR_USERNAME
Now, connecting to the hpc-server-job host allocates the compute node and puts the VSCode backend on it. Let me know what you think.
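To sanity-check the config outside of VSCode first, you can test it from a local terminal (the printed hostname should be a compute node, not the login node), then use “Remote-SSH: Connect to Host…” and pick hpc-server-job:

# Quick test from your local machine: this should trigger a salloc on the
# cluster and print the name of the assigned compute node.
ssh hpc-server-job hostname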