Details
- Type: RFC
- Status: Implemented
- Resolution: Done
- Component/s: DM
- Labels: None
Description
Currently, ssh access to worker nodes on the verification cluster (lsst-verify-worker*) is prohibited. I would like to change this policy so that users of the verification cluster can ssh to worker nodes. Being able to inspect running processes in situ is very useful for determining how multiple jobs are being packed onto the nodes.
This might be dangerous in settings where users cannot be expected to behave themselves, but we have the luxury of a relatively small and considerate user base for this specific resource. For this reason, it seems relatively safe to allow this access.
If the developer batch queues were to be supported by a large batch "commons", it might be difficult to continue this kind of direct ssh access. But I don't think that's an argument against this RFC; instead, I think it's an argument against merging the developer batch with the commons. I can see reasons (e.g. testing new versions of the batch system itself) why a separate batch system for development/developers would be desirable. On the other hand, a separate system would also limit the maximum resources available to developers and reduce the opportunity to maximize the efficiency of compute resource utilization.
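To illustrate the kind of in-situ inspection this RFC is asking for, here is a minimal sketch of how a user logged in to a worker node might summarize which users' processes are sharing it. It assumes the Linux procps form of `ps -eo user,pid,pcpu,rss,comm`; the function name `summarize_by_user` and the sample data are hypothetical and are not part of any existing cluster tooling.

```python
from collections import defaultdict


def summarize_by_user(ps_output):
    """Aggregate `ps` output into per-user process count, CPU %, and resident memory."""
    totals = defaultdict(lambda: {"procs": 0, "cpu": 0.0, "rss_kb": 0})
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row printed by ps
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # ignore malformed or truncated rows
        user, _pid, pcpu, rss, _comm = fields
        entry = totals[user]
        entry["procs"] += 1
        entry["cpu"] += float(pcpu)
        entry["rss_kb"] += int(rss)
    return dict(totals)


# Made-up sample in the shape produced by: ps -eo user,pid,pcpu,rss,comm
SAMPLE = """\
USER       PID %CPU   RSS COMMAND
alice     1201 98.5 2048000 python
alice     1202 97.0 1996000 python
bob       1310 12.5  512000 python
root         1  0.0    4000 systemd
"""

if __name__ == "__main__":
    for user, stats in sorted(summarize_by_user(SAMPLE).items()):
        print(user, stats["procs"], round(stats["cpu"], 1), stats["rss_kb"])
```

On a real node, the sample text could be replaced with the output of `subprocess.run(["ps", "-eo", "user,pid,pcpu,rss,comm"], capture_output=True, text=True).stdout`. Mapping processes back to individual batch jobs would depend on the scheduler in use and is not attempted here.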