Request For Comments: RFC-423

Allow ssh access to verification cluster worker nodes

    Details

    • Type: RFC
    • Status: Implemented
    • Resolution: Done
    • Component/s: DM
    • Labels:
      None

      Description

      Currently ssh access to worker nodes on the verification cluster (lsst-verify-worker*) is prohibited. I would like to change this policy to allow users of the verification cluster to ssh to worker nodes. It is very useful to be able to inspect running processes in situ to determine how multiple jobs are being packed onto the nodes.

      This might be dangerous in scenarios where users cannot be expected to behave themselves, but we have the luxury of a relatively small and considerate user base for this specific resource. For this reason, it seems reasonably safe to allow this access.

            Activity

            Kian-Tat Lim added a comment (edited)

            If the developer batch queues were to be supported by a large batch "commons", it might be difficult to continue this kind of direct ssh access. But I don't think that's an argument against this RFC; instead, I think it's an argument against merging the developer batch with the commons. I can see reasons (e.g. testing of new versions of the batch system itself) why a separate batch system for development/developers would be desirable. On the other hand, it would also limit the maximum resources available to developers and reduce the opportunity to maximize the efficiency of compute resource utilization.

            Paul Price added a comment

            I've found it useful to be able to ssh into nodes on the Princeton clusters. It lets you poke around your processes while they're running and make sure they're behaving. It would be useful to have the same ability on the verification cluster.

            John Parejko added a comment

            +1 This would be very useful for debugging and job monitoring.

            Tim Jenness added a comment

            Simon Krughoff, do you need a comment from NCSA before adopting?

            Simon Krughoff added a comment

            I think I do. Bill Glick or Paul Domagala, can you comment on whether the LDF would be willing to support this RFC, if adopted?

            Donald Petravick added a comment

            Paul is away this week. Here is my initial thinking. I think it is useful for development, as long as we write a policy in the developer's guide that SSH usage must not interfere with the resource management of the batch system. That is, you cannot get onto a batch node unless you are working on the executing instance of software on that node, and SSH access must cease when the batch job ends (or something like that).

            Simon Krughoff added a comment

            Excellent! Thanks, Donald Petravick. I'll mark it adopted and create an IHS ticket for implementation.

            Simon Krughoff added a comment

            IHS-691 is the triggered issue.

              People

              Assignee:
              Simon Krughoff
              Reporter:
              Simon Krughoff
              Watchers (8):
              Donald Petravick, John Parejko, John Swinbank, Joshua Hoblitt, Kian-Tat Lim, Paul Price, Simon Krughoff, Tim Jenness
              Votes:
              0
