Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-32550

Reject the Following Connection Request in Rotator Controller

    XMLWordPrintable

    Details

      Description

      Allow the controller to be able to reject all the following connection request. By doing this, once there are no connections from TCP/IP clients anymore, the controller software will execute to end. The wrapper service in PXI controller will rerun the control software. The benefit of this is to clean all the drive related information contains the faults.

      This task will need to update the EUI and controller code.

        Attachments

          Issue Links

            Activity

            No builds found.
            ttsai Te-Wei Tsai created issue -
            ttsai Te-Wei Tsai made changes -
            Field Original Value New Value
            Summary Reject the Following Connection Request Reject the Following Connection Request in Rotator Controller
            ttsai Te-Wei Tsai made changes -
            Description Allow the controller to be able to reject all the following connection request. By doing this, once there are no connections from TCP/IP clients anymore, the controller software will execute to end. The wrapper service in PXI controller will rerun the control software. The benefit of this is to clean all the drive related information contains the faults. Allow the controller to be able to reject all the following connection request. By doing this, once there are no connections from TCP/IP clients anymore, the controller software will execute to end. The wrapper service in PXI controller will rerun the control software. The benefit of this is to clean all the drive related information contains the faults.

            This task will need to update the EUI and controller code.
            ttsai Te-Wei Tsai made changes -
            Link This issue relates to DM-32552 [ DM-32552 ]
            ttsai Te-Wei Tsai made changes -
            Sprint TSSW Sprint - Jan 03 - Jan 17 [ 1143 ]
            ttsai Te-Wei Tsai made changes -
            Status To Do [ 10001 ] In Progress [ 3 ]
            ttsai Te-Wei Tsai made changes -
            Story Points 1
            ttsai Te-Wei Tsai made changes -
            Link This issue is triggering DM-33068 [ DM-33068 ]
            Hide
            ttsai Te-Wei Tsai added a comment -

            Tested the updated code in DM-33068.

            Show
            ttsai Te-Wei Tsai added a comment - Tested the updated code in DM-33068 .
            Show
            ttsai Te-Wei Tsai added a comment - Please help to review the PRs: 1. https://github.com/lsst-ts/ts_rotator_controller/pull/31 2. https://github.com/lsst-ts/ts_rotator_gui/pull/18 Thanks!
            ttsai Te-Wei Tsai made changes -
            Reviewers Russell Owen [ rowen ]
            Status In Progress [ 3 ] In Review [ 10004 ]
            Hide
            rowen Russell Owen added a comment -

            I asked Te-Wei why this exists. It's a way of allowing the "wrapper code" to restart, clearing leaked memory.

            Show
            rowen Russell Owen added a comment - I asked Te-Wei why this exists. It's a way of allowing the "wrapper code" to restart, clearing leaked memory.
            Hide
            ttsai Te-Wei Tsai added a comment - - edited

            In additional to clearing the leaked memory, this remaps the PDO of Copley drive.

            Show
            ttsai Te-Wei Tsai added a comment - - edited In additional to clearing the leaked memory, this remaps the PDO of Copley drive.
            Hide
            rowen Russell Owen added a comment -

            My understanding of what's going on: The MT Hexapod low-level controller has a known memory leak that has proven difficult to track down. The practical solution, for now, is to restart the relevant bit of software, and the way to do that is to close all connections to the low-level controller. (I will call this "reboot" below, though in fact it is less drastic than that. Still, it's close enough in concept, since all clients have to disconnect.)

            The way this is done now is as follows:

            • Send the CSC to standby (so it disconnects)
            • Disconnect the EUI.

            The procedure with this change is:

            • Put the low-level controller into this "refuse new connections" mode.
            • Send the CSC to standby (so it disconnects)
            • Disconnect the EUI.

            My understanding of the risk you are addressing with this ticket is that somebody may re-enable the CSC before the EUI has been disconnected.

            I suggest withdrawing this change because I feel it introduces a serious operational risk. If the engineer leaves the low-level controller in "refuse new connections" mode then it is not safe to send the CSC to standby state.

            I also feel that the risk in the present reboot procedure is very small; much smaller than the risk that you introduce with this change.

            Surely "rebooting" must only be done after informing users and getting permission. We cannot simply disable the hexapod without warning others. Thus I don't think there is much risk of the CSC being re-enabled prematurely. And even if it does occur, the engineer should be monitoring the CSC state, so they should see this occur, and can deal with it (by sending it to standby again).

            Finally, I feel that this change is confusing. I don't think it is clear what this mode is for in the EUI. This increases the risk that the low-level controller may be switched into this mode and left there.

            If reboot is needed fairly often then please consider adding a "reboot" command to the low-level controller and a "reboot" button to the EUI that disconnects all clients. This is robust and much simpler than the current procedure. This will send the CSC to fault state. But as long as users have been warned, I don't think that is a problem. There is no way to "reboot" without doing something with the CSC – either disconnecting it in advance or letting it go to fault when it loses the connection. Either way it has to be recovered afterwards.

            If you still feel you want to pursue this change then please bring Tiago Ribeiro in.

            Show
            rowen Russell Owen added a comment - My understanding of what's going on: The MT Hexapod low-level controller has a known memory leak that has proven difficult to track down. The practical solution, for now, is to restart the relevant bit of software, and the way to do that is to close all connections to the low-level controller. (I will call this "reboot" below, though in fact it is less drastic than that. Still, it's close enough in concept, since all clients have to disconnect.) The way this is done now is as follows: Send the CSC to standby (so it disconnects) Disconnect the EUI. The procedure with this change is: Put the low-level controller into this "refuse new connections" mode. Send the CSC to standby (so it disconnects) Disconnect the EUI. My understanding of the risk you are addressing with this ticket is that somebody may re-enable the CSC before the EUI has been disconnected. I suggest withdrawing this change because I feel it introduces a serious operational risk. If the engineer leaves the low-level controller in "refuse new connections" mode then it is not safe to send the CSC to standby state. I also feel that the risk in the present reboot procedure is very small; much smaller than the risk that you introduce with this change. Surely "rebooting" must only be done after informing users and getting permission. We cannot simply disable the hexapod without warning others. Thus I don't think there is much risk of the CSC being re-enabled prematurely. And even if it does occur, the engineer should be monitoring the CSC state, so they should see this occur, and can deal with it (by sending it to standby again). Finally, I feel that this change is confusing. I don't think it is clear what this mode is for in the EUI. This increases the risk that the low-level controller may be switched into this mode and left there. If reboot is needed fairly often then please consider adding a "reboot" command to the low-level controller and a "reboot" button to the EUI that disconnects all clients. This is robust and much simpler than the current procedure. This will send the CSC to fault state. But as long as users have been warned, I don't think that is a problem. There is no way to "reboot" without doing something with the CSC – either disconnecting it in advance or letting it go to fault when it loses the connection. Either way it has to be recovered afterwards. If you still feel you want to pursue this change then please bring Tiago Ribeiro in.
            rowen Russell Owen made changes -
            Status In Review [ 10004 ] Reviewed [ 10101 ]
            Hide
            ttsai Te-Wei Tsai added a comment -

            Since I will unify the code of GUI and DDS (which means both of them allow the multiple connections), the reboot is fine to me if the CSC user does not care about the CSC will enter the Fault state after the EUI user decides to reboot the system. To be honest, I think the main arguments here is the design logic instead of the risks in the above comments.

            Show
            ttsai Te-Wei Tsai added a comment - Since I will unify the code of GUI and DDS (which means both of them allow the multiple connections), the reboot is fine to me if the CSC user does not care about the CSC will enter the Fault state after the EUI user decides to reboot the system. To be honest, I think the main arguments here is the design logic instead of the risks in the above comments.
            ttsai Te-Wei Tsai made changes -
            Status Reviewed [ 10101 ] In Review [ 10004 ]
            ttsai Te-Wei Tsai made changes -
            Status In Review [ 10004 ] In Progress [ 3 ]
            Hide
            ttsai Te-Wei Tsai added a comment -

            After the discussion with Russell, we agreed to change the implementation to disconnect the CSC connection directly.

            Show
            ttsai Te-Wei Tsai added a comment - After the discussion with Russell, we agreed to change the implementation to disconnect the CSC connection directly.
            ttsai Te-Wei Tsai made changes -
            Story Points 1 2
            ttsai Te-Wei Tsai made changes -
            Attachment askToReboot.png [ 55791 ]
            Hide
            ttsai Te-Wei Tsai added a comment -

            The GUI will prompt a new window to confirm the reboot of controller:

            Show
            ttsai Te-Wei Tsai added a comment - The GUI will prompt a new window to confirm the reboot of controller:
            Hide
            ttsai Te-Wei Tsai added a comment -

            Please see the update of PR. Thanks!

            Show
            ttsai Te-Wei Tsai added a comment - Please see the update of PR. Thanks!
            ttsai Te-Wei Tsai made changes -
            Status In Progress [ 3 ] In Review [ 10004 ]
            Hide
            rowen Russell Owen added a comment -

            Reviewed on github. Looks very nice. I'm impressed with how simple you were able to make it.

            Show
            rowen Russell Owen added a comment - Reviewed on github. Looks very nice. I'm impressed with how simple you were able to make it.
            rowen Russell Owen made changes -
            Status In Review [ 10004 ] Reviewed [ 10101 ]
            ttsai Te-Wei Tsai made changes -
            Resolution Done [ 10000 ]
            Status Reviewed [ 10101 ] Done [ 10002 ]
            ttsai Te-Wei Tsai made changes -
            Link This issue relates to DM-32266 [ DM-32266 ]

              People

              Assignee:
              ttsai Te-Wei Tsai
              Reporter:
              ttsai Te-Wei Tsai
              Reviewers:
              Russell Owen
              Watchers:
              Russell Owen, Te-Wei Tsai
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.