Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-32550

Reject the Following Connection Request in Rotator Controller

    XMLWordPrintable

    Details

      Description

      Allow the controller to be able to reject all the following connection request. By doing this, once there are no connections from TCP/IP clients anymore, the controller software will execute to end. The wrapper service in PXI controller will rerun the control software. The benefit of this is to clean all the drive related information contains the faults.

      This task will need to update the EUI and controller code.

        Attachments

          Issue Links

            Activity

            Hide
            ttsai Te-Wei Tsai added a comment -

            Tested the updated code in DM-33068.

            Show
            ttsai Te-Wei Tsai added a comment - Tested the updated code in DM-33068 .
            Show
            ttsai Te-Wei Tsai added a comment - Please help to review the PRs: 1. https://github.com/lsst-ts/ts_rotator_controller/pull/31 2. https://github.com/lsst-ts/ts_rotator_gui/pull/18 Thanks!
            Hide
            rowen Russell Owen added a comment -

            I asked Te-Wei why this exists. It's a way of allowing the "wrapper code" to restart, clearing leaked memory.

            Show
            rowen Russell Owen added a comment - I asked Te-Wei why this exists. It's a way of allowing the "wrapper code" to restart, clearing leaked memory.
            Hide
            ttsai Te-Wei Tsai added a comment - - edited

            In additional to clearing the leaked memory, this remaps the PDO of Copley drive.

            Show
            ttsai Te-Wei Tsai added a comment - - edited In additional to clearing the leaked memory, this remaps the PDO of Copley drive.
            Hide
            rowen Russell Owen added a comment -

            My understanding of what's going on: The MT Hexapod low-level controller has a known memory leak that has proven difficult to track down. The practical solution, for now, is to restart the relevant bit of software, and the way to do that is to close all connections to the low-level controller. (I will call this "reboot" below, though in fact it is less drastic than that. Still, it's close enough in concept, since all clients have to disconnect.)

            The way this is done now is as follows:

            • Send the CSC to standby (so it disconnects)
            • Disconnect the EUI.

            The procedure with this change is:

            • Put the low-level controller into this "refuse new connections" mode.
            • Send the CSC to standby (so it disconnects)
            • Disconnect the EUI.

            My understanding of the risk you are addressing with this ticket is that somebody may re-enable the CSC before the EUI has been disconnected.

            I suggest withdrawing this change because I feel it introduces a serious operational risk. If the engineer leaves the low-level controller in "refuse new connections" mode then it is not safe to send the CSC to standby state.

            I also feel that the risk in the present reboot procedure is very small; much smaller than the risk that you introduce with this change.

            Surely "rebooting" must only be done after informing users and getting permission. We cannot simply disable the hexapod without warning others. Thus I don't think there is much risk of the CSC being re-enabled prematurely. And even if it does occur, the engineer should be monitoring the CSC state, so they should see this occur, and can deal with it (by sending it to standby again).

            Finally, I feel that this change is confusing. I don't think it is clear what this mode is for in the EUI. This increases the risk that the low-level controller may be switched into this mode and left there.

            If reboot is needed fairly often then please consider adding a "reboot" command to the low-level controller and a "reboot" button to the EUI that disconnects all clients. This is robust and much simpler than the current procedure. This will send the CSC to fault state. But as long as users have been warned, I don't think that is a problem. There is no way to "reboot" without doing something with the CSC – either disconnecting it in advance or letting it go to fault when it loses the connection. Either way it has to be recovered afterwards.

            If you still feel you want to pursue this change then please bring Tiago Ribeiro in.

            Show
            rowen Russell Owen added a comment - My understanding of what's going on: The MT Hexapod low-level controller has a known memory leak that has proven difficult to track down. The practical solution, for now, is to restart the relevant bit of software, and the way to do that is to close all connections to the low-level controller. (I will call this "reboot" below, though in fact it is less drastic than that. Still, it's close enough in concept, since all clients have to disconnect.) The way this is done now is as follows: Send the CSC to standby (so it disconnects) Disconnect the EUI. The procedure with this change is: Put the low-level controller into this "refuse new connections" mode. Send the CSC to standby (so it disconnects) Disconnect the EUI. My understanding of the risk you are addressing with this ticket is that somebody may re-enable the CSC before the EUI has been disconnected. I suggest withdrawing this change because I feel it introduces a serious operational risk. If the engineer leaves the low-level controller in "refuse new connections" mode then it is not safe to send the CSC to standby state. I also feel that the risk in the present reboot procedure is very small; much smaller than the risk that you introduce with this change. Surely "rebooting" must only be done after informing users and getting permission. We cannot simply disable the hexapod without warning others. Thus I don't think there is much risk of the CSC being re-enabled prematurely. And even if it does occur, the engineer should be monitoring the CSC state, so they should see this occur, and can deal with it (by sending it to standby again). Finally, I feel that this change is confusing. I don't think it is clear what this mode is for in the EUI. This increases the risk that the low-level controller may be switched into this mode and left there. If reboot is needed fairly often then please consider adding a "reboot" command to the low-level controller and a "reboot" button to the EUI that disconnects all clients. This is robust and much simpler than the current procedure. This will send the CSC to fault state. But as long as users have been warned, I don't think that is a problem. There is no way to "reboot" without doing something with the CSC – either disconnecting it in advance or letting it go to fault when it loses the connection. Either way it has to be recovered afterwards. If you still feel you want to pursue this change then please bring Tiago Ribeiro in.
            Hide
            ttsai Te-Wei Tsai added a comment -

            Since I will unify the code of GUI and DDS (which means both of them allow the multiple connections), the reboot is fine to me if the CSC user does not care about the CSC will enter the Fault state after the EUI user decides to reboot the system. To be honest, I think the main arguments here is the design logic instead of the risks in the above comments.

            Show
            ttsai Te-Wei Tsai added a comment - Since I will unify the code of GUI and DDS (which means both of them allow the multiple connections), the reboot is fine to me if the CSC user does not care about the CSC will enter the Fault state after the EUI user decides to reboot the system. To be honest, I think the main arguments here is the design logic instead of the risks in the above comments.
            Hide
            ttsai Te-Wei Tsai added a comment -

            After the discussion with Russell, we agreed to change the implementation to disconnect the CSC connection directly.

            Show
            ttsai Te-Wei Tsai added a comment - After the discussion with Russell, we agreed to change the implementation to disconnect the CSC connection directly.
            Hide
            ttsai Te-Wei Tsai added a comment -

            The GUI will prompt a new window to confirm the reboot of controller:

            Show
            ttsai Te-Wei Tsai added a comment - The GUI will prompt a new window to confirm the reboot of controller:
            Hide
            ttsai Te-Wei Tsai added a comment -

            Please see the update of PR. Thanks!

            Show
            ttsai Te-Wei Tsai added a comment - Please see the update of PR. Thanks!
            Hide
            rowen Russell Owen added a comment -

            Reviewed on github. Looks very nice. I'm impressed with how simple you were able to make it.

            Show
            rowen Russell Owen added a comment - Reviewed on github. Looks very nice. I'm impressed with how simple you were able to make it.

              People

              Assignee:
              ttsai Te-Wei Tsai
              Reporter:
              ttsai Te-Wei Tsai
              Reviewers:
              Russell Owen
              Watchers:
              Russell Owen, Te-Wei Tsai
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.