My understanding of what's going on: The MT Hexapod low-level controller has a known memory leak that has proven difficult to track down. The practical solution, for now, is to restart the relevant bit of software, and the way to do that is to close all connections to the low-level controller. (I will call this "reboot" below, though in fact it is less drastic than that. Still, it's close enough in concept, since all clients have to disconnect.)
The way this is done now is as follows:
- Send the CSC to standby (so it disconnects).
- Disconnect the EUI.
The procedure with this change is:
- Put the low-level controller into this "refuse new connections" mode.
- Send the CSC to standby (so it disconnects).
- Disconnect the EUI.
My understanding of the risk you are addressing with this ticket is that somebody may re-enable the CSC before the EUI has been disconnected.
I suggest withdrawing this change because I feel it introduces a serious operational risk. If the engineer leaves the low-level controller in "refuse new connections" mode then it is not safe to send the CSC to standby state, because, as I understand it, the CSC would not be able to reconnect afterwards.
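To make the comparison concrete, here is a toy model of the two procedures and the two failure cases discussed above. It is only a sketch: `ToyController`, its methods and the `refuse_new_connections` attribute are hypothetical names, not the real low-level controller API. It also assumes that the restart happens once no clients are connected and that the "refuse new connections" mode stays on until somebody clears it.

```python
class ToyController:
    """Toy stand-in for the low-level controller; hypothetical, not the real API."""

    def __init__(self):
        self.clients = set()
        self.refuse_new_connections = False
        self.restarts = 0

    def connect(self, name):
        """Accept a client unless new connections are being refused."""
        if self.refuse_new_connections:
            return False
        self.clients.add(name)
        return True

    def disconnect(self, name):
        """Drop a client; assume the software restarts once nobody is connected."""
        self.clients.discard(name)
        if not self.clients:
            self.restarts += 1


def connected_controller():
    """Return a controller with both the CSC and the EUI connected."""
    ctrl = ToyController()
    ctrl.connect("CSC")
    ctrl.connect("EUI")
    return ctrl


# Present procedure, done in order: the restart happens and the CSC can return.
present = connected_controller()
present.disconnect("CSC")  # CSC sent to standby
present.disconnect("EUI")
present_ok = present.restarts == 1 and present.connect("CSC")

# Present procedure with the race: the CSC is re-enabled before the EUI leaves.
raced = connected_controller()
raced.disconnect("CSC")
raced.connect("CSC")       # somebody re-enables the CSC too early
raced.disconnect("EUI")
race_blocked_restart = raced.restarts == 0

# Proposed procedure, with the mode left on afterwards (assumed to persist).
proposed = connected_controller()
proposed.refuse_new_connections = True
proposed.disconnect("CSC")
proposed.disconnect("EUI")
csc_locked_out = not proposed.connect("CSC")

print(present_ok, race_blocked_restart, csc_locked_out)
```

In this model the race only delays the restart and is fixed by sending the CSC to standby again, whereas a forgotten mode keeps the CSC out until somebody notices and clears it.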
I also feel that the risk in the present reboot procedure is very small; much smaller than the risk that you introduce with this change.
Surely "rebooting" must only be done after informing users and getting permission. We cannot simply disable the hexapod without warning others. Thus I don't think there is much risk of the CSC being re-enabled prematurely. Even if that does happen, the engineer should be monitoring the CSC state, so they should notice and can deal with it (by sending the CSC to standby again).
Finally, I feel that this change is confusing. I don't think the EUI makes it clear what this mode is for. This increases the risk that the low-level controller may be switched into this mode and left there.
If reboot is needed fairly often then please consider adding a "reboot" command to the low-level controller and a "reboot" button to the EUI that disconnects all clients. This is robust and much simpler than the current procedure. It will send the CSC to fault state, but as long as users have been warned, I don't think that is a problem. There is no way to "reboot" without doing something with the CSC: either disconnect it in advance or let it go to fault when it loses the connection. Either way it has to be recovered afterwards.
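A rough sketch of what I mean by a "reboot" command, again as a toy model. All names here (`RebootableController`, `ToyClient`, `reboot`, the `State` values) are hypothetical and chosen only for illustration; the real low-level controller and CSC will look different.

```python
from enum import Enum


class State(Enum):
    """Simplified CSC states used only for this illustration."""
    ENABLED = "enabled"
    FAULT = "fault"


class ToyClient:
    """Hypothetical client of the low-level controller (CSC or EUI)."""

    def __init__(self, name, state=None):
        self.name = name
        self.state = state  # None for clients without a state machine (the EUI)
        self.connected = True

    def on_connection_lost(self):
        """React to the controller dropping the connection."""
        self.connected = False
        if self.state is not None:
            # A client with a state machine (the CSC) goes to fault.
            self.state = State.FAULT


class RebootableController:
    """Hypothetical controller with a single command that drops every client."""

    def __init__(self):
        self.clients = []

    def connect(self, client):
        self.clients.append(client)

    def reboot(self):
        """Disconnect all clients in one step and return their names."""
        dropped = [client.name for client in self.clients]
        for client in self.clients:
            client.on_connection_lost()
        self.clients.clear()
        return dropped


controller = RebootableController()
csc = ToyClient("CSC", state=State.ENABLED)
eui = ToyClient("EUI")
controller.connect(csc)
controller.connect(eui)

dropped = controller.reboot()
print(dropped, csc.state, eui.connected)
```

The point of the design is that there is no intermediate mode to forget about: one command drops everyone, the CSC ends up in fault, and it is then recovered in the usual way.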
If you still feel you want to pursue this change then please bring Tiago Ribeiro in.
Tested the updated code in DM-33068.