Data Management / DM-28687

The MTHexapod low-level controller mis-handles clearError if there is a fault condition


    Details

    • Story Points:
      4
    • Sprint:
      TSSW Sprint - Jun 21 - Jul 05, TSSW Sprint - Jul 05 - Jul 19
    • Team:
      Telescope and Site
    • Urgent?:
      No

      Description

      If there is a fault condition, such as an e-stop being pressed, then the clearError command makes the state oscillate between "FAULT" and "STANDBY" forever. I think it should go to FAULT and stay in FAULT.

      I observed this in today's testing. I believe the sequence was:

      • Use the EUI to put the controller into DDS mode
      • Use the hexapod commander to issue "enterControl" to the CSC
      • The state goes to FAULT
      • Use the hexapod commander to issue "clearError" to the CSC
      • The state oscillates between FAULT and STANDBY. I see no sign that it will ever stop doing this.

      That's the core of the problem above. But we continued as follows:

      • Use the hexapod commander to issue "exitControl" to the CSC. The oscillation stopped.
      • Use the EUI to issue "clearError". The oscillation resumed (as reported to the commander).

        Attachments

        1. dataInEfd.png
          dataInEfd.png
          513 kB
        2. enabledStateWithInvalidCommand.png
          enabledStateWithInvalidCommand.png
          417 kB
        3. IMG_20210712_140632.jpg
          IMG_20210712_140632.jpg
          618 kB
        4. stateMachine.png
          stateMachine.png
          91 kB
        5. stateTransition.png
          stateTransition.png
          431 kB

          Issue Links

            Activity

            ttsai Te-Wei Tsai added a comment - - edited

            Based on the log file, the following messages were recorded:

            Feb  5 11:06:12 localhost journal: LSST Wrapper: new mode= 4.000000
            Feb  5 11:06:12 localhost journal: LSST Wrapper: new mode= 0.000000
            Feb  5 11:06:12 localhost journal: LSST Wrapper: new mode= 4.000000
            Feb  5 11:06:12 localhost journal: LSST Wrapper: new mode= 0.000000
            Feb  5 11:06:12 localhost journal: LSST Wrapper: STO interlock open
            Feb  5 11:05:47 localhost journal: LSST Wrapper: new mode= 4.000000
            Feb  5 11:05:47 localhost journal: LSST Wrapper: new mode= 0.000000
            Feb  5 11:05:32 localhost journal: LSST Wrapper: Command source now == DDS
            

            There is continuous oscillation between modes 0 and 4. The interesting thing is the "STO interlock open" message.

            There is no difference in the error-clearing code in the hexapod controller between the EUI and DDS.

            rowen Russell Owen added a comment -

            Is "STO interlock open" simply the indication that an e-stop was pressed? If so I agree it is interesting that it was reported during this flood of mode=4/mode=0 since that did not change.

            ttsai Te-Wei Tsai added a comment - - edited

            Based on the following code (in drive.c), it looks like the interlock was open during the test:

                if (i < 2) {
                    if ((din->copleyInputPins & INTERLOCK_MASK) != lastInterlockPins) {
                        // is there an open interlock circuit?
                        if ((din->copleyInputPins & INTERLOCK_MASK) != INTERLOCK_MASK) {
                            if (!gInterlockOpen) {
                                syslog(LOG_ERR, "%s", "STO interlock open");
                                gInterlockOpen = 1;
                            }
                        } else {
                            if (gInterlockOpen)
                                syslog(LOG_NOTICE, "STO circuit OK");
                            gInterlockOpen = 0;
                        }
                    }
                }
            

            Assuming the interlock is open, the system would be in the Fault state. When we try to clear the error, the system should go to the Standby state first. Then drive.c in the low-level controller would find that the interlock is open and put the system back into the Fault state. This means we should only see Fault -> Standby -> Fault, not continuous oscillation.

            Therefore, it looks like ClearError in the low-level controller is somehow triggered continuously. Each time the system transitions to the Standby state, the still-open interlock transitions it back to the Fault state.

            ttsai Te-Wei Tsai added a comment - - edited

            I think the problem might come from the reads and writes of cmdMsgBuffer and gCmdMsgBuffInd among threads (ddsCmdSocket.c, cmdClientSocket.c, and commanding.c).

            ttsai Te-Wei Tsai added a comment - - edited

            The DDS and GUI cannot command the Camera Hexapod at the same time, so there should be no race condition between ddsCmdSocket.c and cmdClientSocket.c.

            In commanding.c, the code increments gCmdMsgBuffInd by itself under some conditions, but it does not check the limit of the buffer (cmdMsgBuffer). I think this might be the reason for the unpredictable behavior.
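            The suspected pattern can be sketched as follows. The names mirror cmdMsgBuffer and gCmdMsgBuffInd from the comment, but the buffer size, message length, and function name are assumptions for illustration, not the real commanding.c code:

```c
#include <assert.h>
#include <string.h>

#define CMD_MSG_BUFF_SIZE 8   /* assumed capacity */
#define CMD_MSG_LEN 32        /* assumed message length */

static char cmdMsgBuffer[CMD_MSG_BUFF_SIZE][CMD_MSG_LEN];
static int gCmdMsgBuffInd = 0;

/* The suspected bug is a bare gCmdMsgBuffInd++ with no bound check,
 * so a burst of commands walks past the end of cmdMsgBuffer. A
 * bounds-checked version wraps the index before every write: */
static int storeCmd(const char *msg) {
    if (gCmdMsgBuffInd >= CMD_MSG_BUFF_SIZE)
        gCmdMsgBuffInd = 0;               /* wrap instead of overflowing */
    int slot = gCmdMsgBuffInd;
    strncpy(cmdMsgBuffer[slot], msg, CMD_MSG_LEN - 1);
    cmdMsgBuffer[slot][CMD_MSG_LEN - 1] = '\0';
    gCmdMsgBuffInd++;
    return slot;                          /* slot actually written */
}
```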

            ttsai Te-Wei Tsai added a comment - - edited

            Added a thread-safe circular buffer for the commands and replaced the command message buffer with the thread-safe queue.
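            A minimal sketch of such a thread-safe circular command queue, assuming a fixed-size ring guarded by a mutex; the names (CmdQueue, cmdQueuePush, cmdQueuePop) and sizes are illustrative, not the actual ts_hexapod_controller API:

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define QUEUE_CAPACITY 16
#define MSG_LEN 64

typedef struct {
    char msgs[QUEUE_CAPACITY][MSG_LEN];
    int head;             /* next slot to read  */
    int tail;             /* next slot to write */
    int count;            /* number of queued messages */
    pthread_mutex_t lock;
} CmdQueue;

void cmdQueueInit(CmdQueue *q) {
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
}

/* Returns 0 on success, -1 if the queue is full (command dropped). */
int cmdQueuePush(CmdQueue *q, const char *msg) {
    int ret = -1;
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAPACITY) {
        strncpy(q->msgs[q->tail], msg, MSG_LEN - 1);
        q->msgs[q->tail][MSG_LEN - 1] = '\0';
        q->tail = (q->tail + 1) % QUEUE_CAPACITY;
        q->count++;
        ret = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return ret;
}

/* Returns 0 on success, -1 if the queue is empty. */
int cmdQueuePop(CmdQueue *q, char *out) {
    int ret = -1;
    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        strncpy(out, q->msgs[q->head], MSG_LEN);
        q->head = (q->head + 1) % QUEUE_CAPACITY;
        q->count--;
        ret = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return ret;
}
```

            Holding the lock around both the index update and the copy is what removes the read/write race among the socket and commanding threads.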

            ttsai Te-Wei Tsai added a comment -

            Upgraded the software on the summit, and the state transitions via the GUI now work correctly.

            From the log messages:

            Jul  9 18:48:07 localhost journal: LSST Wrapper: new mode= 2.000000
            Jul  9 18:48:07 localhost journal: LSST Wrapper: processed State Cmd: 0x5555, 0x8000, counter=5, parameter1 = 2.000000.
            Jul  9 18:47:52 localhost journal: LSST Wrapper: new mode= 1.000000
            Jul  9 18:47:52 localhost journal: LSST Wrapper: processed State Cmd: 0x5555, 0x8000, counter=4, parameter1 = 1.000000.
            Jul  9 18:47:46 localhost journal: LSST Wrapper: processed State Cmd: 0x5555, 0x8000, counter=3, parameter1 = 4.000000.
            Jul  9 18:47:35 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul  9 18:47:35 localhost journal: LSST Wrapper: processed State Cmd: 0x5555, 0x8000, counter=2, parameter1 = 7.000000.
            

            ttsai Te-Wei Tsai added a comment -

            Please help to review the PR:
            https://github.com/lsst-ts/ts_hexapod_controller/pull/19

            Thanks!

            ttsai Te-Wei Tsai added a comment - - edited

            Holger Drass ran the test on the summit today and could not clear the interlock fault. Before we checked with the hardware engineer whether the interlock button was pressed, we tried to clear the error in software, and we could see the system oscillate between the Standby state and the Fault state. This is very strange because the thread-safe circular buffer is already in use, so there should be no overflow or race condition now. I began to suspect this comes from the Simulink model:

            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: Clearing drive faults.
            Jul 12 16:23:52 localhost journal: LSST Wrapper: processed State Cmd: 0x5555, 0x8000, counter=4, parameter1 = 6.000000.
            Jul 12 16:23:52 localhost journal: LSST Wrapper: STO interlock open
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 4.000000
            --
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: Clearing drive faults.
            Jul 12 16:23:52 localhost journal: LSST Wrapper: processed State Cmd: 0x5555, 0x8000, counter=4, parameter1 = 6.000000.
            Jul 12 16:23:52 localhost journal: LSST Wrapper: STO interlock open
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:23:52 localhost journal: LSST Wrapper: new mode= 4.000000
            

            Update from Holger:
            Mario helped me to clear the interlock fault.
            The button was not pressed.
            He needed to clear the interlock and then send a reset through two buttons connected to the hexapod electronics cabinet.

            ttsai Te-Wei Tsai added a comment -

            I closed the original PR first because I suspected there is a problem with the Simulink model as well.

            hdrass Holger Drass added a comment -

            Thanks Mario Rivera! Pictures of the buttons added.

            ttsai Te-Wei Tsai added a comment - - edited

            Checked the starting point of the state oscillation:

            Jul 12 16:10:46 localhost systemd: Starting Stop Read-Ahead Data Collection...
            Jul 12 16:10:46 localhost systemd: Started Stop Read-Ahead Data Collection.
            Jul 12 16:10:52 localhost journal: LSST Wrapper: new mode= 3.000000
            Jul 12 16:10:52 localhost journal: LSST Wrapper: max time in callback =          367132
            Jul 12 16:10:52 localhost journal: LSST Wrapper: max time in callback =          399842
            Jul 12 16:10:52 localhost journal: LSST Wrapper: max time in callback =          429827
            Jul 12 16:10:52 localhost journal: LSST Wrapper: max time in callback =          431242
            Jul 12 16:10:54 localhost journal: LSST Wrapper: max time in callback =          446342
            Jul 12 16:11:03 localhost journal: LSST Wrapper: max time in callback =          447674
            Jul 12 16:11:04 localhost journal: LSST Wrapper: max time in callback =          462521
            Jul 12 16:11:32 localhost journal: LSST Wrapper: processed State Cmd: 0x5555, 0x8000, counter=2, parameter1 = 7.000000.
            Jul 12 16:11:32 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:11:32 localhost journal: LSST Wrapper: max time in callback =          498838
            Jul 12 16:11:32 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:11:46 localhost journal: LSST Wrapper: max time in callback =          506773
            Jul 12 16:13:56 localhost journal: LSST Wrapper: max time in callback =          592167
            Jul 12 16:17:46 localhost journal: LSST Wrapper: Clearing drive faults.
            Jul 12 16:17:46 localhost journal: LSST Wrapper: processed State Cmd: 0x5555, 0x8000, counter=3, parameter1 = 6.000000.
            Jul 12 16:17:46 localhost journal: LSST Wrapper: STO interlock open
            Jul 12 16:17:46 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:17:46 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:17:46 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:17:46 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:17:46 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:17:46 localhost journal: LSST Wrapper: new mode= 4.000000
            

            This means the hexapod was in the OfflineState (new mode = 3) first, received the state transition command (0x8000) EnterControl (parameter1 = 7), and transitioned to the StandbyState (new mode = 0). After this, the interlock error transitioned it to the FaultState (new mode = 4). Then the ClearError command (parameter1 = 6) was issued, and the system began to oscillate between the StandbyState (0) and FaultState (4) modes.

            This behavior is expected as long as the fault is persistent and ClearError keeps being triggered, based on the Simulink state machine (see the transition between the StandbyState and FaultState):
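            That transition loop can be modeled in miniature. This is an illustrative toy, not the generated Simulink code; clearActive stands for a ClearError request that is still being retried, and faultPresent stands for the open STO interlock:

```c
#include <assert.h>

typedef enum { STANDBY = 0, FAULT = 4 } State;

/* One tick of the state machine. While ClearError is active and the
 * fault persists, the state bounces Fault -> Standby -> Fault. */
State tick(State s, int clearActive, int faultPresent) {
    if (s == FAULT && clearActive)
        return STANDBY;          /* ClearError: Fault -> Standby */
    if (s == STANDBY && faultPresent)
        return FAULT;            /* persistent fault: back to Fault */
    return s;
}
```

            The oscillation stops only once the ClearError request stops being re-applied, which matches the log: modes 0 and 4 alternate for as long as the clear is retried.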

            ttsai Te-Wei Tsai added a comment -

            Tested the software on the summit to check that an invalid command is handled:

            Jul 13 21:15:53 localhost journal: LSST Wrapper: processed State Cmd: 0x5555, 0x8000, counter=0, parameter1 = 0.000000.
            Jul 13 21:15:53 localhost journal: LSST Wrapper: processed State Cmd: 0x5555, 0x8000, counter=2, parameter1 = 6.000000.
            Jul 13 21:15:53 localhost journal: LSST Wrapper: Clearing drive faults.
            Jul 13 21:15:18 localhost journal: LSST Wrapper: max time in callback =          659115
            

            The system can be enabled, and the data appears in the EFD:

            ttsai Te-Wei Tsai added a comment -

            I realized that I may need to add a sleep time between two commands. Right now, I can stop the oscillation (the low-level controller will try to clear the error for 1 second):

            Jul 13 22:07:25 localhost journal: LSST Wrapper: processed State Cmd: 0x5555, 0x8000, counter=0, parameter1 = 0.000000.
            (… oscillation)
            Jul 13 22:07:25 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 13 22:07:25 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 13 22:07:24 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 13 22:07:24 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 13 22:07:24 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 13 22:07:24 localhost journal: LSST Wrapper: STO interlock open
            Jul 13 22:07:24 localhost journal: LSST Wrapper: processed State Cmd: 0x5555, 0x8000, counter=4, parameter1 = 6.000000.
            Jul 13 22:07:24 localhost journal: LSST Wrapper: Clearing drive faults
            

            ttsai Te-Wei Tsai added a comment -

            Please help to review the PR:
            https://github.com/lsst-ts/ts_hexapod_controller/pull/20

            You had reviewed the commits 51d3f58 to 1883af9.
            The commits 89094b5 to ab00d5d are based on your previous review.
            The commit b8e0d61 is the new one, which solves the state machine problem in the Simulink model.

            Thanks!

            ttsai Te-Wei Tsai added a comment -

            This code was reviewed by Russell on GitHub. Thanks!


              People

              Assignee:
              ttsai Te-Wei Tsai
              Reporter:
              rowen Russell Owen
              Reviewers:
              Russell Owen
              Watchers:
              Holger Drass, Petr Kubanek, Russell Owen, Te-Wei Tsai

                Dates

                Created:
                Updated:
                Resolved:
                Start date:
                End date:

                  Jenkins

                  No builds found.