  1. Data Management
  2. DM-28687

The MTHexapod low-level controller mis-handles clearError if there is a fault condition


    Details

    • Story Points:
      4
    • Sprint:
      TSSW Sprint - Jun 21 - Jul 05, TSSW Sprint - Jul 05 - Jul 19
    • Team:
      Telescope and Site
    • Urgent?:
      No

      Description

      If there is a fault condition, such as a pressed e-stop, then the clearError command makes the controller oscillate between the "FAULT" and "STANDBY" states forever. I think it should go to FAULT and stay in FAULT.

      I observed this in today's testing. I believe the sequence was:

      • Use the EUI to put the controller into DDS mode
      • Use the hexapod commander to issue "enterControl" to the CSC
      • The state goes to FAULT
      • Use the hexapod commander to issue "clearError" to the CSC
      • The state oscillates between FAULT and STANDBY. I see no sign that it will ever stop doing this.

      The steps above are the core of the problem, but we continued as follows:

      • Use the hexapod commander to issue "exitControl" to the CSC. The oscillation stopped.
      • Use the EUI to issue "clearError". The oscillation resumed (as reported to the commander).

        Attachments

        1. dataInEfd.png (513 kB)
        2. enabledStateWithInvalidCommand.png (417 kB)
        3. IMG_20210712_140632.jpg (618 kB)
        4. stateMachine.png (91 kB)
        5. stateTransition.png (431 kB)

            Activity

            ttsai Te-Wei Tsai added a comment - - edited

            Checked the starting point of oscillation of states:

            Jul 12 16:10:46 localhost systemd: Starting Stop Read-Ahead Data Collection...
            Jul 12 16:10:46 localhost systemd: Started Stop Read-Ahead Data Collection.
            Jul 12 16:10:52 localhost journal: LSST Wrapper: new mode= 3.000000
            Jul 12 16:10:52 localhost journal: LSST Wrapper: max time in callback =          367132
            Jul 12 16:10:52 localhost journal: LSST Wrapper: max time in callback =          399842
            Jul 12 16:10:52 localhost journal: LSST Wrapper: max time in callback =          429827
            Jul 12 16:10:52 localhost journal: LSST Wrapper: max time in callback =          431242
            Jul 12 16:10:54 localhost journal: LSST Wrapper: max time in callback =          446342
            Jul 12 16:11:03 localhost journal: LSST Wrapper: max time in callback =          447674
            Jul 12 16:11:04 localhost journal: LSST Wrapper: max time in callback =          462521
            Jul 12 16:11:32 localhost journal: LSST Wrapper: processed State Cmd: 0x5555, 0x8000, counter=2, parameter1 = 7.000000.
            Jul 12 16:11:32 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:11:32 localhost journal: LSST Wrapper: max time in callback =          498838
            Jul 12 16:11:32 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:11:46 localhost journal: LSST Wrapper: max time in callback =          506773
            Jul 12 16:13:56 localhost journal: LSST Wrapper: max time in callback =          592167
            Jul 12 16:17:46 localhost journal: LSST Wrapper: Clearing drive faults.
            Jul 12 16:17:46 localhost journal: LSST Wrapper: processed State Cmd: 0x5555, 0x8000, counter=3, parameter1 = 6.000000.
            Jul 12 16:17:46 localhost journal: LSST Wrapper: STO interlock open
            Jul 12 16:17:46 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:17:46 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:17:46 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:17:46 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 12 16:17:46 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 12 16:17:46 localhost journal: LSST Wrapper: new mode= 4.000000
            

            This means the hexapod was first in the OfflineState (new mode = 3), received the state transition command (0x8000) EnterControl (parameter1 = 7), and transitioned to the StandbyState (new mode = 0). The interlock error then moved it to the FaultState (new mode = 4). Next, the ClearError command (parameter1 = 6) was issued, after which the system began to oscillate between StandbyState (0) and FaultState (4).
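The reading of the log above can be reproduced with a short script. This is a minimal sketch, not part of the controller software: the names `decode` and `count_flips` are hypothetical, and the code tables only contain the mode and parameter values named in this comment.

```python
import re

# Codes taken from the comment above; codes not mentioned there stay unmapped.
MODE_NAMES = {0: "StandbyState", 3: "OfflineState", 4: "FaultState"}
COMMAND_NAMES = {6: "ClearError", 7: "EnterControl"}

MODE_RE = re.compile(r"new mode=\s*(\d+(?:\.\d+)?)")
CMD_RE = re.compile(r"processed State Cmd:.*parameter1 = (\d+(?:\.\d+)?)")


def decode(lines):
    """Turn wrapper log lines into (kind, name) events, skipping other lines."""
    events = []
    for line in lines:
        match = MODE_RE.search(line)
        if match:
            code = int(float(match.group(1)))
            events.append(("mode", MODE_NAMES.get(code, f"unknown({code})")))
            continue
        match = CMD_RE.search(line)
        if match:
            code = int(float(match.group(1)))
            events.append(("command", COMMAND_NAMES.get(code, f"unknown({code})")))
    return events


def count_flips(events, a="StandbyState", b="FaultState"):
    """Count consecutive mode changes that go directly between a and b."""
    modes = [name for kind, name in events if kind == "mode"]
    return sum(1 for prev, cur in zip(modes, modes[1:]) if {prev, cur} == {a, b})


# A subset of the log lines quoted above.
SAMPLE = [
    "Jul 12 16:10:52 localhost journal: LSST Wrapper: new mode= 3.000000",
    "Jul 12 16:11:32 localhost journal: LSST Wrapper: processed State Cmd: 0x5555, 0x8000, counter=2, parameter1 = 7.000000.",
    "Jul 12 16:11:32 localhost journal: LSST Wrapper: new mode= 0.000000",
    "Jul 12 16:11:32 localhost journal: LSST Wrapper: new mode= 4.000000",
    "Jul 12 16:17:46 localhost journal: LSST Wrapper: processed State Cmd: 0x5555, 0x8000, counter=3, parameter1 = 6.000000.",
    "Jul 12 16:17:46 localhost journal: LSST Wrapper: new mode= 0.000000",
    "Jul 12 16:17:46 localhost journal: LSST Wrapper: new mode= 4.000000",
    "Jul 12 16:17:46 localhost journal: LSST Wrapper: new mode= 0.000000",
    "Jul 12 16:17:46 localhost journal: LSST Wrapper: new mode= 4.000000",
]
```

Calling `count_flips(decode(SAMPLE))` gives the number of direct Standby/Fault flips in the excerpt, which is a quick way to spot the oscillation in a longer log.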

            This behavior is expected from the Simulink state machine as long as the fault is persistent and the ClearError is triggered (see the transition between the StandbyState and FaultState in the attached state machine diagram).
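The mechanism can be illustrated with a toy two-state model. This is only a sketch under stated assumptions: it is not the Simulink model, the function names `step` and `run` are hypothetical, and the transition rules are reduced to "ClearError moves FAULT to STANDBY" and "a present fault moves STANDBY to FAULT".

```python
from enum import Enum


class State(Enum):
    STANDBY = 0
    FAULT = 4


def step(state, fault_present, clear_error_active):
    """One tick of the toy machine (assumed rules, not the real model)."""
    if state is State.FAULT and clear_error_active:
        return State.STANDBY
    if state is State.STANDBY and fault_present:
        return State.FAULT
    return state


def run(ticks, fault_present, clear_ticks):
    """Start in FAULT and keep ClearError asserted for the first clear_ticks ticks."""
    state = State.FAULT
    history = [state]
    for tick in range(ticks):
        state = step(state, fault_present, clear_error_active=tick < clear_ticks)
        history.append(state)
    return history


# ClearError asserted on every tick while the fault persists: endless flipping.
oscillating = run(ticks=6, fault_present=True, clear_ticks=6)

# ClearError asserted for a limited time only: the machine settles in FAULT.
settled = run(ticks=6, fault_present=True, clear_ticks=2)
```

In this model the oscillation lasts exactly as long as ClearError stays asserted while the fault is present, so limiting how long ClearError is active is enough to make the machine settle in FAULT.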

            ttsai Te-Wei Tsai added a comment -

            Tested the software on the summit and confirmed that the invalid command is executed:

            Jul 13 21:15:53 localhost journal: LSST Wrapper: processed State Cmd: 0x5555, 0x8000, counter=0, parameter1 = 0.000000.
            Jul 13 21:15:53 localhost journal: LSST Wrapper: processed State Cmd: 0x5555, 0x8000, counter=2, parameter1 = 6.000000.
            Jul 13 21:15:53 localhost journal: LSST Wrapper: Clearing drive faults.
            Jul 13 21:15:18 localhost journal: LSST Wrapper: max time in callback =          659115
            

            The system can be enabled, and the data appear in the EFD (see the attached screenshots).

            ttsai Te-Wei Tsai added a comment -

            I realized that I may need to add a sleep between the two commands. For now, I can stop the oscillation (the low-level controller tries to clear the error for 1 second):

            Jul 13 22:07:25 localhost journal: LSST Wrapper: processed State Cmd: 0x5555, 0x8000, counter=0, parameter1 = 0.000000.
            (… oscillation)
            Jul 13 22:07:25 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 13 22:07:25 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 13 22:07:24 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 13 22:07:24 localhost journal: LSST Wrapper: new mode= 0.000000
            Jul 13 22:07:24 localhost journal: LSST Wrapper: new mode= 4.000000
            Jul 13 22:07:24 localhost journal: LSST Wrapper: STO interlock open
            Jul 13 22:07:24 localhost journal: LSST Wrapper: processed State Cmd: 0x5555, 0x8000, counter=4, parameter1 = 6.000000.
            Jul 13 22:07:24 localhost journal: LSST Wrapper: Clearing drive faults
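A sleep between two commands could look like the sketch below. Everything here is an assumption for illustration: `send_state_command` stands for whatever coroutine actually sends a state command, `clear_error_then_reset` is a hypothetical name, and the second command is assumed to be the one logged with parameter1 = 0 after the oscillation.

```python
import asyncio

CLEAR_ERROR = 6  # parameter1 value logged for ClearError
FOLLOW_UP = 0    # parameter1 value of the later command in the log (meaning assumed)


async def clear_error_then_reset(send_state_command, pause_s=1.0):
    """Send ClearError, wait, then send the follow-up command.

    pause_s defaults to the 1 second during which the low-level
    controller is said to try to clear the error.
    """
    await send_state_command(CLEAR_ERROR)
    await asyncio.sleep(pause_s)
    await send_state_command(FOLLOW_UP)
```

A caller would pass its own sender, for example `asyncio.run(clear_error_then_reset(my_sender))`, where `my_sender` is a coroutine function taking the parameter1 value.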
            

            ttsai Te-Wei Tsai added a comment -

            Please help to review the PR:
            https://github.com/lsst-ts/ts_hexapod_controller/pull/20

            You have already reviewed commits 51d3f58 to 1883af9.
            Commits 89094b5 to ab00d5d are based on your previous review.
            Commit b8e0d61 is the new one, which solves the state machine problem in the Simulink model.

            Thanks!

            ttsai Te-Wei Tsai added a comment -

            This code has been reviewed by Russell on GitHub. Thanks!


              People

              Assignee:
              ttsai Te-Wei Tsai
              Reporter:
              rowen Russell Owen
              Reviewers:
              Russell Owen
              Watchers:
              Holger Drass, Petr Kubanek, Russell Owen, Te-Wei Tsai