# MTHexapod reports failure in state transition when it is actually succeeding


#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s: None
• Labels:
• Story Points: 1
• Sprint: TSSW Sprint - Sep 13 - Sep 27, TSSW Sprint - Sep 27 - Oct 11
• Team: Telescope and Site
• Urgent?: No

#### Description

I am not sure if this is an issue with the CSC or with the low-level controller, but we are constantly getting state transition failures with the MTHexapod component when operating with the real hardware.

Assuming the system is in STANDBY state, a simple:

    from lsst.ts import salobj

    r = salobj.Remote(salobj.Domain(), "MTHexapod", index=1)
    await r.start_task
    await salobj.set_summary_state(r, salobj.State.DISABLED, settingsToApply="default")

results in:

 RuntimeError: Error on cmd=cmd_start, initial_state=5: msg='Command failed', ackcmd=(ackcmd private_seqNum=1948430428, ack=, error=1, result='Failed: Failed: final state is instead of ')

most of the time, though it sometimes works. Despite the failure reported above, the CSC does transition to DISABLED state shortly after.

I wonder if the CSC should allow a bit more time for the state transition to occur, or if the low-level controller is reporting the command as completed too early.

#### Activity

Russell Owen added a comment - - edited

I think this is in the CSC. Let me give a bit of background: the low-level controller does not report command success or failure, so the CSC has to guess based on the data that the low-level controller does send. I fervently hope we (Te-Wei Tsai) can fix this someday. Meanwhile we are stuck with it, and it leads to issues such as this.

I looked at the CSC code that handles the state transition commands, and its guessing is too naive. The current code issues the state transition command, then waits for 2 telemetry messages from the low-level controller, checks the controller state, and fails the command if it's not the desired new state. A more robust algorithm is to check the next "up to N" telemetry samples, waiting for the new state.
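The "up to N samples" idea can be sketched in plain asyncio. This is only an illustration under stated assumptions, not the code in ts_hexrotcomm: the `State` enum, the telemetry queue, and the names `wait_for_state` and `demo` are hypothetical stand-ins for whatever the CSC actually uses.

```python
import asyncio
from enum import IntEnum


class State(IntEnum):
    """Hypothetical stand-in for the controller state enum."""

    DISABLED = 1
    STANDBY = 5


async def wait_for_state(telemetry, desired_state, max_samples):
    """Return True if desired_state is seen within max_samples telemetry samples.

    telemetry is assumed to be an asyncio.Queue of reported states,
    one entry per telemetry message.
    """
    for _ in range(max_samples):
        reported_state = await telemetry.get()
        if reported_state == desired_state:
            return True
    return False


async def demo(max_samples):
    """Simulate a controller that needs four samples to report the new state."""
    telemetry = asyncio.Queue()
    for state in [State.STANDBY] * 3 + [State.DISABLED]:
        telemetry.put_nowait(state)
    return await wait_for_state(telemetry, State.DISABLED, max_samples)


if __name__ == "__main__":
    # Looking at only 2 samples misses the transition; allowing 10 catches it.
    print(asyncio.run(demo(max_samples=2)))
    print(asyncio.run(demo(max_samples=10)))
```

With a fixed 2-sample check the simulated transition is reported as a failure even though the state changes shortly afterwards, which matches the symptom in the description.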

Te-Wei Tsai, is there some way to predict a minimum time for the low-level controller to respond to a request for state change? I could use that information to pick a suitable maximum number of telemetry samples.

Tiago Ribeiro added a comment -

Sounds good! I figured it was something along those lines. Once you have an appropriate number of samples, I imagine you can convert that into a timeout in seconds, right? Can you also report that in the “ack in progress”?

Te-Wei Tsai added a comment -

This is related to DM-29578. The telemetry frequency is ~20 Hz. If there is a state change, it will be reflected in State and EnabledSubState:

    // Get state information
    tlmStruct->State = GUItlmStruct->State;
    tlmStruct->EnabledSubState = GUItlmStruct->EnabledSubState;
    tlmStruct->OfflineSubState = GUItlmStruct->OfflineSubState;
    tlmStruct->TestState = GUItlmStruct->TestState;

https://github.com/lsst-ts/ts_hexapod_controller/blob/develop/src/actuatorTlm.c#L693-L697

I think waiting >= 0.5 seconds is reasonable, but I might be wrong. Thanks!
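For reference, the arithmetic linking the two figures above (about 20 Hz telemetry and a wait of at least 0.5 seconds) to a sample count and back to a timeout can be sketched as follows. The constants are the approximate values quoted in this comment, and the function names are hypothetical.

```python
import math

TELEMETRY_RATE_HZ = 20.0  # approximate telemetry rate quoted above
MIN_WAIT_SEC = 0.5  # suggested wait; an estimate, not a measured bound


def samples_for_wait(wait_sec, rate_hz):
    """Number of telemetry samples needed to cover wait_sec, rounded up."""
    return math.ceil(wait_sec * rate_hz)


def timeout_for_samples(n_samples, rate_hz):
    """Time in seconds spanned by n_samples at the given telemetry rate."""
    return n_samples / rate_hz


if __name__ == "__main__":
    n_samples = samples_for_wait(MIN_WAIT_SEC, TELEMETRY_RATE_HZ)
    print(n_samples, timeout_for_samples(n_samples, TELEMETRY_RATE_HZ))
```

Under these numbers the CSC would look at 10 samples rather than 2. Because the rate is only approximate, a real timeout would probably want some margin on top of this.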

Russell Owen added a comment -

This affects both the MT hexapod and MT rotator.

Russell Owen added a comment - - edited

The issue affects both MTHexapod and MTRotator.

The fix is in BaseCsc in ts_hexrotcomm. However, I took the liberty of simplifying assert_summary_state, deprecating an argument used by ts_mtrotator, so I also have a trivial patch for that package.

Additional changes to ts_hexrotcomm:

• Update to use ts_utils.
• Fix cleanup in a unit test file.

Pull requests:

• https://github.com/lsst-ts/ts_hexrotcomm/pull/42
• https://github.com/lsst-ts/ts_mtrotator/pull/51

Tiago Ribeiro added a comment -

reviewed in GitHub...

Russell Owen added a comment -

Released:

• ts_hexrotcomm v0.20.0
• ts_mtrotator v0.18.0. This requires ts_hexrotcomm v0.20.0, but is not required in order to get the fix (i.e. one can use v0.17.0 if desired).

#### People

Assignee:
Russell Owen
Reporter:
Tiago Ribeiro
Reviewers:
Tiago Ribeiro
Watchers:
Andy Clements, Holger Drass, Russell Owen, Sandrine Thomas, Te-Wei Tsai, Tiago Ribeiro

#### Dates

Created:
Updated:
Resolved:
