Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: None
- Labels:
- Story Points: 1
- Epic Link:
- Sprint: TSSW Sprint - Sep 13 - Sep 27, TSSW Sprint - Sep 27 - Oct 11
- Team: Telescope and Site
- Urgent?: No
Description
I am not sure if this is an issue with the CSC or with the low-level controller, but we are constantly getting state transition failures with the MTHexapod component when operating with the real hardware.
Assuming the system is in the STANDBY state, a simple:
    from lsst.ts import salobj

    r = salobj.Remote(salobj.Domain(), "MTHexapod", index=1)
    await r.start_task
    await salobj.set_summary_state(r, salobj.State.DISABLED, settingsToApply="default")
results in:
    RuntimeError: Error on cmd=cmd_start, initial_state=5: msg='Command failed', ackcmd=(ackcmd private_seqNum=1948430428, ack=<SalRetCode.CMD_FAILED: -302>, error=1, result='Failed: Failed: final state is <State.STANDBY: 5> instead of <State.DISABLED: 1>')
most of the time, though it sometimes works. Despite the failure reported above, the CSC does transition to the DISABLED state shortly afterwards.
I wonder if the CSC should allow a bit more time for the state transition to occur, or if the low level controller is reporting the command as completed too early.
Issue Links
relates to
- DM-31244 Hexapod state transition from EnabledState to DisabledState is sometimes rejected (Won't Fix)
- DM-31075 Camera hexapod state machine does not reject enable command in enabledState (Invalid)
- DM-29578 Please improve TCP/IP communications for the MT rotator and hexapod low-level controllers (Done)
I think this is in the CSC. Let me give a bit of background: the low-level controller does not report command success or failure, so the CSC has to guess based on the data that the low-level controller does send. I fervently hope we (Te-Wei Tsai) can fix this someday. Meanwhile we are stuck with it, and it leads to issues such as this.
I looked at the CSC code that handles the state transition commands and its guessing is too naive. The current code issues the state transition command, then waits for 2 telemetry messages from the low-level controller, checks the controller state, and fails the command if it is not the desired new state. A more robust algorithm is to check up to N subsequent telemetry samples, succeeding as soon as one reports the new state and failing only if none of them does.
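For illustration only, the "up to N" idea could be sketched roughly as follows. This is not the CSC's actual code: `wait_for_state`, `make_fake_telemetry`, and the `next_telemetry` callable are hypothetical names, and the telemetry source is a stand-in that replays a fixed list of states. The `State` values are the ones shown in the error message above.

```python
import asyncio
import enum


class State(enum.IntEnum):
    # Numeric values taken from the error message in the description.
    DISABLED = 1
    STANDBY = 5


async def wait_for_state(next_telemetry, desired_state, max_samples):
    """Return True if desired_state shows up within max_samples samples.

    next_telemetry is a hypothetical coroutine function that returns the
    controller state reported by the next telemetry sample.
    """
    for _ in range(max_samples):
        state = await next_telemetry()
        if state == desired_state:
            # Succeed as soon as the new state is seen.
            return True
    # None of the samples reported the desired state.
    return False


def make_fake_telemetry(states):
    """Build a stand-in telemetry source that replays the given states."""
    iterator = iter(states)

    async def next_telemetry():
        await asyncio.sleep(0)  # yield control, as a real read would
        return next(iterator)

    return next_telemetry


# Simulated slow transition: three stale STANDBY samples, then DISABLED.
samples = [State.STANDBY] * 3 + [State.DISABLED]

# Looking at only 2 samples misses the transition.
found_with_2 = asyncio.run(
    wait_for_state(make_fake_telemetry(samples), State.DISABLED, max_samples=2)
)
# Looking at up to 10 samples catches it on the 4th.
found_with_10 = asyncio.run(
    wait_for_state(make_fake_telemetry(samples), State.DISABLED, max_samples=10)
)
print(found_with_2, found_with_10)
```

With the simulated sequence, the short window reports failure even though the transition does complete, which matches the symptom in the description; the longer window returns as soon as the new state appears, so it adds no delay when the controller is fast.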
Te-Wei Tsai, is there some way to predict a minimum time for the low-level controller to respond to a request for a state change? I could use that information to pick a suitable maximum number of telemetry samples.
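As a rough sketch of how such a time could be turned into a sample count: divide the longest expected response time by the telemetry interval, round up, and add a small safety margin. The function name, the margin, and the numbers in the example are all assumptions for illustration, not measured values for the hexapod controller.

```python
import math


def max_telemetry_samples(max_response_time, telemetry_interval, margin=2):
    """Number of telemetry samples to check before giving up.

    max_response_time: longest expected controller response time (seconds).
    telemetry_interval: time between telemetry samples (seconds).
    margin: extra samples to tolerate jitter (an arbitrary choice).
    """
    if max_response_time < 0 or telemetry_interval <= 0:
        raise ValueError("times must be positive")
    return math.ceil(max_response_time / telemetry_interval) + margin


# Hypothetical numbers: 1 s worst-case response, one sample every 0.25 s.
n_samples = max_telemetry_samples(1.0, 0.25)
print(n_samples)
```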