ts_sal and ts_salobj both are designed to automatically acknowledge each command with code CMD_ACK=300. This occurs before the user even knows a command has been received.
This is in addition to the acknowledgements that the user has control over, such as CMD_INPROGRESS=301 (a slow command has started, and here is how long it might take) and the final codes that indicate the command has finished, such as CMD_COMPLETE=303 and CMD_FAILED=-302.
I don't think we are getting any value out of CMD_ACK. The issuer of a command can't do anything useful based on it. If the command will take a long time then the issuer should get CMD_INPROGRESS with an estimate of duration and then at some point a final code. Otherwise the issuer should quickly receive the final code. Sending and receiving CMD_ACK adds latency to commands and uses bandwidth for essentially no benefit.
I propose we update ts_sal and ts_salobj to stop returning CMD_ACK. But that we continue to accept and ignore it. That will allow a gradual transition.
Implementing this is simple and should be minimally disruptive. Users of ts_sal and ts_salobj should not notice the difference. Code that could be affected includes test code that explicitly checks for CMD_ACK. There are two such unit tests in ts_salobj (both of which would be easy to change), there may be a few tests in ts_sal and a few verification scripts in the wild that would also need updating.
Kian-Tat Lim That may well be the original reason for CMD_ACK, but I do not believe it is actually serving that purpose. salobj entirely ignores it, as does code based on ts_sal. I'm not sure what we could sensibly do with CMD_ACK. We could fail the command if it does not appear in "reasonable time", but we can do that just as well with the other acks: CSCs are expected to either quickly send ACK_INPROGRESS or a final ack very soon after the command is issued. It seems safest to wait for those before giving up.
In practice I think we typically detect problems in the network or with a specific CSC in other ways, two of which are: heartbeats stop showing up or our network monitoring system triggers.
I talked to Dave Mills today and indeed ts_sal-based code would have to be written very strangely to know or care that CMD_ACK is no longer arriving. We have very little non-Python code that issues commands. The pointing components are the main example I know of, and we will test that they ignore CMD_ACK.
So, we did used to rely on CMD_ACK as a two-way network connectivity indicator. But, as Russell pointed out, nowadays we have better ways to do it.
One nice thing about CMD_ACK is that it allow us to audit the response of a CSC, since it gives us a good indication of when a CSC started to respond to a command. That being said, we have had only some very limited cases where this was actually used so I am pretty confident we can live without it.
We do have a corner case where routing can go wonky. This happened working with the TTS for a while. The heartbeats of the camera CSCs were coming in just fine, but no commands would work. The only indication was the timeout waiting on the CMD_ACK. I'm OK with removing the CMD_ACK if the code is waiting on the other acks and will timeout appropriately if those aren't received. I assume this means waiting on CMD_INPROGRESS and CMD_COMPLETE since you aren't sure which one you will get first.
Yes. The time limit will still work as expected. Also, if using a salobj Remote and it sees CMD_INPROGRESS then it extends the timeout duration by the estimated duration specified in the CMD_INPROGRESS ack.
I thought the purpose was to be able to rapidly detect a networking or messaging layer problem or failure of the commanded system without waiting for a potentially longer timeout.