Uploaded image for project: 'Request For Comments'
  1. Request For Comments
  2. RFC-831

Stop returning CMD_ACK command acknowledgements

    XMLWordPrintable

    Details

    • Type: RFC
    • Status: Implemented
    • Resolution: Done
    • Component/s: T&S
    • Labels:
      None
    • Location:
      This ticket

      Description

      ts_sal and ts_salobj both are designed to automatically acknowledge each command with code CMD_ACK=300. This occurs before the user even knows a command has been received.

      This is in addition to the acknowledgements that the user has control over, such as CMD_INPROGRESS=301 (a slow command has started, and here is how long it might take) and the final codes that indicate the command has finished, such as CMD_COMPLETE=303 and CMD_FAILED=-302.

      I don't think we are getting any value out of CMD_ACK. The issuer of a command can't do anything useful based on it. If the command will take a long time then the issuer should get CMD_INPROGRESS with an estimate of duration and then at some point a final code. Otherwise the issuer should quickly receive the final code. Sending and receiving CMD_ACK adds latency to commands and uses bandwidth for essentially no benefit.

      I propose we update ts_sal and ts_salobj to stop returning CMD_ACK. But that we continue to accept and ignore it. That will allow a gradual transition.

      Implementing this is simple and should be minimally disruptive. Users of ts_sal and ts_salobj should not notice the difference. Code that could be affected includes test code that explicitly checks for CMD_ACK. There are two such unit tests in ts_salobj (both of which would be easy to change), there may be a few tests in ts_sal and a few verification scripts in the wild that would also need updating.

        Attachments

          Issue Links

            Activity

            Hide
            ktl Kian-Tat Lim added a comment -

            I thought the purpose was to be able to rapidly detect a networking or messaging layer problem or failure of the commanded system without waiting for a potentially longer timeout.

            Show
            ktl Kian-Tat Lim added a comment - I thought the purpose was to be able to rapidly detect a networking or messaging layer problem or failure of the commanded system without waiting for a potentially longer timeout.
            Hide
            rowen Russell Owen added a comment - - edited

            Kian-Tat Lim That may well be the original reason for CMD_ACK, but I do not believe it is actually serving that purpose. salobj entirely ignores it, as does code based on ts_sal. I'm not sure what we could sensibly do with CMD_ACK. We could fail the command if it does not appear in "reasonable time", but we can do that just as well with the other acks: CSCs are expected to either quickly send ACK_INPROGRESS or a final ack very soon after the command is issued. It seems safest to wait for those before giving up.

            In practice I think we typically detect problems in the network or with a specific CSC in other ways, two of which are: heartbeats stop showing up or our network monitoring system triggers.

            I talked to Dave Mills today and indeed ts_sal-based code would have to be written very strangely to know or care that CMD_ACK is no longer arriving. We have very little non-Python code that issues commands. The pointing components are the main example I know of, and we will test that they ignore CMD_ACK.

            Show
            rowen Russell Owen added a comment - - edited Kian-Tat Lim That may well be the original reason for CMD_ACK, but I do not believe it is actually serving that purpose. salobj entirely ignores it, as does code based on ts_sal. I'm not sure what we could sensibly do with CMD_ACK. We could fail the command if it does not appear in "reasonable time", but we can do that just as well with the other acks: CSCs are expected to either quickly send ACK_INPROGRESS or a final ack very soon after the command is issued. It seems safest to wait for those before giving up. In practice I think we typically detect problems in the network or with a specific CSC in other ways, two of which are: heartbeats stop showing up or our network monitoring system triggers. I talked to Dave Mills today and indeed ts_sal-based code would have to be written very strangely to know or care that CMD_ACK is no longer arriving. We have very little non-Python code that issues commands. The pointing components are the main example I know of, and we will test that they ignore CMD_ACK.
            Hide
            tribeiro Tiago Ribeiro added a comment -

            So, we did used to rely on CMD_ACK as a two-way network connectivity indicator. But, as Russell pointed out, nowadays we have better ways to do it.

            One nice thing about CMD_ACK is that it allow us to audit the response of a CSC, since it gives us a good indication of when a CSC started to respond to a command. That being said, we have had only some very limited cases where this was actually used so I am pretty confident we can live without it.

            Show
            tribeiro Tiago Ribeiro added a comment - So, we did used to rely on CMD_ACK as a two-way network connectivity indicator. But, as Russell pointed out, nowadays we have better ways to do it. One nice thing about CMD_ACK is that it allow us to audit the response of a CSC, since it gives us a good indication of when a CSC started to respond to a command. That being said, we have had only some very limited cases where this was actually used so I am pretty confident we can live without it.
            Hide
            mareuter Michael Reuter added a comment -

            We do have a corner case where routing can go wonky. This happened working with the TTS for a while. The heartbeats of the camera CSCs were coming in just fine, but no commands would work. The only indication was the timeout waiting on the CMD_ACK. I'm OK with removing the CMD_ACK if the code is waiting on the other acks and will timeout appropriately if those aren't received. I assume this means waiting on CMD_INPROGRESS and CMD_COMPLETE since you aren't sure which one you will get first.

            Show
            mareuter Michael Reuter added a comment - We do have a corner case where routing can go wonky. This happened working with the TTS for a while. The heartbeats of the camera CSCs were coming in just fine, but no commands would work. The only indication was the timeout waiting on the CMD_ACK. I'm OK with removing the CMD_ACK if the code is waiting on the other acks and will timeout appropriately if those aren't received. I assume this means waiting on CMD_INPROGRESS and CMD_COMPLETE since you aren't sure which one you will get first.
            Hide
            rowen Russell Owen added a comment -

            Yes. The time limit will still work as expected. Also, if using a salobj Remote and it sees CMD_INPROGRESS then it extends the timeout duration by the estimated duration specified in the CMD_INPROGRESS ack.

            Show
            rowen Russell Owen added a comment - Yes. The time limit will still work as expected. Also, if using a salobj Remote and it sees CMD_INPROGRESS then it extends the timeout duration by the estimated duration specified in the CMD_INPROGRESS ack.
            Hide
            rowen Russell Owen added a comment -

            Adopted as stated.

            Show
            rowen Russell Owen added a comment - Adopted as stated.

              People

              Assignee:
              rowen Russell Owen
              Reporter:
              rowen Russell Owen
              Watchers:
              Dave Mills, Kian-Tat Lim, Michael Reuter, Russell Owen, Tiago Ribeiro, Tony Johnson
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:
                Planned End:

                  Jenkins

                  No builds found.