Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-33788

Make sure that the ESS CSC and Controller truly can be stopped and restarted

    XMLWordPrintable

    Details

    • Story Points:
      2
    • Sprint:
      TSSW Sprint - Feb 14 - Feb 28, TSSW Sprint - Feb 28 - Mar 14
    • Team:
      Telescope and Site
    • Urgent?:
      No

      Description

      In DM-31994 the CSC and Controller were modified such that a CSC can disconnect from the controller and then reconnect again. This doesn't seem to work properly and this ticket is for investigating the cause and fixing it.

        Attachments

          Issue Links

            Activity

            Hide
            wvreeven Wouter van Reeven added a comment - - edited

            This is a timing issue. I managed to put the AuxTel ESS CSC from ENABLED to DISABLED today without generating the error. However, when I tried again, it happened again. The output of the ESS Controller when setting the CSC from ENABLED to DISABLED the second time was:

            2022-02-24 13:11:15,097:DEBUG:CommandHandler.RpiSerialHat:Reading data.
            2022-02-24 13:11:15,954:DEBUG:CommandHandler.TemperatureSensor:extract_telemetry
            2022-02-24 13:11:15,954:INFO:CommandHandler.VcpFtdi:Returning {<Key.TELEMETRY: 'telemetry'>: {<Key.NAME: 'name'>: 'AuxTel-ESS01', <Key.TIMESTAMP: 'timestamp'>: 1645708311.6267428, <Key.RESPONSE_CODE: 'response_code'>: <ResponseCode.OK: 0>, <Key.SENSOR_TELEMETRY: 'sensor_telemetry'>: [nan, 14.4319, 14.0645, 14.5886]}}
            2022-02-24 13:11:15,955:DEBUG:SocketServer.OneClientServer(EssSensorsServer):Writing data {<Key.TELEMETRY: 'telemetry'>: {<Key.NAME: 'name'>: 'AuxTel-ESS01', <Key.TIMESTAMP: 'timestamp'>: 1645708311.6267428, <Key.RESPONSE_CODE: 'response_code'>: <ResponseCode.OK: 0>, <Key.SENSOR_TELEMETRY: 'sensor_telemetry'>: [nan, 14.4319, 14.0645, 14.5886]}}
            2022-02-24 13:11:15,955:DEBUG:SocketServer.OneClientServer(EssSensorsServer):{"telemetry": {"name": "AuxTel-ESS01", "timestamp": 1645708311.6267428, "response_code": 0, "sensor_telemetry": [NaN, 14.4319, 14.0645, 14.5886]}}
            2022-02-24 13:11:15,955:DEBUG:SocketServer.OneClientServer(EssSensorsServer):Done
            2022-02-24 13:11:15,956:DEBUG:CommandHandler.VcpFtdi:Reading data.
            2022-02-24 13:11:16,602:DEBUG:CommandHandler.Hx85baSensor:extract_telemetry
            2022-02-24 13:11:16,603:INFO:CommandHandler.RpiSerialHat:Returning {<Key.TELEMETRY: 'telemetry'>: {<Key.NAME: 'name'>: 'AuxTel-ESS02', <Key.TIMESTAMP: 'timestamp'>: 1645708312.0981436, <Key.RESPONSE_CODE: 'response_code'>: <ResponseCode.OK: 0>, <Key.SENSOR_TELEMETRY: 'sensor_telemetry'>: [18.44, 14.49, 740.7, -9.283921902332356]}}
            2022-02-24 13:11:16,603:DEBUG:SocketServer.OneClientServer(EssSensorsServer):Writing data {<Key.TELEMETRY: 'telemetry'>: {<Key.NAME: 'name'>: 'AuxTel-ESS02', <Key.TIMESTAMP: 'timestamp'>: 1645708312.0981436, <Key.RESPONSE_CODE: 'response_code'>: <ResponseCode.OK: 0>, <Key.SENSOR_TELEMETRY: 'sensor_telemetry'>: [18.44, 14.49, 740.7, -9.283921902332356]}}
            2022-02-24 13:11:16,604:DEBUG:SocketServer.OneClientServer(EssSensorsServer):{"telemetry": {"name": "AuxTel-ESS02", "timestamp": 1645708312.0981436, "response_code": 0, "sensor_telemetry": [18.44, 14.49, 740.7, -9.283921902332356]}}
            2022-02-24 13:11:16,604:DEBUG:SocketServer.OneClientServer(EssSensorsServer):Done
            2022-02-24 13:11:16,605:DEBUG:CommandHandler.RpiSerialHat:Reading data.
            2022-02-24 13:11:17,292:DEBUG:CommandHandler.TemperatureSensor:extract_telemetry
            2022-02-24 13:11:17,292:INFO:CommandHandler.VcpFtdi:Returning {<Key.TELEMETRY: 'telemetry'>: {<Key.NAME: 'name'>: 'AuxTel-ESS01', <Key.TIMESTAMP: 'timestamp'>: 1645708312.9564328, <Key.RESPONSE_CODE: 'response_code'>: <ResponseCode.OK: 0>, <Key.SENSOR_TELEMETRY: 'sensor_telemetry'>: [nan, 14.4312, 14.0642, 14.5876]}}
            2022-02-24 13:11:17,293:DEBUG:SocketServer.OneClientServer(EssSensorsServer):Writing data {<Key.TELEMETRY: 'telemetry'>: {<Key.NAME: 'name'>: 'AuxTel-ESS01', <Key.TIMESTAMP: 'timestamp'>: 1645708312.9564328, <Key.RESPONSE_CODE: 'response_code'>: <ResponseCode.OK: 0>, <Key.SENSOR_TELEMETRY: 'sensor_telemetry'>: [nan, 14.4312, 14.0642, 14.5876]}}
            2022-02-24 13:11:17,293:DEBUG:SocketServer.OneClientServer(EssSensorsServer):{"telemetry": {"name": "AuxTel-ESS01", "timestamp": 1645708312.9564328, "response_code": 0, "sensor_telemetry": [NaN, 14.4312, 14.0642, 14.5876]}}
            2022-02-24 13:11:17,294:DEBUG:SocketServer.OneClientServer(EssSensorsServer):Done
            2022-02-24 13:11:17,294:DEBUG:CommandHandler.VcpFtdi:Reading data.
            2022-02-24 13:11:17,730:ERROR:SocketServer.OneClientServer(EssSensorsServer):read_loop failed. Disconnecting.
            Traceback (most recent call last):
              File "/home/saluser/miniforge/lib/python3.8/site-packages/lsst/ts/ess/common/socket_server.py", line 139, in read_loop
                line = await self.reader.readuntil(tcpip.TERMINATOR)
              File "/home/saluser/miniforge/lib/python3.8/asyncio/streams.py", line 629, in readuntil
                raise exceptions.IncompleteReadError(chunk, None)
            asyncio.exceptions.IncompleteReadError: 0 bytes read on a total of undefined expected bytes
            2022-02-24 13:11:17,730:INFO:CommandHandler:stop_sending_telemetry
            2022-02-24 13:11:17,731:DEBUG:CommandHandler:Closing <lsst.ts.ess.controller.device.rpi_serial_hat.RpiSerialHat object at 0x7f709b9fd0> device with name AuxTel-ESS02
            2022-02-24 13:11:17,731:DEBUG:CommandHandler.RpiSerialHat:Stopping read loop for 'AuxTel-ESS02' sensor.
            2022-02-24 13:11:17,732:ERROR:CommandHandler.RpiSerialHat:Serial port closed.
            Traceback (most recent call last):
              File "/home/saluser/miniforge/lib/python3.8/site-packages/lsst/ts/ess/common/socket_server.py", line 139, in read_loop
                line = await self.reader.readuntil(tcpip.TERMINATOR)
              File "/home/saluser/miniforge/lib/python3.8/asyncio/streams.py", line 629, in readuntil
                raise exceptions.IncompleteReadError(chunk, None)
            asyncio.exceptions.IncompleteReadError: 0 bytes read on a total of undefined expected bytes
            2022-02-24 13:11:17,732:DEBUG:CommandHandler:Closing <lsst.ts.ess.controller.device.vcp_ftdi.VcpFtdi object at 0x7f70974e20> device with name AuxTel-ESS01
            2022-02-24 13:11:17,733:DEBUG:CommandHandler.VcpFtdi:Stopping read loop for 'AuxTel-ESS01' sensor.
            2022-02-24 13:11:17,734:DEBUG:CommandHandler.VcpFtdi:FTDI device closed.
            2022-02-24 13:11:17,734:DEBUG:SocketServer.OneClientServer(EssSensorsServer):Cancelling read_loop_task.
            2022-02-24 13:11:17,734:DEBUG:SocketServer.OneClientServer(EssSensorsServer):Closing client.
            2022-02-24 13:11:17,735:INFO:SocketServer.OneClientServer(EssSensorsServer):Closing the client socket.
            

            When pressing CTRL-C in the terminal running the ESS Controller, I got:

            ^C2022-02-24 13:11:56,088:DEBUG:SocketServer.OneClientServer(EssSensorsServer):call_connect_callback: connected=False; last_connected=True
            2022-02-24 13:11:56,089:INFO:SocketServer.OneClientServer(EssSensorsServer):Calling connect_callback
            2022-02-24 13:11:56,089:INFO:SocketServer.OneClientServer(EssSensorsServer):Client disconnected.
            

            Pressing CTRL-C again to kill the ESS Controller produced:

            ^CError in atexit._run_exitfuncs:
            Traceback (most recent call last):
              File "/home/saluser/miniforge/lib/python3.8/concurrent/futures/thread.py", line 40, in _python_exit
                t.join()
              File "/home/saluser/miniforge/lib/python3.8/threading.py", line 1011, in join
                self._wait_for_tstate_lock()
              File "/home/saluser/miniforge/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
                elif lock.acquire(block, timeout):
            KeyboardInterrupt
            2022-02-24 13:12:01,419:ERROR:asyncio:Task exception was never retrieved
            future: <Task finished name='Task-13' coro=<BaseDevice._run() done, defined at /home/saluser/miniforge/lib/python3.8/site-packages/lsst/ts/ess/common/device/base_device.py:118> exception=KeyboardInterrupt()>
            Traceback (most recent call last):
              File "/home/saluser/miniforge/lib/python3.8/site-packages/lsst/ts/ess/controller/device/rpi_serial_hat.py", line 107, in readline
                buffer.write(await loop.run_in_executor(pool, self.ser.read, 1))
            asyncio.exceptions.CancelledErrorDuring handling of the above exception, another exception occurred:Traceback (most recent call last):
              File "/home/saluser/miniforge/bin/run_ess_controller.py", line 63, in <module>
                asyncio.run(main())
              File "/home/saluser/miniforge/lib/python3.8/asyncio/runners.py", line 44, in run
                return loop.run_until_complete(main)
              File "/home/saluser/miniforge/lib/python3.8/asyncio/base_events.py", line 603, in run_until_complete
                self.run_forever()
              File "/home/saluser/miniforge/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
                self._run_once()
              File "/home/saluser/miniforge/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
                handle._run()
              File "/home/saluser/miniforge/lib/python3.8/asyncio/events.py", line 81, in _run
                self._context.run(self._callback, *self._args)
              File "/home/saluser/miniforge/lib/python3.8/site-packages/lsst/ts/ess/common/device/base_device.py", line 129, in _run
                line = await self.readline()
              File "/home/saluser/miniforge/lib/python3.8/site-packages/lsst/ts/ess/controller/device/rpi_serial_hat.py", line 107, in readline
                buffer.write(await loop.run_in_executor(pool, self.ser.read, 1))
              File "/home/saluser/miniforge/lib/python3.8/concurrent/futures/_base.py", line 644, in __exit__
                self.shutdown(wait=True)
              File "/home/saluser/miniforge/lib/python3.8/concurrent/futures/thread.py", line 236, in shutdown
                t.join()
              File "/home/saluser/miniforge/lib/python3.8/threading.py", line 1011, in join
                self._wait_for_tstate_lock()
              File "/home/saluser/miniforge/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
                elif lock.acquire(block, timeout):
            KeyboardInterrupt
            

            Show
            wvreeven Wouter van Reeven added a comment - - edited This is a timing issue. I managed to put the AuxTel ESS CSC from ENABLED to DISABLED today without generating the error. However, when I tried again, it happened again. The output of the ESS Controller when setting the CSC from ENABLED to DISABLED the second time was: 2022 - 02 - 24 13 : 11 : 15 , 097 :DEBUG:CommandHandler.RpiSerialHat:Reading data. 2022 - 02 - 24 13 : 11 : 15 , 954 :DEBUG:CommandHandler.TemperatureSensor:extract_telemetry 2022 - 02 - 24 13 : 11 : 15 , 954 :INFO:CommandHandler.VcpFtdi:Returning {<Key.TELEMETRY: 'telemetry' >: {<Key.NAME: 'name' >: 'AuxTel-ESS01' , <Key.TIMESTAMP: 'timestamp' >: 1645708311.6267428 , <Key.RESPONSE_CODE: 'response_code' >: <ResponseCode.OK: 0 >, <Key.SENSOR_TELEMETRY: 'sensor_telemetry' >: [nan, 14.4319 , 14.0645 , 14.5886 ]}} 2022 - 02 - 24 13 : 11 : 15 , 955 :DEBUG:SocketServer.OneClientServer(EssSensorsServer):Writing data {<Key.TELEMETRY: 'telemetry' >: {<Key.NAME: 'name' >: 'AuxTel-ESS01' , <Key.TIMESTAMP: 'timestamp' >: 1645708311.6267428 , <Key.RESPONSE_CODE: 'response_code' >: <ResponseCode.OK: 0 >, <Key.SENSOR_TELEMETRY: 'sensor_telemetry' >: [nan, 14.4319 , 14.0645 , 14.5886 ]}} 2022 - 02 - 24 13 : 11 : 15 , 955 :DEBUG:SocketServer.OneClientServer(EssSensorsServer):{ "telemetry" : { "name" : "AuxTel-ESS01" , "timestamp" : 1645708311.6267428 , "response_code" : 0 , "sensor_telemetry" : [NaN, 14.4319 , 14.0645 , 14.5886 ]}} 2022 - 02 - 24 13 : 11 : 15 , 955 :DEBUG:SocketServer.OneClientServer(EssSensorsServer):Done 2022 - 02 - 24 13 : 11 : 15 , 956 :DEBUG:CommandHandler.VcpFtdi:Reading data. 2022 - 02 - 24 13 : 11 : 16 , 602 :DEBUG:CommandHandler.Hx85baSensor:extract_telemetry 2022 - 02 - 24 13 : 11 : 16 , 603 :INFO:CommandHandler.RpiSerialHat:Returning {<Key.TELEMETRY: 'telemetry' >: {<Key.NAME: 'name' >: 'AuxTel-ESS02' , <Key.TIMESTAMP: 'timestamp' >: 1645708312.0981436 , <Key.RESPONSE_CODE: 'response_code' >: <ResponseCode.OK: 0 >, <Key.SENSOR_TELEMETRY: 'sensor_telemetry' >: [ 18.44 , 14.49 , 740.7 , - 9.283921902332356 ]}} 2022 - 02 - 24 13 : 11 : 16 , 603 :DEBUG:SocketServer.OneClientServer(EssSensorsServer):Writing data {<Key.TELEMETRY: 'telemetry' >: {<Key.NAME: 'name' >: 'AuxTel-ESS02' , <Key.TIMESTAMP: 'timestamp' >: 1645708312.0981436 , <Key.RESPONSE_CODE: 'response_code' >: <ResponseCode.OK: 0 >, <Key.SENSOR_TELEMETRY: 'sensor_telemetry' >: [ 18.44 , 14.49 , 740.7 , - 9.283921902332356 ]}} 2022 - 02 - 24 13 : 11 : 16 , 604 :DEBUG:SocketServer.OneClientServer(EssSensorsServer):{ "telemetry" : { "name" : "AuxTel-ESS02" , "timestamp" : 1645708312.0981436 , "response_code" : 0 , "sensor_telemetry" : [ 18.44 , 14.49 , 740.7 , - 9.283921902332356 ]}} 2022 - 02 - 24 13 : 11 : 16 , 604 :DEBUG:SocketServer.OneClientServer(EssSensorsServer):Done 2022 - 02 - 24 13 : 11 : 16 , 605 :DEBUG:CommandHandler.RpiSerialHat:Reading data. 2022 - 02 - 24 13 : 11 : 17 , 292 :DEBUG:CommandHandler.TemperatureSensor:extract_telemetry 2022 - 02 - 24 13 : 11 : 17 , 292 :INFO:CommandHandler.VcpFtdi:Returning {<Key.TELEMETRY: 'telemetry' >: {<Key.NAME: 'name' >: 'AuxTel-ESS01' , <Key.TIMESTAMP: 'timestamp' >: 1645708312.9564328 , <Key.RESPONSE_CODE: 'response_code' >: <ResponseCode.OK: 0 >, <Key.SENSOR_TELEMETRY: 'sensor_telemetry' >: [nan, 14.4312 , 14.0642 , 14.5876 ]}} 2022 - 02 - 24 13 : 11 : 17 , 293 :DEBUG:SocketServer.OneClientServer(EssSensorsServer):Writing data {<Key.TELEMETRY: 'telemetry' >: {<Key.NAME: 'name' >: 'AuxTel-ESS01' , <Key.TIMESTAMP: 'timestamp' >: 1645708312.9564328 , <Key.RESPONSE_CODE: 'response_code' >: <ResponseCode.OK: 0 >, <Key.SENSOR_TELEMETRY: 'sensor_telemetry' >: [nan, 14.4312 , 14.0642 , 14.5876 ]}} 2022 - 02 - 24 13 : 11 : 17 , 293 :DEBUG:SocketServer.OneClientServer(EssSensorsServer):{ "telemetry" : { "name" : "AuxTel-ESS01" , "timestamp" : 1645708312.9564328 , "response_code" : 0 , "sensor_telemetry" : [NaN, 14.4312 , 14.0642 , 14.5876 ]}} 2022 - 02 - 24 13 : 11 : 17 , 294 :DEBUG:SocketServer.OneClientServer(EssSensorsServer):Done 2022 - 02 - 24 13 : 11 : 17 , 294 :DEBUG:CommandHandler.VcpFtdi:Reading data. 2022 - 02 - 24 13 : 11 : 17 , 730 :ERROR:SocketServer.OneClientServer(EssSensorsServer):read_loop failed. Disconnecting. Traceback (most recent call last): File "/home/saluser/miniforge/lib/python3.8/site-packages/lsst/ts/ess/common/socket_server.py" , line 139 , in read_loop line = await self.reader.readuntil(tcpip.TERMINATOR) File "/home/saluser/miniforge/lib/python3.8/asyncio/streams.py" , line 629 , in readuntil raise exceptions.IncompleteReadError(chunk, None) asyncio.exceptions.IncompleteReadError: 0 bytes read on a total of undefined expected bytes 2022 - 02 - 24 13 : 11 : 17 , 730 :INFO:CommandHandler:stop_sending_telemetry 2022 - 02 - 24 13 : 11 : 17 , 731 :DEBUG:CommandHandler:Closing <lsst.ts.ess.controller.device.rpi_serial_hat.RpiSerialHat object at 0x7f709b9fd0 > device with name AuxTel-ESS02 2022 - 02 - 24 13 : 11 : 17 , 731 :DEBUG:CommandHandler.RpiSerialHat:Stopping read loop for 'AuxTel-ESS02' sensor. 2022 - 02 - 24 13 : 11 : 17 , 732 :ERROR:CommandHandler.RpiSerialHat:Serial port closed. Traceback (most recent call last): File "/home/saluser/miniforge/lib/python3.8/site-packages/lsst/ts/ess/common/socket_server.py" , line 139 , in read_loop line = await self.reader.readuntil(tcpip.TERMINATOR) File "/home/saluser/miniforge/lib/python3.8/asyncio/streams.py" , line 629 , in readuntil raise exceptions.IncompleteReadError(chunk, None) asyncio.exceptions.IncompleteReadError: 0 bytes read on a total of undefined expected bytes 2022 - 02 - 24 13 : 11 : 17 , 732 :DEBUG:CommandHandler:Closing <lsst.ts.ess.controller.device.vcp_ftdi.VcpFtdi object at 0x7f70974e20 > device with name AuxTel-ESS01 2022 - 02 - 24 13 : 11 : 17 , 733 :DEBUG:CommandHandler.VcpFtdi:Stopping read loop for 'AuxTel-ESS01' sensor. 2022 - 02 - 24 13 : 11 : 17 , 734 :DEBUG:CommandHandler.VcpFtdi:FTDI device closed. 2022 - 02 - 24 13 : 11 : 17 , 734 :DEBUG:SocketServer.OneClientServer(EssSensorsServer):Cancelling read_loop_task. 2022 - 02 - 24 13 : 11 : 17 , 734 :DEBUG:SocketServer.OneClientServer(EssSensorsServer):Closing client. 2022 - 02 - 24 13 : 11 : 17 , 735 :INFO:SocketServer.OneClientServer(EssSensorsServer):Closing the client socket. When pressing CTRL-C in the terminal running the ESS Controller, I got: ^C2022- 02 - 24 13 : 11 : 56 , 088 :DEBUG:SocketServer.OneClientServer(EssSensorsServer):call_connect_callback: connected=False; last_connected=True 2022 - 02 - 24 13 : 11 : 56 , 089 :INFO:SocketServer.OneClientServer(EssSensorsServer):Calling connect_callback 2022 - 02 - 24 13 : 11 : 56 , 089 :INFO:SocketServer.OneClientServer(EssSensorsServer):Client disconnected. Pressing CTRL-C again to kill the ESS Controller produced: ^CError in atexit._run_exitfuncs: Traceback (most recent call last): File "/home/saluser/miniforge/lib/python3.8/concurrent/futures/thread.py" , line 40 , in _python_exit t.join() File "/home/saluser/miniforge/lib/python3.8/threading.py" , line 1011 , in join self._wait_for_tstate_lock() File "/home/saluser/miniforge/lib/python3.8/threading.py" , line 1027 , in _wait_for_tstate_lock elif lock.acquire(block, timeout): KeyboardInterrupt 2022 - 02 - 24 13 : 12 : 01 , 419 :ERROR:asyncio:Task exception was never retrieved future: <Task finished name= 'Task-13' coro=<BaseDevice._run() done, defined at /home/saluser/miniforge/lib/python3. 8 /site-packages/lsst/ts/ess/common/device/base_device.py: 118 > exception=KeyboardInterrupt()> Traceback (most recent call last): File "/home/saluser/miniforge/lib/python3.8/site-packages/lsst/ts/ess/controller/device/rpi_serial_hat.py" , line 107 , in readline buffer.write(await loop.run_in_executor(pool, self.ser.read, 1 )) asyncio.exceptions.CancelledErrorDuring handling of the above exception, another exception occurred:Traceback (most recent call last): File "/home/saluser/miniforge/bin/run_ess_controller.py" , line 63 , in <module> asyncio.run(main()) File "/home/saluser/miniforge/lib/python3.8/asyncio/runners.py" , line 44 , in run return loop.run_until_complete(main) File "/home/saluser/miniforge/lib/python3.8/asyncio/base_events.py" , line 603 , in run_until_complete self.run_forever() File "/home/saluser/miniforge/lib/python3.8/asyncio/base_events.py" , line 570 , in run_forever self._run_once() File "/home/saluser/miniforge/lib/python3.8/asyncio/base_events.py" , line 1859 , in _run_once handle._run() File "/home/saluser/miniforge/lib/python3.8/asyncio/events.py" , line 81 , in _run self._context.run(self._callback, *self._args) File "/home/saluser/miniforge/lib/python3.8/site-packages/lsst/ts/ess/common/device/base_device.py" , line 129 , in _run line = await self.readline() File "/home/saluser/miniforge/lib/python3.8/site-packages/lsst/ts/ess/controller/device/rpi_serial_hat.py" , line 107 , in readline buffer.write(await loop.run_in_executor(pool, self.ser.read, 1 )) File "/home/saluser/miniforge/lib/python3.8/concurrent/futures/_base.py" , line 644 , in __exit__ self.shutdown(wait=True) File "/home/saluser/miniforge/lib/python3.8/concurrent/futures/thread.py" , line 236 , in shutdown t.join() File "/home/saluser/miniforge/lib/python3.8/threading.py" , line 1011 , in join self._wait_for_tstate_lock() File "/home/saluser/miniforge/lib/python3.8/threading.py" , line 1027 , in _wait_for_tstate_lock elif lock.acquire(block, timeout): KeyboardInterrupt
            Hide
            rowen Russell Owen added a comment - - edited

            OneClientServer in ts_tcpip issues the log message "Closing the client socket" in this bit of code:

                async def close_client(self) -> None:
                    """Close the connected client socket, if any."""
                    try:
                        self.log.info("Closing the client socket.")
                        if self.writer is None:
                            return
             
                        writer = self.writer
                        self.writer = None
                        await utils.close_stream_writer(writer)
                        self.connected_task = asyncio.Future()
                    except Exception:
                        self.log.exception("close_client failed; continuing")
                    finally:
                        self.call_connect_callback()
            

            It looks to me like the only place this can hang up is in await utils.close_stream_writer(writer). Once that executes then the callback will be called and print the log messages shown after you press ctrl-C the first time. I can think of two reasons why closing the writer can hang:

            • The close is actually hanging. We could work around that by adding a timeout.
            • Once we await the close another task is taking over the event loop and hanging. If that is what is happening then adding a timeout will not help.

            My guess is that something is hanging the event loop. But adding a timeout is a cheap experiment. Just change:

            await utils.close_stream_writer(writer)
            

            to something like (picking an arbitrary-but-reasonable timeout):

            await asyncio.wait_for(utils.close_stream_writer(writer), timeout=2)
            

            A more sophisticated version would catch the timeout error and report it differently than the catch Exception block (mostly because there is no point to logging a traceback for a timeout) and use a constant for the timeout. But for diagnostic purposes the above is fine.

            Show
            rowen Russell Owen added a comment - - edited OneClientServer in ts_tcpip issues the log message "Closing the client socket" in this bit of code: async def close_client(self) -> None: """Close the connected client socket, if any.""" try: self.log.info("Closing the client socket.") if self.writer is None: return   writer = self.writer self.writer = None await utils.close_stream_writer(writer) self.connected_task = asyncio.Future() except Exception: self.log.exception("close_client failed; continuing") finally: self.call_connect_callback() It looks to me like the only place this can hang up is in await utils.close_stream_writer(writer) . Once that executes then the callback will be called and print the log messages shown after you press ctrl-C the first time. I can think of two reasons why closing the writer can hang: The close is actually hanging. We could work around that by adding a timeout. Once we await the close another task is taking over the event loop and hanging. If that is what is happening then adding a timeout will not help. My guess is that something is hanging the event loop. But adding a timeout is a cheap experiment. Just change: await utils.close_stream_writer(writer) to something like (picking an arbitrary-but-reasonable timeout): await asyncio.wait_for(utils.close_stream_writer(writer), timeout=2) A more sophisticated version would catch the timeout error and report it differently than the catch Exception block (mostly because there is no point to logging a traceback for a timeout) and use a constant for the timeout. But for diagnostic purposes the above is fine.
            Hide
            wvreeven Wouter van Reeven added a comment -

            I reverted most changes (particularly the ones to MockDevice) since that didn't help to resolve the issue. I am afraid that the investigation will have to continue in a future sprint.

            Show
            wvreeven Wouter van Reeven added a comment - I reverted most changes (particularly the ones to MockDevice) since that didn't help to resolve the issue. I am afraid that the investigation will have to continue in a future sprint.
            Hide
            wvreeven Wouter van Reeven added a comment -

            The issue wasn't solved because I cannot reproduce it. Still, please review these small code improvements:

            ts_ess_common PR: https://github.com/lsst-ts/ts_ess_common/pull/13

            ts_ess_controller PR: https://github.com/lsst-ts/ts_ess_controller/pull/24

            ts_ess_csc PR: https://github.com/lsst-ts/ts_ess_csc/pull/60

             

            Show
            wvreeven Wouter van Reeven added a comment - The issue wasn't solved because I cannot reproduce it. Still, please review these small code improvements: ts_ess_common PR: https://github.com/lsst-ts/ts_ess_common/pull/13 ts_ess_controller PR: https://github.com/lsst-ts/ts_ess_controller/pull/24 ts_ess_csc PR: https://github.com/lsst-ts/ts_ess_csc/pull/60  
            Hide
            wvreeven Wouter van Reeven added a comment -

            Reviewed by Russell Owen on GitHub.

            Show
            wvreeven Wouter van Reeven added a comment - Reviewed by Russell Owen on GitHub.

              People

              Assignee:
              wvreeven Wouter van Reeven
              Reporter:
              wvreeven Wouter van Reeven
              Reviewers:
              Russell Owen
              Watchers:
              Russell Owen, Wouter van Reeven
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.