Details
- Type: Story
- Status: To Do
- Resolution: Unresolved
- Fix Version/s: None
- Component/s: None
- Labels:
- Team: Telescope and Site
- Urgent?: No
Description
ComponentProducerSet's management of sub-producer background tasks is not fully robust. For example, if one waits too short a time after creating a producer set before aborting it (by calling signal_handler), the main process wedges while waiting for the subprocesses to exit, even though the wait has a time limit; it is as if the event loop is dead. To see this, use a sleep of 0.1 seconds in test_run_and_abort_distributed_producer and run with pytest --cov. Also, if you change the test to wait 5 seconds (longer than needed, but shorter than it takes to fully start the DDS machinery in the subprocesses), the same test fails (but does not hang): waiting for the subprocesses to exit times out. It's all very mysterious.
If we can't figure out how to make the exiting code more robust, then I suggest experimenting with running the subprocesses the way ScriptQueue runs scripts: using asyncio.create_subprocess_exec. The nice thing about that technique is that you get an object you can use to monitor the process and to terminate or kill it. I have never seen that fail in the script queue. The reason we didn't do this initially is that it requires passing all the information (lists of topic names) to the command-line executable, which we were trying to avoid. But if it gives more robust process management, then I think it's worth the extra command-line arguments.
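For reference, here is a minimal sketch of the create_subprocess_exec pattern as ScriptQueue uses it; the "run_salkafka_producer" executable name, the --topics argument, and the helper function are made up for illustration and would need to be defined in ts_salkafka:

{code:python}
import asyncio


async def run_producer_subprocess(topic_names, terminate_timeout=5):
    """Minimal sketch: launch a producer as a subprocess and manage it
    via the asyncio.subprocess.Process object.

    The executable name and --topics argument are hypothetical; a real
    version would need a command-line entry point that accepts the
    list of topic names.
    """
    process = await asyncio.create_subprocess_exec(
        "run_salkafka_producer", "--topics", *topic_names
    )
    try:
        # Monitor the subprocess; process.returncode is None while it runs.
        await asyncio.sleep(1)  # stand-in for real work
    finally:
        if process.returncode is None:
            # Ask the subprocess to exit, then escalate to SIGKILL
            # if it does not exit within the timeout.
            process.terminate()
            try:
                await asyncio.wait_for(process.wait(), timeout=terminate_timeout)
            except asyncio.TimeoutError:
                process.kill()
                await process.wait()
{code}

The key point is that terminate() and kill() act on a concrete Process object, so cleanup does not depend on the subprocess cooperating or on its event loop being alive.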
Attachments
Issue Links
- relates to: DM-30082 send private_efdStamp to Kafka with realtime/UTC clock (Done)
Postpone any work on this until we try using Kafka for SAL. If we switch then there is no longer any need for the ts_salkafka package.