Details
- Type: Bug
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: ap_association
- Labels:
- Story Points: 6
- Epic Link:
- Team: Alert Production
Description
When running ap_pipe on hits2015 with SLURM, several of the tasks failed to access the association database through sqlite3. The error reported was "sqlite3.OperationalError: database is locked".
A brief discussion with Chris Morrison suggests that the problem likely stems from how the code creates temporary tables when querying for DIASources or DIAObjects. These temporary tables are joined to the main DIASource and DIAObject tables to enable the queries. When tasks run in parallel, however, these temporary tables can be created and dropped in the wrong order.
This ticket is to implement a fix in ap_association.
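For illustration, the temporary-table join pattern described above can be sketched as follows. This is a minimal sketch and not the ap_association code: the function name, the table names (dia_sources, tmp_ids), and the column names are hypothetical. It uses a connection-local TEMP table and creates, uses, and drops it within one function call, so the create and drop steps cannot interleave with those of another connection.

```python
import sqlite3


def query_sources_by_object_ids(conn, object_ids):
    """Return (diaSourceId, diaObjectId) rows matching the given object ids.

    Hypothetical helper: table and column names are illustrative only.
    """
    # The connection context manager commits any open transaction on
    # success and rolls it back if an exception is raised.
    with conn:
        # TEMP tables are private to this connection.
        conn.execute(
            "CREATE TEMP TABLE tmp_ids (diaObjectId INTEGER PRIMARY KEY)")
        conn.executemany(
            "INSERT OR IGNORE INTO tmp_ids VALUES (?)",
            [(object_id,) for object_id in object_ids])
        rows = conn.execute(
            "SELECT s.diaSourceId, s.diaObjectId "
            "FROM dia_sources AS s "
            "JOIN tmp_ids AS t ON s.diaObjectId = t.diaObjectId "
            "ORDER BY s.diaSourceId").fetchall()
        # Drop the table before leaving so the next call can recreate it.
        conn.execute("DROP TABLE tmp_ids")
    return rows
```

Whether a connection-local TEMP table suits the real code depends on how ap_association shares connections between tasks, which this ticket does not state.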
Issue Links
- relates to: DM-14259 Try and document running ap_pipe on the Verification Cluster with SLURM (Done)
Started work. It turns out that the temporary tables are not exactly the problem, as "database is locked" failures occur when Meredith Rawls runs 84 jobs on 6 nodes of the lsst-dev verification cluster. I have increased the timeout of the database connection from the default 5 seconds to 60 seconds, and additionally simplified the commits around the temporary tables used for the joins when querying DIASources and DIAObjects by diaObjectId and pixelId, respectively.
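For reference, the timeout change can be sketched with Python's standard sqlite3 module, whose connect() takes a timeout argument (in seconds, default 5.0) that sets how long a connection waits on a lock before raising OperationalError. The helper name and its default value below are assumptions for illustration, not the ap_association API.

```python
import sqlite3


def open_association_db(db_path, timeout_seconds=60.0):
    """Open a SQLite connection that waits for locks before failing.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # With timeout=60, a connection that finds the database locked
    # retries for up to 60 seconds before raising
    # sqlite3.OperationalError("database is locked").
    return sqlite3.connect(db_path, timeout=timeout_seconds)
```

A longer timeout only gives competing jobs more time to release their locks. It does not remove the contention, so a writer that holds its lock for longer than the timeout will still trigger the error.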