Details
- Type: Bug
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: ap_pipe
- Labels: None
- Story Points: 4
- Epic Link:
- Sprint: AP S19-3, AP S19-4
- Team: Alert Production
Description
As discussed in the review for DM-15588, ap_pipe should not assume that it can safely call Ppdb.makeSchema(): the implementation of makeSchema() may break when run against an existing database, or the program may run afoul of (a lack of) table creation permissions.
The most user-friendly way to get around this limitation is to add two command-line flags to ap_pipe.py:
- --make-db: ap_pipe will try to create the database it is configured for, assuming it does not already exist. Behavior if the database already exists is undefined. Not guaranteed to be compatible with all values for PpdbConfig; the user is responsible for verifying compatibility.
- --clobber-db: ap_pipe will delete and replace the database it is configured for, if it already exists. Equivalent to --make-db if the database does not exist. Cannot be combined with --make-db, and should not be combined with --reuse-outputs-from associator or --reuse-outputs-from all.
(If neither argument is provided, then the database is assumed to already exist.)
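The flag semantics above could be sketched with the standard library's argparse. This is only an illustration, not the ap_pipe implementation: build_parser and db_mode are hypothetical names, and only the hard rule (--make-db and --clobber-db cannot be combined) is modeled; the softer advice about --reuse-outputs-from is not checked.

```python
import argparse


def build_parser():
    """Hypothetical stand-in for the ap_pipe.py argument parser."""
    parser = argparse.ArgumentParser(prog="ap_pipe.py")
    # The two flags cannot be combined, so argparse enforces exclusivity.
    db_group = parser.add_mutually_exclusive_group()
    db_group.add_argument(
        "--make-db", action="store_true",
        help="create the configured database, assuming it does not exist")
    db_group.add_argument(
        "--clobber-db", action="store_true",
        help="delete and replace the configured database if it exists")
    return parser


def db_mode(args):
    """Map parsed flags to one of three (hypothetical) mode names."""
    if args.clobber_db:
        return "clobber"
    if args.make_db:
        return "make"
    # Neither flag given: the database is assumed to already exist.
    return "existing"
```

With this layout, passing both flags makes argparse exit with a usage error, and passing neither selects the "assume it exists" behavior.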
This will be a breaking change to any scripts that call ap_pipe.py directly, since the current behavior is equivalent to always setting --make-db.
This issue should be done after DM-13887, which will make it easier for ap_verify to pass arbitrary arguments to ap_pipe and make it safe for ap_verify to assume it is starting from scratch.
One recent issue I've encountered comes up when I run ap_pipe with Slurm. As you work to implement this ticket, I would appreciate a suggested workflow for the case where I need to create a database once and have a bunch of parallel and/or sequential operations write to it without crashing into one another. I think the --make-db flag as described would solve this, since it sounds like you propose to check if the DB exists and only make it if it does not.
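One possible shape for the create-once workflow, sketched under the assumption that the flags behave as described above: because the behavior of --make-db on an existing database is undefined, only the first job carries the flag, and every other job omits it and starts only after the first has created the database. The plan_commands helper and the opaque per-job argument lists are hypothetical; how jobs are ordered (for example through scheduler dependencies) is left to the caller.

```python
def plan_commands(job_args, program="ap_pipe.py"):
    """Split jobs into one database-creating command and the rest.

    job_args is a list of per-job argument lists, treated as opaque.
    Returns (setup_cmd, worker_cmds); the workers must not start until
    the setup command has created the database.
    """
    if not job_args:
        raise ValueError("need at least one job")
    first, *rest = job_args
    # Exactly one job creates the database.
    setup_cmd = [program, "--make-db", *first]
    # All other jobs assume the database already exists.
    worker_cmds = [[program, *args] for args in rest]
    return setup_cmd, worker_cmds
```

The setup command would be run to completion first; the worker commands can then be submitted in parallel or in sequence without any of them attempting to create tables.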