Fix Version/s: None
Sprint: AP F20-5 (October), AP F20-6 (November)
Currently, ap_verify is hard-coded to use an SQLite APDB when running the AP pipeline. While this is adequate for CI processing, SQLite does not scale well to large datasets. Provide a way for the user to override the use of SQLite, preferably by finding a more scalable way to handle configuration options that depend on the workspace location.
- relates to
DM-26051 Try ap_pipe HiTS2015 rerun with PostgreSQL
I'm just catching up on this. Could we have ap_verify.py require a db_url argument/config? Since ap_verify calls make_apdb early on, if that fails because the db_url passed along is invalid for whatever reason, it could presumably fail quickly with a useful error message.
I would hope that failure mode would be handled adequately by make_apdb itself...
I'm worried that db_url alone might be too inflexible; e.g., SQLite also requires that you set the isolation level. Or are you suggesting a db_url argument in addition to a more general config?
I don't want more configs, no, haha... I guess whatever arguments make_apdb requires to find/connect to the APDB, ap_verify should also require and then pass along, is my broad probably-not-revolutionary idea here.
I discussed with Meredith Rawls offline, and the proposal above is probably more flexibility than we'll actually need. For now, I'll just have users pass a db_url argument, and special-case the isolation_level for SQLite (it is also special-cased in ApdbConfig itself).
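For illustration only, here is a minimal sketch of what that could look like. It is not the actual ap_verify code: the --db-url flag name, the apdb_overrides helper, and the READ_UNCOMMITTED value are all assumptions made for the example, and the real special-casing would live wherever ap_verify builds its APDB config.

```python
import argparse


def make_parser():
    """Build a parser with a required database URL (hypothetical flag name)."""
    parser = argparse.ArgumentParser(description="Sketch of an ap_verify-like CLI.")
    parser.add_argument(
        "--db-url", dest="db_url", required=True,
        help="Location of the APDB, e.g. sqlite:///path/to/file.db")
    return parser


def apdb_overrides(db_url):
    """Return the config overrides to pass along for a given database URL.

    Hypothetical helper: only SQLite gets an explicit isolation level;
    every other backend keeps whatever default the APDB config provides.
    """
    overrides = {"db_url": db_url}
    if db_url.startswith("sqlite://"):
        # Assumed value for illustration; check the APDB config for the real one.
        overrides["isolation_level"] = "READ_UNCOMMITTED"
    return overrides


if __name__ == "__main__":
    args = make_parser().parse_args()
    print(apdb_overrides(args.db_url))
```

With this shape, a PostgreSQL URL passes through untouched, while an SQLite URL picks up the extra setting without the user having to know about it.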
I think I painted myself into a corner on this one. The problem:
This is a clunky and complicated solution, but I think anything simpler (e.g., a hardcoded doUsePostgres flag) would have just led to even more technical debt down the road.