The title might not summarize the end goal in the best way, so: this issue collects the bugs and improvements I plan on making in order to be able to execute the same list of commands that are executed when ci_hsc_gen3 is scons'd. This boils down to making the necessary changes such that the following list of commands executes correctly, given the stated conditions, regardless of whether the commands can be executed via a scons flag or not:
If provided a correct butler.yaml, AWS credentials, an S3 bucket as repo root and a connectable RDS instance
This currently is not possible due to the following bugs or issues:
In daf/butler/registry/databases/postgresql.py L68 the namespace is set to a non-existent DSN "schema" key.
As-is, this breaks the PostgreSQL database code when a namespace is not supplied explicitly. This was not noticed because all tests provide an explicit namespace.
The psycopg2 manual never explicitly lists the parameters but has a "see PQconninfo" pointer to the libpq library. The manual page on PQconninfo states that PQconninfo lists all PQconnectdb values that were set at connection time, omitting the ones that weren't. A list of recognized PQconnectdbParams keywords can be found here. The list does not contain a key named "schema".
Leaving the namespace as None does not break functionality, but it is not one of the PostgreSQL recommended secure schema usage patterns and it makes for an awkward __str__ output.
A solution is to query and use the database default schema. If the server is not properly configured, this will default to "public", which is not a recommended secure pattern. If we want to strongly enforce the usage of secure schema patterns, we should consider either replacing "public" with None and letting psycopg2 fail on a non-existent schema, or raising an error when we encounter "public".
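A minimal sketch of that idea, not the actual daf_butler code: the function name `resolve_namespace` and the `forbid_public` switch are made up for illustration, and the connection is assumed to be any DB-API 2.0 connection to a PostgreSQL server (for example one returned by `psycopg2.connect`). It relies on the PostgreSQL function `current_schema()`, which returns the first schema in the search path that exists.

```python
def resolve_namespace(connection, explicit=None, forbid_public=True):
    """Pick the schema to use as the registry namespace.

    Hypothetical helper: `connection` is assumed to be a DB-API 2.0
    connection to PostgreSQL; `explicit` is a namespace supplied by the user.
    """
    # An explicitly supplied namespace always wins.
    if explicit is not None:
        return explicit

    # Otherwise ask the server which schema it would use by default.
    cursor = connection.cursor()
    try:
        cursor.execute("SELECT current_schema()")
        (schema,) = cursor.fetchone()
    finally:
        cursor.close()

    # current_schema() is NULL when no schema in the search path exists.
    if schema is None:
        raise RuntimeError("The server reports no usable default schema.")

    # Optional strict mode: refuse the shared 'public' schema.
    if forbid_public and schema == "public":
        raise RuntimeError(
            "Default schema is 'public'; configure a per-user schema "
            "or pass a namespace explicitly."
        )
    return schema
```

Passing `forbid_public=False` gives the lenient behaviour (use whatever the server reports), so the strictness question raised above stays a configuration choice.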
Remove and/or replace all os.path.exists checks in pipe_tasks. For example, in makeGen3SkyMap.py:
This stopped being the case when S3Datastore was implemented.
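One possible replacement for a bare os.path.exists call is a check that looks at the URI scheme first. The sketch below is only an illustration: `uri_exists` is a hypothetical helper, not an existing pipe_tasks or daf_butler function, and it assumes POSIX-style local paths and `s3://bucket/key` URIs. For S3 it uses the boto3 client's `head_object` call; the client can be passed in, otherwise one is created lazily.

```python
import os
from urllib.parse import urlparse


def uri_exists(uri, s3_client=None):
    """Return True if `uri` points at an existing local file or S3 object.

    Hypothetical helper; only the 'file' and 's3' schemes are handled.
    """
    parsed = urlparse(uri)

    # Plain paths and file:// URIs are checked on the local filesystem.
    if parsed.scheme in ("", "file"):
        return os.path.exists(parsed.path)

    if parsed.scheme == "s3":
        if s3_client is None:
            import boto3  # imported lazily so local-only use needs no boto3
            s3_client = boto3.client("s3")
        try:
            s3_client.head_object(Bucket=parsed.netloc,
                                  Key=parsed.path.lstrip("/"))
        except Exception as err:
            # botocore's ClientError carries the error code in .response;
            # a missing object is reported with code "404".
            response = getattr(err, "response", None) or {}
            if response.get("Error", {}).get("Code") in ("404", "NoSuchKey"):
                return False
            raise
        return True

    raise ValueError(f"Unsupported URI scheme: {parsed.scheme!r}")
```

Whether such checks should live in pipe_tasks at all, or be delegated to the datastore, is a separate design question; the sketch only shows that the check does not have to assume a POSIX filesystem.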
Which is fine in the context of ci_hsc_gen3 scons but makes no sense in non-POSIX-compliant cases. Add a flag to ingestExternalData.py that defaults to symlink but allows other transfer types to be specified.
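A rough sketch of what such a flag could look like with argparse. The flag name `--transfer` and the list of modes are assumptions made for illustration; the real set of allowed values should come from whatever the datastore actually supports.

```python
import argparse

# Assumed transfer modes, for illustration only.
TRANSFER_MODES = ("symlink", "hardlink", "copy", "move")


def build_parser():
    """Build a parser with a transfer-type option that defaults to symlink."""
    parser = argparse.ArgumentParser(
        description="Ingest external data into a data repository."
    )
    parser.add_argument(
        "--transfer",
        choices=TRANSFER_MODES,
        default="symlink",  # keeps the current behaviour when omitted
        help="How files are transferred into the repository.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Using transfer mode: {args.transfer}")
```

Keeping symlink as the default means existing ci_hsc_gen3 invocations would not change, while an S3-backed run could pass a different mode explicitly.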
Does not break functionality but shouldn't happen. The "." alias for absolute path shouldn't appear as a Key in the data repository root. Investigate and fix.
- Get a scons flag that wraps all of the above instead of an external script or individual commands.
- Add a way to turn checksums off. Talking to Michelle Gower brought to my attention that this flag does not seem to be added to butler.yaml when DAF_BUTLER_CONFIG_PATH is set, but is added if -c is explicitly provided. I suspect --override is missing or not applied in one case but is in the other. In any case, if I want to get scons -aws (the point directly above) then I'll have to mess with the SConfigure anyhow, so I might as well investigate.
- See if timings for individual steps can be added as a flag to scons. A timing breakdown of individual commands is potentially useful but is not currently reported by scons.
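Until a scons flag exists, a per-command timing breakdown can be collected by a small wrapper. This is only a sketch of the idea and does not touch scons itself; `run_timed` is a hypothetical helper and the command in the usage block is a placeholder, not one of the ci_hsc_gen3 commands.

```python
import subprocess
import sys
import time


def run_timed(commands):
    """Run each command (a list of argv strings) and record its wall time.

    Returns a list of (command string, seconds) tuples, in execution order.
    Stops with CalledProcessError at the first command that fails.
    """
    timings = []
    for argv in commands:
        start = time.perf_counter()
        subprocess.run(argv, check=True)
        timings.append((" ".join(argv), time.perf_counter() - start))
    return timings


if __name__ == "__main__":
    # Placeholder command; substitute the real list of steps here.
    steps = [[sys.executable, "-c", "pass"]]
    for label, seconds in run_timed(steps):
        print(f"{seconds:8.2f}s  {label}")
```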