# Enable running ci_hsc_gen3 contents on AWS

XMLWordPrintable

## Details

• Type: Bug
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points:
4
• Team:
External
• Urgent?:
No

## Description

Title might not summarize the end goal in the best way so: this issue collects bugs and improvements I plan on making in order to be able to execute the same list of commands that are execute when ci_hsc_gen3 is scons'd. This boils down to to making the necessary changes such that the following list of commands execute correctly, given the stated conditions, irregardless of whether the commands can be executed via a scons flag or not:

If provided a correct butler.yaml, AWS credentials, a S3 Bucket as repo root and an connectable RDS instance

 makeButlerRepo.py --override s3://bucketname/ -c butler.yaml registerInstrument.py s3://bucketname/ makeGen3Skymap.py s3://bucketname/ -C ~/ci_hsc_gen3/configs/skymap.py skymaps ingestExternalData.py s3://bucketname/ ci_hsc_gen3/resources/external.yaml pipetask run -d "patch = 69" -b s3://bucketname/   -i calib/hsc,raw/hsc,masks/hsc,ref_cats,skymaps,shared/ci_hsc   -o shared/ci_hsc_output   --register-dataset-types   -p ci_hsc_gen3/pipelines/CiHsc.yaml 

This currently is not possible due to the following bugs or issues:

## 1.  Bugs

### a. PostgreSQL database namespace not being handled correctly

In daf/butler/registry/databases/postgresql.py L68 the namespace is set to a non-existent DSN "schema" key.

As-is this breaks PostgreSQL database code when a namespace is not supplied explicitly. This was not noticed since all tests provide an explicit namespace

Psycopg2 manual never explicitly lists parameters but has a "see Pconninfo" pointer to libq library. The manual page on PConninfo states that Pconninfo lists all PQconnectdb values that were set during connection time, omitting the ones that weren't. A list of recognized PQconnectdbParams can be found here. The list does not contain a key valued "schema".

Leaving the namespace as None does not break functionality but is not one of the PostgreSQL recommended secure schema usage patters and it makes an awkward _str_ output.
A solution is to query and use the database default schema. If not properly configured server-side this will default to 'public' which is not a recommended secure pattern so maybe we should consider replacing "public" with "None" and letting psycopg2 fail with a non-existing schema or raising an error when we encounter "public" if we want to strongly enforce the usage of secure schema patterns.

### b. pipe_tasks assumes Butler exists strictly on-local-disk

Remove and/or replace all os.path.exists checks in pipe_tasks. For example the makeGen3SkyMap.py:

 # Verify any supplied paths actually exist on disk if not os.path.exists(args.butler):  print("Butler path specified does not exists")  sys.exit(1) 

This stopped being the case when S3Datastore was implemented.

## 2. Improvements

### a. ingestExternalData.py script in ci_hsc_gen3 hard-codes "symlink" transfer type

Which is fine in context of **ci_hsc_gen3  scons but makes no sense in non-POSIX compliant cases. Add a flag to ingestExternalData.py that defaults to symlink but allows other transfer types to be specified.

### b. makeButlerRepo.py creates a "." key when a fresh S3 Bucket is targeted

Does not break functionality but shouldn't happen. The "." alias for absolute path shouldn't appear as a Key in the data repository root. Investigate and fix.

## 3. Extra goals

• get a scons flag that wraps all the above instead of an external script or individual commands
• Add a way to turn checksums off. Talking to Michelle Gower brought to my attention that this flag does not seem to be added to butler.yaml when DAF_BUTLER_CONFIG_PATH is set, but does if -c is explicitly provided. I suspect --override is missing/not applied in one case but is in the other. In any case if I want to get scons -aws (point directly above) then I'll have to mess with the SConfigure anyhow so I might as well investigate.
• See if timings for individual steps can be added as a flag to scons. Potentially useful but not reported by current scons is the timing breakdown of individual commands.

## People

• Assignee:
Dino Bektesevic
Reporter:
Dino Bektesevic
Reviewers:
Jim Bosch, Tim Jenness
Watchers:
Dino Bektesevic, Hsin-Fang Chiang, Jim Bosch, Kian-Tat Lim, Michelle Gower, Nate Pease, Tim Jenness