Also, a heads up for your memory investigation: during pair-coding today, Sophie and I noticed that WriteObjectTable was acting strange on /repo/main.
pipetask run -b /repo/main -i HSC/runs/RC2/w_2021_14/DM-29528 -o u/yusra/objectTables -p $OBS_SUBARU_DIR/pipelines/DRP.yaml#writeObjectTable -d "instrument='HSC' AND skymap='hsc_rings_v1' AND tract=9813 AND patch=23" --register-dataset-types --instrument lsst.obs.subaru.HyperSuprimeCam -j 1
Its memory usage just keeps climbing to 100GB and beyond as it's reading in the inputs (before the logger says it's starting the task). I killed it when top said:
1491649 yusra 20 0 129.3g 125.5g 72772 R 100.0 49.9 12:07.66 python
In contrast, when running on /datasets/hsc/gen3repo/rc2w06_ssw06 with w_2021_17, writeObjectTable is quick and painless:
(lsst-scipipe) [yusra@lsst-devl01 ~]$ /usr/bin/time -v pipetask run -b /datasets/hsc/gen3repo/rc2w06_ssw06 -i HSC/runs/RC2/w_2021_06 -o u/yusra/object_test_again -p $OBS_SUBARU_DIR/pipelines/DRP.yaml#writeObjectTable -d "instrument='HSC' AND skymap='hsc_rings_v1' AND tract=9813 AND patch=23 " --register-dataset-types --instrument lsst.obs.subaru.HyperSuprimeCam -j 1
ctrl.mpexec.cmdLineFwk INFO: QuantumGraph contains 1 quanta for 1 tasks, graph ID: '1619653960.919665-1506427'
conda.common.io INFO: overtaking stderr and stdout
conda.common.io INFO: stderr and stdout yielding back
ctrl.mpexec.singleQuantumExecutor INFO: Execution of task 'writeObjectTable' on quantum {skymap: 'hsc_rings_v1', tract: 9813, patch: 23} took 62.071 seconds
ctrl.mpexec.mpGraphExecutor INFO: Executed 1 quanta, 0 remain out of total 1 quanta.
Command being timed: "pipetask run -b /datasets/hsc/gen3repo/rc2w06_ssw06 -i HSC/runs/RC2/w_2021_06 -o u/yusra/object_test_again -p /software/lsstsw/stack_20210415/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/obs_subaru/21.0.0-32-g0ce1f32a+fd3c508698/pipelines/DRP.yaml#writeObjectTable -d instrument='HSC' AND skymap='hsc_rings_v1' AND tract=9813 AND patch=23 --register-dataset-types --instrument lsst.obs.subaru.HyperSuprimeCam -j 1"
User time (seconds): 57.62
System time (seconds): 13.84
Percent of CPU this job got: 94%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:15.29
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 10290184
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 5767844
Voluntary context switches: 14906
Involuntary context switches: 448033
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
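To put the two runs on the same scale, here is a quick back-of-the-envelope conversion. It assumes the "kbytes" reported by /usr/bin/time -v are KiB and that the "g" suffix in top's RES column means GiB; the variable and function names are just for illustration.

```python
# Numbers copied from the two runs above.
max_rss_kib = 10290184  # "Maximum resident set size (kbytes)" on rc2w06_ssw06
top_res_gib = 125.5     # RES reported by top on /repo/main when the job was killed

KIB_PER_GIB = 1024 ** 2


def kib_to_gib(kib: int) -> float:
    """Convert KiB (assumed unit of the /usr/bin/time report) to GiB."""
    return kib / KIB_PER_GIB


good_run_gib = kib_to_gib(max_rss_kib)
ratio = top_res_gib / good_run_gib

print(f"rc2w06_ssw06 peak RSS:  {good_run_gib:.1f} GiB")
print(f"/repo/main RES at kill: {top_res_gib:.1f} GiB (about {ratio:.0f}x larger)")
```

So the well-behaved run peaks at roughly 9.8 GiB, and the /repo/main run had already reached about 13 times that when it was killed, while still reading inputs.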
For reference, the gen2 version takes around 15GB per patch (remember, this is why we concatenate the narrow tables instead of these wide ones). It reads in 5 deepCoadd_meas, 5 deepCoadd_forced, and 1 deepCoadd_ref. (I think it holds the input afwTable in memory even after creating the DataFrame, which means I can probably cut memory in half for the next weekly.)
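A toy sketch of why dropping each input right after conversion could roughly halve the peak. This is not the actual task code: plain Python lists stand in for the afwTable and the DataFrame, every function name here is hypothetical, and it assumes the converted copy is about the same size as the source. The 11 inputs mirror the 5 + 5 + 1 catalogs mentioned above.

```python
import gc
import tracemalloc


def make_source_table(n_rows):
    """Stand-in for one large input catalog (hypothetical, not an afw table)."""
    return [float(i) for i in range(n_rows)]


def convert(table):
    """Stand-in for the catalog-to-DataFrame step: makes a full copy."""
    return [value * 1.0 for value in table]


def load_keep_sources(n_inputs, n_rows):
    # All source tables stay alive while every copy is built.
    sources = [make_source_table(n_rows) for _ in range(n_inputs)]
    return [convert(table) for table in sources]


def load_release_sources(n_inputs, n_rows):
    # Each source table is dropped as soon as its copy exists.
    frames = []
    for _ in range(n_inputs):
        table = make_source_table(n_rows)
        frames.append(convert(table))
        del table
    return frames


def peak_bytes(func, *args):
    """Return the peak traced allocation while running func(*args)."""
    gc.collect()
    tracemalloc.start()
    result = func(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del result
    return peak


keep_peak = peak_bytes(load_keep_sources, 11, 20000)
release_peak = peak_bytes(load_release_sources, 11, 20000)
print(f"keep sources:    {keep_peak / 1e6:.1f} MB peak")
print(f"release sources: {release_peak / 1e6:.1f} MB peak")
```

With 11 inputs the first variant holds about 22 table-sized objects at its peak and the second about 12, which is where the "cut memory in half" estimate comes from. Whether the real task behaves this way depends on what else keeps references to the inputs.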
singleFrame subset worked (as expected; it already worked in w14).
First attempt at multiVisit failed because of an exception raised in jointcal while trying to apply color terms:
File "/software/lsstsw/stack_20210415/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/jointcal/21.0.0-14-gc0d6d5c+36dc79dc4c/python/lsst/jointcal/jointcal.py", line 744, in run
photometry = self._do_load_refcat_and_fit(associations, defaultFilter, center, radius,
File "/software/lsstsw/stack_20210415/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/jointcal/21.0.0-14-gc0d6d5c+36dc79dc4c/python/lsst/jointcal/jointcal.py", line 1229, in _do_load_refcat_and_fit
refCat, fluxField = self._load_reference_catalog(refObjLoader, referenceSelector,
File "/software/lsstsw/stack_20210415/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/jointcal/21.0.0-14-gc0d6d5c+36dc79dc4c/python/lsst/jointcal/jointcal.py", line 1309, in _load_reference_catalog
refCatName = refObjLoader.ref_dataset_name
AttributeError: 'ReferenceObjectLoader' object has no attribute 'ref_dataset_name'
This didn't happen in the w14 processing because the pre-review branches of DM-29615 disabled jointcal photometry and hence its color terms. During review, I was advised to try to disable jointcal photometry in the HSC configs, not just the Gen3 pipeline, but that got tricky and I ended up not disabling jointcal photometry at all. It looks like we do need to disable it in Gen3 for now (I'll do that on this ticket) and open new tickets to both disable it more properly via config and fix the above error so it could be re-enabled if desired.
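For whoever picks up the follow-up ticket, here is a minimal illustration of the failure mode in the traceback and one possible defensive pattern. This is not the jointcal code or a proposed patch: both loader classes are hypothetical stand-ins, "example_refcat" is a made-up name, and only the attribute name ref_dataset_name comes from the error message.

```python
class LoaderWithoutName:
    """Hypothetical stand-in for a loader that lacks ref_dataset_name."""


class LoaderWithName:
    """Hypothetical stand-in for a loader that provides ref_dataset_name."""

    ref_dataset_name = "example_refcat"  # made-up value for illustration


def get_refcat_name(loader, default=None):
    """Look up the attribute without raising if the loader does not have it."""
    return getattr(loader, "ref_dataset_name", default)


def can_apply_color_terms(loader, apply_color_terms):
    """Only attempt color terms when requested and a refcat name is known."""
    return apply_color_terms and get_refcat_name(loader) is not None


# Direct attribute access reproduces the kind of error seen in the traceback.
try:
    LoaderWithoutName().ref_dataset_name
except AttributeError as err:
    print(f"AttributeError: {err}")

print(can_apply_color_terms(LoaderWithoutName(), True))
print(can_apply_color_terms(LoaderWithName(), True))
```

A guard like this would only hide the problem; the real fix presumably needs the loader to expose the reference catalog name, or jointcal to obtain it some other way.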