There was quite a bit of scope creep in this ticket, but this is now all working. The creep mainly involved a lot of restructuring of functions to minimize code duplication, but also a new philosophy for the "match" catalogs. Rather than just "denormalizing" the persisted match catalog (which includes only the reference/source pairs that were actually used in the astrometric calibration during single frame measurement, SFM), I've implemented a new function, loadReferencesAndMatchToCatalog(), which loads the full reference records for the same search radius used in SFM but performs a generic match to the source catalog, regardless of whether the objects were used in any calibration step. Culling of the source catalog based on flags can be (and by default is) performed prior to matching, with the exception that any source used as an SFM calibration source is retained for further (sub)selection and analysis.
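To make the cull-then-match logic above concrete, here is a minimal sketch in plain Python. It is not the code of loadReferencesAndMatchToCatalog(); the dictionary layout, the flag names ("calib_used", "flag_bad") and the brute-force nearest-neighbour search are all assumptions made purely for illustration.

```python
import math


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a))) * 3600.0


def cull_sources(sources, bad_flags):
    """Drop sources with any bad flag set, but always keep calibration sources.

    "calib_used" is a hypothetical flag name standing in for "was used as
    an SFM calibration source".
    """
    return [s for s in sources
            if s.get("calib_used", False)
            or not any(s.get(flag, False) for flag in bad_flags)]


def match_to_references(sources, references, radius_arcsec):
    """Match each source to its nearest reference within radius_arcsec.

    Brute force, O(N*M): fine for a sketch, not for real catalogs.
    Returns a list of (ref_id, src_id, separation_arcsec) tuples.
    """
    matches = []
    for src in sources:
        best = None
        for ref in references:
            sep = angular_sep_arcsec(src["ra"], src["dec"], ref["ra"], ref["dec"])
            if sep <= radius_arcsec and (best is None or sep < best[1]):
                best = (ref, sep)
        if best is not None:
            matches.append((best[0]["id"], src["id"], best[1]))
    return matches


# Toy catalogs (made-up coordinates in degrees).
references = [
    {"id": 1, "ra": 10.0, "dec": 0.0},
    {"id": 2, "ra": 10.01, "dec": 0.0},
]
sources = [
    {"id": 101, "ra": 10.0001, "dec": 0.0, "calib_used": True, "flag_bad": True},
    {"id": 102, "ra": 10.0101, "dec": 0.0, "flag_bad": True},
    {"id": 103, "ra": 10.0099, "dec": 0.0},
    {"id": 104, "ra": 11.0, "dec": 0.0},
]

kept = cull_sources(sources, bad_flags=["flag_bad"])
matches = match_to_references(kept, references, radius_arcsec=1.0)
```

In this toy case source 101 survives the cull despite its bad flag because it is marked as a calibration source, 102 is culled, 103 is matched without ever having been used in calibration, and 104 is kept but has no reference within the radius.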
All five scripts have been run with doReadParquetTables set to both True and False (the latter is intended only for backwards compatibility with repos that do not have the persisted parquet tables), and a comparison of a large fraction of the resulting plots shows they are indeed identical. One very minor exception is revealed in the comparison plots at the visit level when applying an external photometric calibration. The application of this calibration goes through different code paths for the parquet vs. non-parquet table loading, and there are ~machine-epsilon-level differences in the photometry (i.e. at the 10^-11 mmag level). The plots are also all virtually identical to those from the latest RC2 run (w_2020_42), with the expected exception of the match plots (see comments above). All of the plots can be perused at:
https://lsst.ncsa.illinois.edu/~lauren/lauren/DM-22266/parq
https://lsst.ncsa.illinois.edu/~lauren/lauren/DM-22266/noParq
(and can be compared with the most recent RC2 run at https://lsst.ncsa.illinois.edu/~emorgan2/w_2020_42_qaplots/)
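As a rough sanity check on the "machine epsilon" statement, the spacing between adjacent double-precision values near a typical magnitude can be compared with the quoted 10^-11 mmag. This is only a sketch: the magnitude of 20 is an assumed typical value, the helper names are made up for illustration, and math.ulp requires Python 3.9 or later.

```python
import math


def ulp_in_mmag(mag):
    """Spacing between adjacent float64 values at `mag`, expressed in mmag."""
    return math.ulp(mag) * 1000.0


def max_abs_diff_mmag(mags_a, mags_b):
    """Largest absolute difference between two magnitude lists, in mmag."""
    return max(abs(a - b) for a, b in zip(mags_a, mags_b)) * 1000.0


typical_mag = 20.0  # assumed typical magnitude, not taken from the ticket
spacing = ulp_in_mmag(typical_mag)

# Two "code paths" that disagree by three units in the last place.
path_a = [typical_mag]
path_b = [typical_mag + 3 * math.ulp(typical_mag)]
diff = max_abs_diff_mmag(path_a, path_b)
```

Under that assumption the float64 spacing is a few times 10^-12 mmag, so a difference at the 10^-11 mmag level corresponds to only a few units in the last place, which is what one would expect from the same arithmetic being performed in a different order.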
As for performance, the following output from sacct (with a leading "table" column added to indicate the table format, parq or noParq) reveals typical speed-ups by factors ranging from ~1.5 (visitAnalysis) to ~4 (colorAnalysis), while maximum memory usage decreased by factors ranging from ~2 (visitAnalysis) to ~10 (colorAnalysis), so fairly significant wins all around!
$ sacct -u lauren --units=G --format=jobid,jobname,CPUTime,Elapsed,MaxRSS,State --starttime 2020-11-09T16:26
| table | JobID | JobName | CPUTime | Elapsed | MaxRSS | State |
| noParq | 3258 | visitAnal+ | 03:51:12 | 00:09:38 | | COMPLETED |
| noParq | 3258.batch | batch | 03:51:12 | 00:09:38 | 2.81G | COMPLETED |
| parq | 3259 | visitAnal+ | 02:29:36 | 00:06:14 | | COMPLETED |
| parq | 3259.batch | batch | 02:29:36 | 00:06:14 | 1.28G | COMPLETED |
| noParq | 3264 | coaddAnal+ | 19:33:36 | 00:48:54 | | COMPLETED |
| noParq | 3264.batch | batch | 19:33:36 | 00:48:54 | 75.04G | COMPLETED |
| parq | 3265 | coaddAnal+ | 11:54:24 | 00:29:46 | | COMPLETED |
| parq | 3265.batch | batch | 11:54:24 | 00:29:46 | 17.81G | COMPLETED |
| noParq | 3268 | colorAnal+ | 1-05:32:48 | 01:13:52 | | COMPLETED |
| noParq | 3268.batch | batch | 1-05:32:48 | 01:13:52 | 67.03G | COMPLETED |
| parq | 3269 | colorAnal+ | 07:17:12 | 00:18:13 | | COMPLETED |
| parq | 3269.batch | batch | 07:17:12 | 00:18:13 | 6.66G | COMPLETED |
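The factors quoted above can be reproduced from the Elapsed and MaxRSS columns of the sacct output. The snippet below is just a small sketch of that arithmetic: the values are copied by hand from the table, the job names are left truncated as sacct printed them, and to_seconds() is a helper written for this illustration.

```python
def to_seconds(duration):
    """Convert a Slurm-style [D-]HH:MM:SS duration string to seconds."""
    days, _, hms = duration.rpartition("-")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return (int(days) if days else 0) * 86400 + hours * 3600 + minutes * 60 + seconds


# (Elapsed, MaxRSS in GB) per job, copied from the sacct table.
runs = {
    "visitAnal+": {"noParq": ("00:09:38", 2.81), "parq": ("00:06:14", 1.28)},
    "coaddAnal+": {"noParq": ("00:48:54", 75.04), "parq": ("00:29:46", 17.81)},
    "colorAnal+": {"noParq": ("01:13:52", 67.03), "parq": ("00:18:13", 6.66)},
}

ratios = {}
for job, run in runs.items():
    speedup = to_seconds(run["noParq"][0]) / to_seconds(run["parq"][0])
    memory_factor = run["noParq"][1] / run["parq"][1]
    ratios[job] = (speedup, memory_factor)
    print(f"{job}: {speedup:.1f}x faster, {memory_factor:.1f}x less memory")
```

This gives elapsed-time factors of roughly 1.5, 1.6 and 4.1, and MaxRSS factors of roughly 2.2, 4.2 and 10.1, for the visit, coadd and color jobs respectively.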