Jim Bosch has pointed out that most analysis tasks and some pipeline tasks have switched to reading the Parquet tables instead of the traditional FITS catalogs output by the pipelines. The suggestion is that these tasks should do the same where possible.
Reading only the necessary columns could also reduce I/O.
- is blocked by
DM-31188 Fix LoadReferenceCatalogTask so it doesn't temporarily clobber its own config
- relates to
DM-31459 Update faro to use parquet tables for patch and tract-level metric calculation
DM-31460 Update faro to use parquet tables for matched catalog metric calculation
DM-32553 Update faro to use parquet tables for matched catalog metric calculation
There are many good comments that I think you should consider incorporating before merging. Also, please post a link to a passing Jenkins build for this PR.
Here’s a first attempt at base classes for selecting specific columns from the sourceTable_visit table.
This runs to completion and produces metric output when run with the following command:
pipetask --long-log run -b /repo/main/butler.yaml --register-dataset-types -p testpipe.yaml -d "visit=35892 AND skymap='hsc_rings_v1' AND instrument='HSC'" --output u/kbechtol/sourcetable_test -i HSC/runs/RC2/w_2021_18/
description: Compute metrics from sourceTable_visit catalogs
from lsst.faro.base import NumSourcesTask
config.columns = 'coord_ra, coord_dec, visit'
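The `columns` setting above is a single comma-separated string. A minimal sketch of how a column-selecting base class might turn that string into a list and hand only those columns to a metric. `parse_columns`, `ColumnSelectingTask`, and `CountSourcesTask` are hypothetical names for illustration, not the faro API, and a plain dict of lists stands in for the sourceTable_visit catalog:

```python
from typing import Dict, List, Sequence


def parse_columns(spec: str) -> List[str]:
    """Split a comma-separated column string, dropping whitespace and empty entries."""
    return [name.strip() for name in spec.split(",") if name.strip()]


class ColumnSelectingTask:
    """Hypothetical base class: subclasses declare only the columns they need."""

    def __init__(self, columns: str) -> None:
        self.columns = parse_columns(columns)

    def select(self, catalog: Dict[str, Sequence]) -> Dict[str, Sequence]:
        # Fail early if a requested column is absent from the catalog.
        missing = [c for c in self.columns if c not in catalog]
        if missing:
            raise KeyError(f"missing columns: {missing}")
        # Keep only the requested columns; everything else is ignored.
        return {c: catalog[c] for c in self.columns}

    def compute(self, catalog: Dict[str, Sequence]) -> object:
        raise NotImplementedError


class CountSourcesTask(ColumnSelectingTask):
    """Hypothetical metric: number of rows in the selected columns."""

    def compute(self, catalog: Dict[str, Sequence]) -> int:
        selected = self.select(catalog)
        return len(selected[self.columns[0]])


# Usage with the same column string as the config above.
task = CountSourcesTask("coord_ra, coord_dec, visit")
catalog = {
    "coord_ra": [1.0, 2.0],
    "coord_dec": [3.0, 4.0],
    "visit": [35892, 35892],
    "extra": [0, 0],
}
print(task.columns)  # ['coord_ra', 'coord_dec', 'visit']
print(task.compute(catalog))  # 2
```

Keeping the column list on the base class means each metric states its inputs once, and the same list could be passed to the Parquet reader so unused columns are never loaded.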