I experimented with using pydantic for configuring the matcher, but this proved to have no real benefits over pex_config, so I settled on the following structure:
- matcher_probabilistic.py: MatchProbabilisticConfig and MatcherProbabilistic (a plain old class). These are the most generic matching functions, with no stack dependencies other than pex_config, so this could be spun off at some point (I don't know whether there's a good way to do that now).
- match_probabilistic.py: MatchProbabilisticTask, including all stack dependencies. It has a single test case in tests/test_match_probabilistic_task.py and could use more to cover other config options.
- match_tract_catalog.py: MatchTractCatalogConfig/Task pipeline task with configurable abstract MatchTractCatalogSubConfig/Task classes, in case someone wants to use/implement another matcher.
- match_tract_catalog_probabilistic.py: MatchTractCatalogProbabilisticConfig/Task classes that inherit from both MatchTractCatalogSubConfig/Task and MatchProbabilisticConfig/Task.
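As a rough sketch of how the classes listed above relate to each other, the layout could look something like the following. This is not the actual code: plain Python classes stand in for the pex_config and pipeline task base classes, the match_dist_max option is a hypothetical placeholder, and the one-dimensional match() is a toy stand-in for the real matching logic. Only the class names and the inheritance pattern come from the description above.

```python
from abc import ABC, abstractmethod


class MatchProbabilisticConfig:
    """Stand-in for the pex_config-based matcher config."""

    def __init__(self, match_dist_max=0.5):
        # Hypothetical option for illustration only.
        self.match_dist_max = match_dist_max


class MatchProbabilisticTask:
    """Stand-in for the generic matching task."""

    ConfigClass = MatchProbabilisticConfig

    def __init__(self, config=None):
        self.config = config if config is not None else self.ConfigClass()

    def match(self, ref, target):
        # Toy 1D placeholder: pair every ref/target value closer than the limit.
        return [
            (i, j)
            for i, r in enumerate(ref)
            for j, t in enumerate(target)
            if abs(r - t) <= self.config.match_dist_max
        ]


class MatchTractCatalogSubConfig:
    """Stand-in for the abstract sub-config of the tract matching pipeline task."""


class MatchTractCatalogSubTask(ABC):
    """Stand-in for the abstract sub-task that any matcher must implement."""

    @abstractmethod
    def run(self, catalog_ref, catalog_target):
        """Match a reference catalog to a target catalog."""


class MatchTractCatalogProbabilisticConfig(
    MatchTractCatalogSubConfig, MatchProbabilisticConfig
):
    """Inherits from both the sub-config and the generic matcher config."""


class MatchTractCatalogProbabilisticTask(
    MatchTractCatalogSubTask, MatchProbabilisticTask
):
    """Inherits from both the abstract sub-task and the generic matching task."""

    ConfigClass = MatchTractCatalogProbabilisticConfig

    def run(self, catalog_ref, catalog_target):
        # Delegate to the generic matcher inherited from MatchProbabilisticTask.
        return self.match(catalog_ref, catalog_target)
```

Under this layout, someone implementing another matcher would only need to subclass the abstract sub-config/sub-task pair and provide their own run().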
I haven't added any tests to pipe_tasks yet; I'm not sure if they're warranted. If I were to do so, I'd probably copypasta from the existing test_matchFakes.py and/or test_parquet.py since that is the intended use.
To verify that this works, I ran:
pipetask --long-log run -b /repo/dc2 --input 2.2i/runs/test-med-1/w_2021_40/DM-32024,2.2i/truth_summary --output "u/dtaranu/DM-32034/w_2021_40_match_0.5asec" -d "instrument = 'LSSTCam-imSim' and tract=3828 and skymap='DC2'" -p /project/dtaranu/dc2/match_catalog_pipe/match_tract_catalog_probabilistic.yaml --register-dataset-types
The associated yaml/config is valid for w40 and probably w44/6 and will eventually go into DRP pipelines, either after objectTable_tract generation, or as part of faro (TBD). In principle this could be used to match against an observed reference catalog, not just a truth catalog (e.g. an HST-derived COSMOS catalog for HSC UDeep), but I haven't implemented using uncertainties in the reference catalog yet. I imagine I'd just add them in quadrature to each target source's errors.
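To illustrate the quadrature idea, here is a minimal sketch, assuming the reference and target errors are independent. The function names are hypothetical and are not part of the matcher; the point is only that the combined error reduces to the target error when the reference (e.g. a truth catalog) has zero uncertainty.

```python
import math


def combined_sigma(sigma_target, sigma_ref):
    """Add two independent 1-sigma errors in quadrature."""
    return math.hypot(sigma_target, sigma_ref)


def chi(value_target, value_ref, sigma_target, sigma_ref=0.0):
    """Difference between target and reference in units of the combined error.

    With sigma_ref = 0 (a truth catalog), this is just the usual
    (target - ref) / sigma_target.
    """
    return (value_target - value_ref) / combined_sigma(sigma_target, sigma_ref)
```

For example, a target error of 3 and a reference error of 4 (same units) combine to 5, so a difference of 5 between the two measurements would count as a 1-sigma offset rather than the 5/3-sigma offset implied by the target error alone.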
Once the matcher is being run as part of DRP, I'll update the notebook from DM-31781 to use the generated datasets to make DC2 vs truth plots.