Fix Version/s: None
Looks good mostly. My main suggest is breaking up some of the code in the match method into sub methods to make the calculations more clear. Also adding inline comments would help.
Passed Jenkins after yet another pipe_tasks merge.
Thanks for the feedback. I added more inline comments. I don't really see a good way to refactor the method, because most of it is setting up variables that need to stay in scope for the inner loop. See if you agree based on the comments I added.
Okay. The only comment I have is comparing to data with NaN columns. From experience, you could have it that one of your bands has a NaN for a given value while all the rest of them are fine. If you want to add this functionality (assuming I've read the code correctly.) I'm happy to have it as another ticket.
That already exists with the match_n_finite_min field in MatchProbabilisticConfig. For example, if you're matching on two spatial coordinates and three fluxes, you can set it to 4 to allow one NaN chi value (presumably a flux). For that matter, you can also choose to compute chi without using spatial coordinates at all, although I'm not sure why anyone would want to.
If someone wants more fine-grained control of which columns require measurements, like say requiring g + r fluxes but not u or izy, I think that would require more config fields, but I didn't see a need for that feature at the moment.
I experimented with using pydantic for configuring the matcher, but this proved to have no real benefits over pex_config, so I settled on the following structure:
I haven't added any tests to pipe_tasks yet; I'm not sure if they're warranted. If I were to do so, I'd probably copypasta from the existing test_matchFakes.py and/or test_parquet.py since that is the intended use.
To verify that this works, I ran:
pipetask --long-log run -b /repo/dc2 --input 2.2i/runs/test-med-1/w_2021_40/
DM-32024,2.2i/truth_summary --output "u/dtaranu/ DM-32034/w_2021_40_match_0.5asec" -d "instrument = 'LSSTCam-imSim' and tract=3828 and skymap='DC2'" -p /project/dtaranu/dc2/match_catalog_pipe/match_tract_catalog_probabilistic.yaml --register-dataset-types
The associated yaml/config is valid for w40 and probably w44/6 and will eventually go into DRP pipelines, either after objectTable_tract generation, or as part of faro (TBD). In principle this could beusedto match against an observed reference catalog, not just a truth catalog (e.g. an HST-derived COSMOS catalog for HSC UDeep), but I haven't implemented using uncertainties in the reference catalog yet. I imagine I'd just add them in quadrature to each target source's errors.
Once the matcher is being run as part of DRP, I'll update the notebook from
DM-31781to use the generated datasets to make DC2 vs truth plots.