Fix Version/s: None
DM-29819 revealed some residual issues related to loading of reference catalogs. Specifically, because of an inconsistent setting of the config parameter:
there are cases where slightly different reference catalogs are getting loaded in gen2 vs. gen3, I think the logic should be that pixelMargin <= computeVisitRegions["single-raw-wcs"].padding. However, as it stands, the default for pixelMargin is 300 and, in cases we were unlucky not to encounter in ci_hsc, this can lead to a smaller loaded region in gen3 because of the smaller visit padded definition (i.e. if a shard edge lies close to the padded visit edge). See
DM-29819 for an example.
While I'm not sure it will be possible to check the above condition in a config "validation", I think we should at least ensure it is satisfied for the stack defaults as well as any overrides in individual obs_ packages.
- relates to
DM-30296 ap_verify HSC Gen 3 ingestion crashes on missing defineVisits config
DM-24024 Revisit region padding in HSC Gen3 ingest or visit definition
DM-28936 Try to get calibration source selection consistent between gen2 and gen3 middleware
DM-29819 Compare the data products of the gen2 vs. gen3 w_2021_14 RC2 runs up to Single Frame Processing
- mentioned in
|Status||To Do [ 10001 ]||In Progress [ 3 ]|
|Attachment||gen2gen3trimmedRefCats.png [ 49453 ]|
|Attachment||gen2gen3trimmedRefCatsFix.png [ 49454 ]|
The full list of CCDs that showed calexp-WCS differences between gen2 & gen3 on the w_2021_14 RC2 runs is:
I will check that the config change fixes all of them first.
|Attachment||gen3RefLoadingIssue.png [ 49546 ]|
Fun new pathology! In testing the above fix against all cases above, it turns out we have a rare funky loading issue that only seems to occur in gen3. The following shows the distribution of reference objects for one of the loaded shards (blue circles). The purple x's are the final gen2 filtered catalog and the outlines show the raw and calexp WCS (based on corners only) of the CCD. The lime green +'s show the filtered gen3 catalog (i.e. trimmed to include only those objects lying within the padded ccd bbox)...
...how are those sprinkling of objects at the lower right that are well beyond the bbox boundaries sneaking in there? My first guess is that the extrapolation of the WCS (to convert from reference coords to ccd pixel coords) is just too fragile that far away from the CCD (but then why is it only this sprinkling that get affected?) Looking into this now...
Ok, something is going wrong with the wcs.skyToPixel() conversion for these objects. As an example, here's an attempt to round trip from skyToPixel and pixelToSky:
In : rootDirGen3 = "/repo/main"
In : butlerGen3 = Butler(rootDirGen3, collections="HSC/runs/RC2/w_2021_14/DM-29528", instrument="HSC")
In : rawGen3 = butlerGen3.get("raw", exposure=19662, detector=1)
In : rawWcsGen3 = rawGen3.getWcs()
In : skyIn = geom.SpherePoint(2.6183551458266163*geom.radians, 0.013514157101839685*geom.radians)
In : skyIn
Out: SpherePoint(150.02069912222632*geom.degrees, 0.7743041656121622*geom.degrees)
In : pixelFromSkyIn = rawWcsGen3.skyToPixel(skyIn)
In : pixelFromSkyIn
Out: Point2D(1208.846746, 2647.026678)
In : skyOutFromPixelIn = rawWcsGen3.pixelToSky(pixelFromSkyIn)
In : skyOutFromPixelIn
Out: SpherePoint(150.32553884146628*geom.degrees, 1.558990288057622*geom.degrees)
In : pixelFromSkyOut = rawWcsGen3.skyToPixel(skyOutFromPixelIn)
In : pixelFromSkyOut
Out: Point2D(1208.846751, 2647.026677)
So, it seems there are certain cases where the conversion is unreliable (double-valued?). I'm not sure how to isolate which cases have this pathology...but it's definitely a "problem"!
|Assignee||Lauren MacArthur [ lauren ]||Eli Rykoff [ erykoff ]|
|Attachment||wcsRoundTrip.png [ 49554 ]|
|Attachment||wcsRoundTripZoom.png [ 49555 ]|
|Attachment||trimmedInput.png [ 49556 ]|
This has been much discussed on this thread in Slack. Here are the details:
The round-tripping from the rawWcs in this context is a mess for all but the sources close to the CCDs location. The following plots demonstrate this:
and zooming in:
...so that's how those problematic sources are sneaking in as "within the CCDs padded bbox"!
It seems we just can't trust extrapolation of the rawWcs very far beyond the CCDs borders. Indeed, if I do an initial trimming of the reference catalog to be much closer to the CCD, the round-tripping is a-ok:
Since we only have the only-very-locally-reliable rawWcs available at this stage, I think we need to do an initial RA/Dec-based filtering on the reference catalog (akin to what the gen2 loader does in the _trimToCircle function based solely on a center and radius in RA/Dec) before doing the trimming-to-bbox based on the rawWcs-based skyToPixel transform. Identifying and hard-coding a “goldilocks-zone-region” on the initial trimming may be a bit fragile (but hopefully only in the rarest of edge cases...or not at all and I'm just being paranoid!)
I don't know about the code-paths and efficiency (and there are better and worse ways to do this), but if you convert the corners of the detector bbox to ra/dec first and filter on that, wouldn't that give the correct range to select reference stars, rather than converting all the reference star ra/dec to x/y (which you have shown is unreliable)? There's a bit more thought that would have to be put into rotations.
Given that the (ra, dec) circle used by Gen2 to pre-filter is known to work, it's a good fallback and the lowest-effort solution. Two more possibilities:
- You can convert the corners of the box you want to convert to (ra, dec), but instead of treating that as a box, put them in a sphgeom.ConvexPolygon; that will take care of rotations naturally. The only thing you have to look out for is that ConvexPolygon edges are great circles, and CCD boundaries are not in the presence of distortions, so this isn't an exact transformation of the geometry, but if you're starting from a box that's badded for other reasons, you may not need any extra padding to account for this.
- The only nontrivial term in the raw WCS is the cameraGeom FIELD_ANGLE->FOCAL_PLANE transform, which includes the optical distortion, so I bet the problem is present there, too. That means a circle around the focal plane should also work, and if not, we really need to fix the cameraGeom. In fact, the ideal solution would be to embed a circle domain in FIELD_ANGLE coordinates (i.e. one that doesn't change based on where you point on the sky) down in the cameraGeom AST objects and have them map points outside that domain to NaN or something (given that we need vectorized calls, raising an exception isn't an option). But I have no idea if that's even possible, let alone how to do it, so unless John Parejko or someone else does, let's just consider (ra, dec) circles for now.
If you go with the easy solution, we should probably open a new ticket to look into this further (and across other cameras) and lock it down in cameraGeom somehow.
Thanks, Jim. I think I have a decent idea of how to do this based on the sphgeom.ConvexPolygon outerSkyRegion that is already being computed (and my hope is that it will be slightly better than the gen2 circle trimming). I’ll put it to the test later today and keep you posted.
|Assignee||Eli Rykoff [ erykoff ]||Lauren MacArthur [ lauren ]|
|Attachment||fixedGen3_19662_1.png [ 49583 ]|
Ok, I put in a "pre-filter" step that make use of the already computed outerSkyRegion and the existing class _FilterCatalog which does an initial filtering based on RA/Dec coords. I've tested all the problematic cases above and they are all "fixed". The region defined by outerSkyRegion does seem to be close enough to a "Goldilocks zone" of not too small and not too large in that it fixes all problem cases and has no effect on the others...but this is based on a fairly limited test sample.
Here is the associated plot for the visit/ccd I've been showing above. The orange circles are what remains in the new pre-filtered catalog (all the outliers are now gone):
I've looked at a bunch of other cases and they all look good, so I think this fix is doing what we need.
As for the config defaults, as per our Slack discussion:
I don’t see any overrides other than in obs_subaru to the ComputeVisitRegionsTask padding config in defineVisits, so I believe all other cameras have had their visits defined with the default 0 padding, which is set here https://github.com/lsst/obs_base/blob/master/python/lsst/obs/base/defineVisits.py#L172-L178
I have bumped that up to 250 pixels.
The current default for the pixelMargin used by the reference loaders is 300 pixels:
I have changed this to 250 pixels.
I don’t see any overrides in the camera configs to pixelMargin, so I believe they are all currently using the default (so this will be a change for them).
I’m less sure about this one, and am not hard set on changing it (perhaps Chris Morrison has thoughts?):
Changing it from 300 to 250 caused a test failure. I resolved this by setting the config in the test back to 300.
Finally, I updated the maxOffsetPix defaults in match[Opt/Pess]imisticBTask to be 250 as well:
I think we are in a decent position to merge this, but I won't have time to truly battle test it before the w20 cutoff tonight. All the problematic cases seem to be resolved, and I am getting parity of all "processCcd" products on the one full visit I've processed with both middlewares on these branches. I'm a bit iffy about the fact that the extra padding is not configurable, but the hard-coded values of 100 pixels seems just fine for HSC at least. Let me know your thoughts. Oh, and one thing I wasn't sure of is if we should also remove config/defineVisits.py in obs_subaru since the override there would now be the default.
Jenkins + ci_hsc is happy.
|Reviewers||Jim Bosch [ jbosch ]|
|Status||In Progress [ 3 ]||In Review [ 10004 ]|
|Remote Link||This issue links to "Page (Confluence)" [ 28464 ]|
Looks good, but would you mind removing the obs_subaru defineVisits.py config override as well, now that it's unneccessary? (https://github.com/lsst/obs_subaru/blob/master/config/defineVisits.py)
|Status||In Review [ 10004 ]||Reviewed [ 10101 ]|
|Attachment||allRefCats_19662_1_origPad.png [ 49704 ]|
|Attachment||allRefCats_257768_0_origPad.png [ 49705 ]|
|Attachment||allRefCats_289408_1_origPad.png [ 49706 ]|
|Attachment||allRefCats_19662_1_newPad.png [ 49707 ]|
|Attachment||allRefCats_257768_0_newPad.png [ 49708 ]|
|Attachment||allRefCats_289408_1_newPad.png [ 49709 ]|
Thanks for the review, Jim. I made the changes you requested (removed above noted obs_subaru config, so there's a new PR here, and added a comment to the test override in ap_association). I also did a bit more testing with other datasets just to make sure things look ok (thanks to Meredith Rawls for getting my going with DECam processing in gen3!)
The following plots show:
- full reference catalog (i.e. all of the shards associated with the visit/detector) as blue circles
- new "pre-filtered" catalog added in this ticket as orange crosses
- sources actually used in the SFM astrometry fit (astrom_used) as purple stars
- several WCS/bboxes: black = raw WCS, dotted pink = black + original 300 padding, dash-dot lime = dotted pink + 100 pixel "extra" outerRegion padding, dashed teal = calexp WCS.
In the following series of plots for the three datasets, the first plot shows the above for the original padding of 300 and the second is for the new 250 pixel padding (the pink and lime bboxes still reflect the 300, so you can see the pre-filtered cat doesn't extend all the way out to them).
300 pixel padding
250 pixel padding
300 pixel padding
250 pixel padding
300 pixel padding
250 pixel padding
They all look like the ended up with identical astrometry fits...except for DC2 example, where there seems to be a dearth of astrom_used points in the upper right corner of the original 300 pixel padding, but a bunch do show up in the 250 pixel padding. I actually looked at ~twenty different detectors for this DC2 visit and this was the only one showing any difference. Given that the calexp WCSs look identical and that the 250 padding seems to be "better", I'm not concerned about this difference.
I had to do some rebasing and there's the new obs_subaru PR, so I kicked off another Jenkins (but seeing the recent traffic on #dm-build-problems, I won't be surprised if the MacOS build fails...)
Allrighty...we finally got a green Jenkins. Merged and done! Community post announcing the config default changes and potential edge cases with current gen3 repo visit definitions to follow soon.
|Resolution||Done [ 10000 ]|
|Status||Reviewed [ 10101 ]||Done [ 10002 ]|
Unfortunately, this ticket broke ap_verify as it did not remove the reference to defineVisits.py in https://github.com/lsst/obs_subaru/blob/master/config/datasetIngest-gen3.py#L7
Ok, I have confirmed that setting the pixelMargin to match computeVisitRegions["single-raw-wcs"].padding does indeed resolve the issue for the first CCD I've tested. Here is the before (with pixelMargin = 300):
and here is the after (with pixelMargin = 250):
I still have to battle-test this against a larger selection of the cases this showed up in RC2 and have a think about whether we can guard against "bad/internally inconsistent" settings of these parameters in code...but an override in obs_subaru will be a minimal "solution" for now.