DM-14439 (Data Management)

Explore numeric difference in Mac/Linux ConstrainedPhotometry fitting

    Details

    • Type: Story
    • Status: Invalid
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: jointcal
    • Labels:
      None

      Description

      While working on DM-14155, I discovered that Linux and Mac produce different results in the DECAM ConstrainedPhotometryModel tests. This may be a hint about why that model is unstable and not producing good fits, or it may be due to something else. Either way, I should understand what's going on, which is going to require a bunch of debug printing, and maybe looking at the matrix elements themselves.

      As an example, after the error scales are frozen, the macOS chi2/ndof is 5073.41/2108, while on Linux it is 5072.94/2108. Subsequently, different outliers are rejected and the final fits are measurably different.
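
To put the quoted numbers in context, the short sketch below computes the reduced chi2 on each platform and the relative difference between them, and compares that to double-precision machine epsilon. Only the chi2 and ndof values come from this ticket; the variable names are illustrative. The size of the gap does not by itself identify a cause.

```python
import sys

# chi2 and ndof values quoted in the ticket description
chi2_mac = 5073.41
chi2_linux = 5072.94
ndof = 2108

# Reduced chi2 on each platform
reduced_mac = chi2_mac / ndof
reduced_linux = chi2_linux / ndof

# Absolute and relative difference between the two platforms
abs_diff = chi2_mac - chi2_linux
rel_diff = abs_diff / chi2_linux

# Machine epsilon for IEEE 754 double precision, for scale
eps = sys.float_info.epsilon

print(f"reduced chi2 (macOS): {reduced_mac:.4f}")
print(f"reduced chi2 (Linux): {reduced_linux:.4f}")
print(f"absolute difference:  {abs_diff:.2f}")
print(f"relative difference:  {rel_diff:.2e}")
print(f"ratio to epsilon:     {rel_diff / eps:.2e}")
```

The relative difference is of order 1e-4, many orders of magnitude above a single rounding error, so if the origin is numerical it would have to be amplified somewhere in the fit.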

            Activity

            Parejkoj John Parejko added a comment -

            In addition, the "4sigma_outliers" decam astrometry test gives different results on centos6. When I dug into that (see DM-14552), there were differences in chi2 values that showed up before any outliers were rejected (different between macOS, centos6, and Ubuntu). Those differences went away with opt=0. There may be something pathological about the decam data in particular.
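
As an illustration of why a small pre-rejection difference can matter, the sketch below applies a plain 4-sigma cut to a made-up list of normalised residuals, then to the same list perturbed by a relative 1e-5. This is not jointcal's outlier rejection code; the function `rejected`, the residual values, and the perturbation size are all hypothetical.

```python
def rejected(residuals, sigma, nsigma=4.0):
    """Return the indices of residuals lying beyond nsigma * sigma."""
    return [i for i, r in enumerate(residuals) if abs(r) > nsigma * sigma]

# Hypothetical residuals in units of sigma; index 2 sits just inside the cut
base = [0.5, -1.2, 3.99999, 4.2, -0.3]

# The same residuals after a tiny relative perturbation
perturbed = [r * (1 + 1e-5) for r in base]

print("rejected (base):     ", rejected(base, sigma=1.0))
print("rejected (perturbed):", rejected(perturbed, sigma=1.0))
```

A point sitting close to the threshold changes sides under the perturbation, so the two runs would continue with different sets of measurements.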

            Parejkoj John Parejko added a comment -

            After updating the refcats to the HTM indexed version (which have a higher source density), the default constrained astrometry test also gives different results on macOS vs. Linux: on macOS, the no_rank_update and with-rank-update versions give identical results, while on Linux they differ. I'm leaving further investigation of this to DM-17597.

            rhl Robert Lupton added a comment -

            I'm nervous about leaving this sort of debt behind.  The change of chi^2 at constant dof suggests that we've added and removed an object, and as the 2108 suggests that we have a tiny catalogue this should be easy to sort out.

            I'm also worried about the opt=0 comment.  In the bad old days this could be a register/memory difference in precision, but I thought that had gone away.  In general things that change with optimisation level have a reasonably high probability of being bugs.

            jbosch Jim Bosch added a comment -

            I had an interesting conversation with Mike Jarvis a while ago on Slack about our optimization settings, and I recall coming away surprised at how much -O3 was allowed to make floating point math architecture/compiler-dependent. If there is no problem with -O2, I'd personally be content dropping this ticket with no further investigation, but if that turns out to be the case, we may want to open a new ticket to investigate relaxing our default optimization settings more globally.
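
One generic reason compiled floating-point results can depend on how the code was built is that floating-point addition is not associative, so anything that changes evaluation order or intermediate rounding can change the answer. The sketch below shows this in pure Python with an explicit left-to-right loop; `naive_sum` is a hypothetical helper. Whether this mechanism is behind the differences in this ticket was not established.

```python
def naive_sum(values):
    """Accumulate strictly left to right, with no compensation."""
    total = 0.0
    for v in values:
        total += v
    return total

# Grouping changes the rounded result of a three-term sum
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right, left, right)

# Reordering the same three terms changes the total
values = [1e16, 1.0, -1e16]
reordered = [1e16, -1e16, 1.0]
print(naive_sum(values), naive_sum(reordered))
```

The explicit loop is used instead of the built-in `sum` because recent Python versions apply compensated summation to floats, which would hide the effect.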

            swinbank John Swinbank added a comment -

            Whacking this into an F19 epic. I note the comments from various product owners above about how we shouldn't ignore issues like this, so I hope one of them will choose to prioritise it at a forthcoming planning meeting...!

            Parejkoj John Parejko added a comment - - edited

            John Swinbank: should I mark this "Done" or "Invalid"? After exploration as part of DM-17597, I have entirely removed the offending tests, and added notes about the quality of the testdata to both `test_jointcal_decam.py` and the `testdata_jointcal/README.md`. There is still one DECam test in jointcal, of a very simple model, so at least we can still demonstrate that we can use DECam input.

            The input DECam astrometry is quite catastrophically bad (plotting the sources on sky, the CCDs are far from rectangular in one visit), even after I reprocessed it with a new stack and Gaia DR2, so the jointcal fitter cannot possibly achieve an acceptable fit. I think we could potentially use the data to explore how jointcal handles failed fits, but that would be for another ticket.


              People

              Assignee:
              Parejkoj John Parejko
              Reporter:
              Parejkoj John Parejko
              Watchers:
              Dominique Boutigny, Eli Rykoff, Jim Bosch, John Parejko, John Swinbank, Pierre Astier, Robert Lupton
