RFC-692: Machine precision issue with scheduler simulations


    Details

    • Type: RFC
    • Status: Retired
    • Resolution: Done
    • Component/s: Sims
    • Labels:
      None
    • Location:
      #sims-operations

      Description

      While simulating the scheduler, we've encountered an issue with cross-platform repeatability. It would seem that different OS and/or hardware setups can store Python floats at different precisions. For the scheduler, we have a huge number of float comparisons throughout the code, e.g., "was this point within 1.75 degrees of a pointing", "is this point currently above the airmass limit", "is this point below the zenith limit", "is the time until twilight more than a block time", etc.

      Right now, we are seeing simulations on different machines diverge after about 90 days. Previously, I was able to fix the issue by wrapping floats in an "int_rounded" class whenever there was a comparison being made. So

      `if point < airmass_limit`
      became
      `if int_rounded(point) < int_rounded(airmass_limit)`

      This way we ensured values would evaluate at identical precision on different platforms. The problem, of course, is that this is rather cumbersome and difficult to maintain: everyone working on the code has to remember to wrap their floats when doing comparisons. It's also difficult to ensure all the comparisons are being wrapped; right now we can only tell that we've wrapped enough that simulations do not diverge over a 10-year run.
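
      To make the pattern concrete, here is a minimal sketch of an int_rounded-style wrapper (the class name, the 1e5 scale, and the example values are illustrative, not the scheduler's actual implementation):

          class IntRounded:
              """Illustrative fixed-precision wrapper: scale a float and round it
              to an integer so comparisons no longer depend on the last few bits
              of the platform's float representation."""

              def __init__(self, value, scale=1e5):
                  self.value = int(round(value * scale))

              def __lt__(self, other):
                  return self.value < other.value

              def __gt__(self, other):
                  return self.value > other.value

              def __eq__(self, other):
                  return self.value == other.value


          # Usage: wrap both sides of every comparison.
          point, airmass_limit = 1.7500000001, 1.75
          if IntRounded(point) < IntRounded(airmass_limit):
              pass  # not reached: both values round to 175000 at the 1e5 scale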

      What I'd like to know:

      1) Does it matter if the scheduler is (slightly) non-reproducible across platforms? This could become an issue if we want to check why the scheduler observed a certain pointing and are debugging on a different machine from the one the scheduler is running on.

      2) If the answer to 1) is yes, what is the best way to enforce identical-precision float comparisons in Python? Should we keep plowing ahead with wrapping floats in a special class? Or maybe make a new function that does the comparison at a fixed precision, e.g., float_compare(f1, <, f2, precision=1e-5)?
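
      A comparison helper along the lines of the float_compare idea above might look like this rough sketch (the function, its string-operator signature, and the rounding approach are all hypothetical):

          import math
          import operator

          _OPS = {"<": operator.lt, "<=": operator.le,
                  ">": operator.gt, ">=": operator.ge, "==": operator.eq}

          def float_compare(f1, op, f2, precision=1e-5):
              # Round both operands to the same number of decimal places before
              # comparing, so the outcome does not depend on last-bit differences.
              decimals = int(round(-math.log10(precision)))
              return _OPS[op](round(f1, decimals), round(f2, decimals))

          float_compare(1.7500000001, "<", 1.75)  # False: both round to 1.75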

       



            Activity

            Kian-Tat Lim added a comment -
            Consider https://docs.python.org/3.7/library/decimal.html ?
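
            For reference, a fixed-precision comparison with decimal might look like the sketch below (the values are made up; as the next comment notes, Decimal does not operate on numpy arrays):

                from decimal import Decimal

                # Quantize both values to five decimal places so they compare at a
                # fixed precision regardless of platform.
                point = Decimal("1.7500000001").quantize(Decimal("0.00001"))
                airmass_limit = Decimal("1.75").quantize(Decimal("0.00001"))
                point < airmass_limit  # False: both quantize to Decimal('1.75000')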
            Eli Rykoff added a comment -

            I've seen problems like this before, and it wasn't that Python was handling floats differently, but rather a numpy/Python interface issue (in my case). Specifically, if the numpy value was an np.float32, then how it was translated into a Python float was architecture-dependent. Using numpy to explicitly promote these to np.float64 when necessary fixed that issue. I don't know if that's relevant to the scheduler, but thought I'd share. (DM-23630)
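
            A minimal illustration of that kind of explicit promotion (the array below is made up):

                import numpy as np

                # Promote float32 values to float64 before they cross back into Python,
                # so the comparison happens entirely in float64 and does not depend on
                # how a given platform converts np.float32 to a Python float.
                values = np.array([1.75, 2.1], dtype=np.float32)   # stand-in for loaded data
                values64 = values.astype(np.float64)               # explicit promotion
                above_limit = values64 > 1.75                      # element-wise, all float64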

            Peter Yoachim added a comment -

            I had looked at the decimal module. I think I decided it didn't work well enough with numpy arrays to be a good solution here.

            This could be a `np.float32` vs `np.float64` issue. I think numpy defaults to 64, but if some 32s snuck in, that might be the problem.

            Lynne Jones added a comment (edited) -

            Tagging Leanne Guy, Tiago Ribeiro and Zeljko Ivezic here in the comments as well (just saw they're watchers already). I am of the opinion that while platform independence is admirable and useful, it should not come at the expense of overly burdening the codebase and making it harder to maintain (which the int_rounded class is currently doing) or at the expense of being overly slow (which we are worried a custom comparison method would do). 

            In practice, what this would mean is that simulated runs would be statistically comparable but not identical if run on different machines. (We aren't clear whether this would also hold true inside a Docker container; we thought the divergence was hardware-dependent, which would mean Docker would still have the problem, but it sounds like some people above are suggesting maybe not?)

            What does statistical reproducibility without exact identity mean for actual use? I think it means that:

            • For daily metrics in operations, I don't think this applies: the scheduler as run on the mountain might be on different hardware than the scheduler as run on the system verification hardware elsewhere, but within a night we don't see any divergence (at least, not yet – and we could approach this differently if we do see it occurring); we only see divergence after about 90-100 nights of simulation. For daily metrics, we would likely want to run the scheduler in both places, because the scheduler on the mountain is necessary, while the scheduler at SV would be used with model inputs to evaluate in real time whether there were any issues with the scheduler and whether the night's performance was as expected.
            • For longer-term metrics, I'm not sure. I am not entirely clear what you'd do here, but I think it's a mix of the above (daily) and below (annual and semi-annual, except without multiple strategies), so it's probably not a huge issue.
            • For annual or semi-annual reports, I again don't think this necessarily applies. I think what we're doing in this case is reading in the already-observed history, then generating various potential future outcomes with varying survey strategies in order to pass these to the SCOC.
            • For generating large numbers of simulations and then allowing independent users to generate their own simulated surveys to compare to our baseline, this would be a small issue. We would have to ask anyone doing this either to pass us their configuration file so we can run their survey strategy on our own hardware, or to run their own baseline survey and then compare their metrics for the new strategy to their own version of the baseline. In terms of generating our own large numbers of survey simulations, we would just have to run them on the same hardware configuration, which we already do (i.e., a single cluster with the same OS and hardware configuration).

            Am I overlooking something?

            (PS - is this a problem for DM when processing images, btw? Do you end up with slightly different outcomes depending on the machine you processed the data on? If not, why not?)

            Peter Yoachim added a comment -

            As a bit of closure, the issue here turned out to be that Python's `os.path.getsize` was returning slightly different values on different platforms, which resulted in different pre-computed sky brightness models being loaded, which in turn would be interpolated to slightly different values (differences at the 4th-5th decimal place).

            So we are cross-platform repeatable again.
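
            For anyone hitting something similar, a rough sketch of a guard against this class of problem is to select the pre-computed files deterministically (sorted by name) and verify them by content hash rather than by os.path.getsize; the directory layout and file pattern below are hypothetical:

                import glob
                import hashlib
                import os

                def select_sky_files(data_dir, pattern="*.npz"):
                    # Sort by file name so every platform loads the same files in the
                    # same order, and hash the contents so a mismatched file is caught
                    # explicitly instead of silently changing the interpolation inputs.
                    paths = sorted(glob.glob(os.path.join(data_dir, pattern)))
                    checksums = {}
                    for path in paths:
                        with open(path, "rb") as f:
                            checksums[path] = hashlib.sha256(f.read()).hexdigest()
                    return paths, checksums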

            Peter Yoachim added a comment -

            For the short term at least, the scheduler is cross-platform repeatable again.


              People

              Assignee:
              Peter Yoachim
              Reporter:
              Peter Yoachim
              Watchers:
              Eli Rykoff, John Parejko, Kian-Tat Lim, Leanne Guy, Lynne Jones, Peter Yoachim, Tiago Ribeiro, Zeljko Ivezic

