Details

    • Type: Improvement
    • Status: Done
    • Priority: Critical
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: afw
    • Templates:
    • Story Points:
      3
    • Sprint:
      Alert Production X16 - 5
    • Team:
      Alert Production

      Description

      Time TAN-SIP for our code and for AST, in order to get a sense of the performance impact of switching to AST for our WCS implementation.

        Issue Links

          Activity

          rowen Russell Owen added a comment - - edited

          I added examples/timeWcs.cc. I wrote a similar routine for AST but can't commit it because AST is not part of our stack. Here are the timings, using an image with a TAN-SIP header: calexp-849375-12.fits generated by validate_drp master, commit c1ea4c0, examples/runCfhtQuick.sh. This is on an unloaded 2012 MacBook Pro.

          All transforms are performed in two steps: pixel to sky, then sky back to pixel. The maximum observed error in

          *** LSST ***
          Timing 10000 iterations of pixel->sky->pixel of the WCS found in test1.fits
          2.2471 usec per iteration; max round trip error = (5.02994e-05, 0.000158473) pixels
           
          *** AST ***
          Transform each point in a separate call, using the full frameset;
          this is primarily slow due to per-call overhead; but it is also
          recommended to use a simplified mapping instead of the full frameset
          Timing 10000 iterations of pixel->sky->pixel of the WCS found in test1.fits
          timeWcs; nIter=10000; doSimplify=0
          178.174 usec per iteration; max round trip error = (2.51598e-05, 7.64924e-05) pixels
           
          Transform each point separately, using a simplified mapping extracted from
          the frameset; this is slow due to per-call overhead
          timeWcs; nIter=10000; doSimplify=1
          88.9025 usec per iteration; max round trip error = (2.51598e-05, 7.64924e-05) pixels
           
          Transform all points in a single call on the full frameset;
          this is fast , but one can do even better using a simplified mapping
          timeWcsVectorize; nIter=10000; doSimplify=0
          0.9136 usec per iteration; max round trip error = (2.51598e-05, 7.64924e-05) pixels
           
          Transform all points in a single call on a simplified mapping;
          this is the recommended approach when speed is important;
          timeWcsVectorize; nIter=10000; doSimplify=1
          0.8953 usec per iteration; max round trip error = (2.51598e-05, 7.64924e-05) pixels
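
The "max round trip error" figures above are per-axis pixel discrepancies after pixel->sky->pixel. A minimal sketch of how such a figure could be computed follows. It is not the code in examples/timeWcs.cc: `Point2`, `Transform` and `maxRoundTripError` are hypothetical names, and the two callbacks stand in for whatever WCS implementation is under test.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <utility>

// Hypothetical 2-D point type; real code would use the stack's own geometry types.
struct Point2 {
    double x;
    double y;
};

// Hypothetical callback type standing in for one direction of a WCS.
using Transform = std::function<Point2(Point2 const &)>;

// Return the largest |dx| and |dy| seen after applying pixelToSky and then
// skyToPixel to an n-by-n grid of pixel positions spanning [0, size) per axis.
std::pair<double, double> maxRoundTripError(Transform const &pixelToSky,
                                            Transform const &skyToPixel,
                                            int n, double size) {
    assert(n > 0);
    double maxDx = 0.0;
    double maxDy = 0.0;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            Point2 const pixel{size * i / n, size * j / n};
            Point2 const back = skyToPixel(pixelToSky(pixel));
            maxDx = std::max(maxDx, std::abs(back.x - pixel.x));
            maxDy = std::max(maxDy, std::abs(back.y - pixel.y));
        }
    }
    return {maxDx, maxDy};
}
```

Wrapping the double loop in a pair of `std::clock()` calls and dividing by the number of points gives the "usec per iteration" figure in the same pass.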
          

          Further improvements to AST for warping are expected due to the ability to eliminate unused portions of transformations. Our present warping code (and these timing tests) first transforms pixel1 to sky, then sky to pixel2. However, this can be simplified to pixel1 to focal plane to pixel2, avoiding the trip to sky and back. That further improvement can be measured, but AST is clearly already faster than our current code when all points are transformed in a single call, so further work is not needed to prove that AST is fast enough.
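
The shortcut described above (dropping the sky leg when warping) can be illustrated with a toy model. This sketch does not use AST and makes strong assumptions: every mapping is a hypothetical 1-D affine map (`Affine`), and both images share the same focal-plane-to-sky mapping. It only shows why the sky leg cancels in a composed chain.

```cpp
#include <cassert>

// Hypothetical 1-D affine map x -> scale * x + offset; a toy stand-in for a WCS mapping.
struct Affine {
    double scale;
    double offset;

    double operator()(double x) const { return scale * x + offset; }

    // Inverse map; requires a nonzero scale.
    Affine inverted() const {
        assert(scale != 0.0);
        return Affine{1.0 / scale, -offset / scale};
    }

    // Map equivalent to applying *this first, then next.
    Affine then(Affine const &next) const {
        return Affine{next.scale * scale, next.scale * offset + next.offset};
    }
};

// Full chain: pixel1 -> focal plane -> sky -> focal plane -> pixel2.
Affine warpViaSky(Affine const &pixel1ToFp, Affine const &fpToSky, Affine const &pixel2ToFp) {
    return pixel1ToFp.then(fpToSky).then(fpToSky.inverted()).then(pixel2ToFp.inverted());
}

// Shortened chain: pixel1 -> focal plane -> pixel2 (sky leg eliminated).
Affine warpViaFocalPlane(Affine const &pixel1ToFp, Affine const &pixel2ToFp) {
    return pixel1ToFp.then(pixel2ToFp.inverted());
}
```

Both functions yield the same pixel1-to-pixel2 map up to floating-point rounding, but the second evaluates fewer stages per point.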

          rowen Russell Owen added a comment -

          I discovered a subtle error in how time/iteration was computed (which I noticed when the value did not stabilize as I increased the number of iterations). I solved it by separately casting nIter and CLOCKS_PER_SEC, and I propagated that change to the rest of the afw timing code. Clearly a central routine would be best, and it should probably go into utils. But that's for another day.
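
A minimal sketch of the casting fix described above, assuming the problem was integer arithmetic on nIter and CLOCKS_PER_SEC (for example, an integer product overflowing or an integer quotient truncating). The comment does not show the original expression, so that cause is a guess. `usecPerIteration` is a hypothetical helper, not the afw code.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper: convert a pair of std::clock() readings into
// microseconds per iteration. Each integer quantity is cast to double on its
// own before any arithmetic combines them, so nothing is computed in integer
// arithmetic except the tick difference itself.
double usecPerIteration(std::clock_t startTicks, std::clock_t endTicks, long nIter) {
    assert(nIter > 0);
    double const elapsedSec = static_cast<double>(endTicks - startTicks) /
                              static_cast<double>(CLOCKS_PER_SEC);
    return 1.0e6 * elapsedSec / static_cast<double>(nIter);
}
```

Typical use is to record `std::clock()` before and after the timed loop and pass both readings along with the loop count.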

          rowen Russell Owen added a comment -

          I attached a zip archive containing the AST timing code and instructions for building it.

          Parejkoj John Parejko added a comment -

          Looks good. Thanks so much for doing this on short notice.

          There's a fair bit of AST code overhead, but otherwise, this is pretty understandable. It certainly suggests that we want a nice C++ interface layer over AST (e.g. "blah.data()" gets old fast!).

          It's probably fine to keep code in here. It requires some effort to run, and the readme and build file are necessary, so it's too much for a gist. We just needed it to demonstrate this particular aspect of the project, so I think we're good.

          Parejkoj John Parejko added a comment -

          Nothing to merge: code is an attachment here, results are in a comment, and summary in DMTN-010.


            People

            • Assignee:
              rowen Russell Owen
              Reporter:
              rowen Russell Owen
              Reviewers:
              John Parejko
              Watchers:
              John Parejko, Russell Owen, Simon Krughoff