Data Management / DM-6166

Time AST and compare to our WCS code

    Details

    • Type: Improvement
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: afw

      Description

      Time TAN-SIP for our code and for AST, in order to get a sense of the performance impact of switching to AST for our WCS implementation.

            Activity

            rowen Russell Owen added a comment - - edited

            I added examples/timeWcs.cc. I wrote a similar routine for AST but can't commit it because AST is not part of our stack. Here are the timings, using an image with a TAN-SIP header: calexp-849375-12.fits generated by validate_drp master, commit c1ea4c0, examples/runCfhtQuick.sh. This is on an unloaded 2012 MacBook Pro.

            All transforms are performed in two steps: pixel to sky, then sky back to pixel. The maximum observed round-trip error is reported with each timing.

            *** LSST ***
            Timing 10000 iterations of pixel->sky->pixel of the WCS found in test1.fits
            2.2471 usec per iteration; max round trip error = (5.02994e-05, 0.000158473) pixels
             
            *** AST ***
            Transform each point in a separate call, using the full frameset;
            this is primarily slow due to per-call overhead; but it is also
            recommended to use a simplified mapping instead of the full frameset
            Timing 10000 iterations of pixel->sky->pixel of the WCS found in test1.fits
            timeWcs; nIter=10000; doSimplify=0
            178.174 usec per iteration; max round trip error = (2.51598e-05, 7.64924e-05) pixels
             
            Transform each point separately, using a simplified mapping extracted from
            the frameset; this is slow due to per-call overhead
            timeWcs; nIter=10000; doSimplify=1
            88.9025 usec per iteration; max round trip error = (2.51598e-05, 7.64924e-05) pixels
             
            Transform all points in a single call on the full frameset;
            this is fast, but one can do even better using a simplified mapping
            timeWcsVectorize; nIter=10000; doSimplify=0
            0.9136 usec per iteration; max round trip error = (2.51598e-05, 7.64924e-05) pixels
             
            Transform all points in a single call on a simplified mapping;
            this is the recommended approach when speed is important;
            timeWcsVectorize; nIter=10000; doSimplify=1
            0.8953 usec per iteration; max round trip error = (2.51598e-05, 7.64924e-05) pixels
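
            The difference between the per-point and single-call cases above can be sketched without AST. The following is a minimal illustration, not the attached timing code: ToyMapping, Point, and maxRoundTripError are hypothetical names, and the scale-and-shift transform is only a stand-in for a real WCS. It shows the two call patterns (one point per call versus all points in one call) and how a maximum round-trip error in pixels can be accumulated.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point {
    double x;
    double y;
};

// Hypothetical stand-in for a WCS: an invertible scale-and-shift.
// A real TAN-SIP transform is far more expensive per point.
struct ToyMapping {
    double scale;
    double shiftX;
    double shiftY;

    // One point per call: any fixed per-call overhead is paid for every point.
    Point forwardOne(Point const& p) const { return {scale * p.x + shiftX, scale * p.y + shiftY}; }
    Point inverseOne(Point const& p) const { return {(p.x - shiftX) / scale, (p.y - shiftY) / scale}; }

    // All points in one call: per-call overhead is paid once per batch.
    std::vector<Point> forwardAll(std::vector<Point> const& in) const {
        std::vector<Point> out;
        out.reserve(in.size());
        for (Point const& p : in) out.push_back(forwardOne(p));
        return out;
    }
    std::vector<Point> inverseAll(std::vector<Point> const& in) const {
        std::vector<Point> out;
        out.reserve(in.size());
        for (Point const& p : in) out.push_back(inverseOne(p));
        return out;
    }
};

// Largest |pixel -> sky -> pixel - pixel| seen on each axis.
Point maxRoundTripError(ToyMapping const& m, std::vector<Point> const& pixels) {
    std::vector<Point> const sky = m.forwardAll(pixels);
    std::vector<Point> const back = m.inverseAll(sky);
    Point maxErr{0.0, 0.0};
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        maxErr.x = std::max(maxErr.x, std::fabs(back[i].x - pixels[i].x));
        maxErr.y = std::max(maxErr.y, std::fabs(back[i].y - pixels[i].y));
    }
    return maxErr;
}
```

            Both call patterns give the same answers; only the number of calls differs, which is where the per-call overhead reported above comes from.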
            

            Further improvements to AST for warping are expected due to the ability to eliminate unused portions of transformations. Our present warping code (and these timing tests) first transforms pixel1 to sky, then sky to pixel2. However, this can be simplified to pixel1 to focal plane to pixel2 (avoiding going to sky and back again). That further improvement can be measured, but AST is clearly already faster than our current code when all points are transformed in a single call, so further work is not needed to prove that AST is fast enough.
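
            The idea of dropping the trip to sky and back can be illustrated with a toy model. This is only a sketch of the concept, not AST's simplification algorithm: Step, cancels, and simplify are hypothetical names, each step is reduced to a pair of frame labels, and it assumes both detectors share an identical focal-plane-to-sky step so that the two middle steps are exact inverses.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A transformation step, reduced to the frames it connects.
struct Step {
    std::string from;
    std::string to;
};

// Assumption: a step is exactly undone by the step with swapped frames.
bool cancels(Step const& a, Step const& b) { return a.from == b.to && a.to == b.from; }

// Remove adjacent steps that undo each other, repeating as pairs collapse.
std::vector<Step> simplify(std::vector<Step> const& chain) {
    std::vector<Step> out;
    for (Step const& s : chain) {
        if (!out.empty() && cancels(out.back(), s)) {
            out.pop_back();  // the previous step and this one are a no-op together
        } else {
            out.push_back(s);
        }
    }
    return out;
}

// pixel1 -> focal plane -> sky -> focal plane -> pixel2
std::vector<Step> warpChain() {
    return {{"pixel1", "focalPlane"},
            {"focalPlane", "sky"},
            {"sky", "focalPlane"},
            {"focalPlane", "pixel2"}};
}
```

            Calling simplify(warpChain()) leaves two steps, pixel1 to focal plane and focal plane to pixel2, which is the shortened path described above.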

            rowen Russell Owen added a comment -

            I discovered a subtle error in how time/iteration was computed (which I noticed when the value did not stabilize as I increased the number of iterations). I solved it by separately casting nIter and CLOCKS_PER_SEC, and I propagated that change to the rest of the afw timing code. Clearly a central routine would be best, and it should probably go into utils. But that's for another day.
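
            A minimal sketch of the separate-cast approach follows. The comment does not show the original expression, so the failure mode described here is an assumption: if nIter and CLOCKS_PER_SEC are multiplied as integers before conversion, the product can exceed a 32-bit int for large nIter. The function names usecPerIteration and productOverflowsInt32 are hypothetical and are not the afw code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <ctime>
#include <limits>

// Microseconds per iteration from an elapsed clock() tick count.
// Each integer operand is converted to double on its own, so no
// integer product or integer quotient is ever formed.
double usecPerIteration(std::clock_t elapsedTicks, int nIter) {
    double const seconds = static_cast<double>(elapsedTicks) / static_cast<double>(CLOCKS_PER_SEC);
    return 1.0e6 * seconds / static_cast<double>(nIter);
}

// Illustrates the assumed hazard: would nIter * ticksPerSec fit in a
// 32-bit signed int? The check itself is done in 64 bits to stay well defined.
bool productOverflowsInt32(std::int64_t nIter, std::int64_t ticksPerSec) {
    return nIter * ticksPerSec > std::numeric_limits<std::int32_t>::max();
}
```

            With one million ticks per second, productOverflowsInt32 reports a problem at nIter = 10000, the iteration count used in the timings in this ticket, which would be consistent with a result that fails to stabilize as nIter grows.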

            rowen Russell Owen added a comment -

            I attached a zip archive containing the AST timing code and instructions for building it.

            Parejkoj John Parejko added a comment -

            Looks good. Thanks so much for doing this on short notice.

            There's a fair bit of AST code overhead, but otherwise, this is pretty understandable. It certainly suggests that we want a nice C++ interface layer over AST (e.g. "blah.data()" gets old fast!).

            It's probably fine to keep code in here. It requires some effort to run, and the readme and build file are necessary, so it's too much for a gist. We just needed it to demonstrate this particular aspect of the project, so I think we're good.

            Parejkoj John Parejko added a comment -

            Nothing to merge: code is an attachment here, results are in a comment, and summary in DMTN-010.


              People

              • Assignee:
                rowen Russell Owen
                Reporter:
                rowen Russell Owen
                Reviewers:
                John Parejko
                Watchers:
                John Parejko, Russell Owen, Simon Krughoff