Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-3142

Port HSC optimisations for reading astrometry.net catalog

    XMLWordPrintable

    Details

    • Type: Improvement
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: meas_astrom
    • Labels:
      None

      Description

      Some astrometry.net catalogs used in production can be quite large, and currently all of the catalog must be read in order to determine bounds for each component. This can make the loading of the catalog quite slow (e.g., 144 sec out of 177 sec to process an HSC image, using an SDSS DR9 catalog). We have HSC code that caches the required information, making the catalog load much faster. The code is from the following HSC issues:

      While there have been some changes to the LSST astrometry code that will mean we can't directly cherry-pick the HSC code, yet I think the main structure remains, so the approach can be copied without much effort.

        Attachments

          Issue Links

            Activity

            Hide
            price Paul Price added a comment -

            Here's the profile I made demonstrating that reading the astrometry.net index files is the tall pole:

            >>> s = pstats.Stats("profile.dat").sort_stats("cumulative").print_stats(30)
            Tue Jul 14 17:09:31 2015    profile.dat
             
                    5681574 function calls (5658542 primitive calls) in 177.547 seconds
             
              Ordered by: cumulative time
              List reduced from 7271 to 30 due to restriction <30>
             
              ncalls  tottime  percall  cumtime  percall filename:lineno(function)
                    1    0.001    0.001  177.548  177.548 /home/pprice/LSST/pipe/tasks/bin/processCcd.py:23(<module>)
                    1    0.000    0.000  174.498  174.498 /tigress/HSC/LSST/stack10_1/Linux64/pipe_base/10.1+4/python/lsst/pipe/base/cmdLineTask.py:401(parseAndRun)
                    1    0.000    0.000  171.937  171.937 /tigress/HSC/LSST/stack10_1/Linux64/pipe_base/10.1+4/python/lsst/pipe/base/cmdLineTask.py:172(run)
            2456/256    0.101    0.000  171.695    0.671 {map}
                    1    0.000    0.000  171.672  171.672 /tigress/HSC/LSST/stack10_1/Linux64/pipe_base/10.1+4/python/lsst/pipe/base/cmdLineTask.py:289(__call__)
                34/1    0.004    0.000  171.613  171.613 /tigress/HSC/LSST/stack10_1/Linux64/pipe_base/10.1+4/python/lsst/pipe/base/timer.py:114(wrapper)
                    1    0.002    0.002  171.612  171.612 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processCcd.py:62(run)
                    1    0.002    0.002  166.789  166.789 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processImage.py:128(process)
                    1    0.001    0.001  155.934  155.934 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/calibrate.py:376(run)
                    2    0.000    0.000  146.176  73.088 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/anetAstrometry.py:152(run)
                    2    0.000    0.000  146.014  73.007 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/anetAstrometry.py:266(astrometry)
                31/11    0.000    0.000  144.478  13.134 /tigress/HSC/LSST/stack10_1/Linux64/pipe_base/10.1+4/python/lsst/pipe/base/task.py:242(makeSubtask)
                31/11    0.000    0.000  144.478  13.134 /tigress/HSC/LSST/stack10_1/Linux64/pex_config/10.1+2/python/lsst/pex/config/configurableField.py:72(apply)
                    1    0.000    0.000  144.364  144.364 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/anetBasicAstrometry.py:203(__init__)
                    4    0.143    0.036  144.364  36.091 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/loadAstrometryNetObjects.py:148(_readIndexFiles)
                1440    0.007    0.000  128.098    0.089 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/astrometry_net.py:1651(multiindex_new)
                1440  128.091    0.089  128.091    0.089 {_astrometry_net.multiindex_new}
                1440    0.009    0.000  14.885    0.010 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/astrometry_net.py:1655(multiindex_add_index)
                1440  14.877    0.010  14.877    0.010 {_astrometry_net.multiindex_add_index}
                    3    0.055    0.018    7.171    2.390 /home/pprice/LSST/meas/base/python/lsst/meas/base/sfm.py:250(run)
                2158    0.033    0.000    5.191    0.002 /home/pprice/LSST/meas/base/python/lsst/meas/base/base.py:394(callMeasure)
                17264    0.023    0.000    4.936    0.000 /home/pprice/LSST/meas/base/python/lsst/meas/base/wrappers.py:16(measure)
                    1    0.002    0.002    4.821    4.821 /home/pprice/LSST/obs/subaru/python/lsst/obs/subaru/isr.py:150(runDataRef)
                    2    0.000    0.000    4.814    2.407 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/detection.py:303(run)
                    2    0.002    0.001    4.810    2.405 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/detection.py:346(detectFootprints)
                  323    0.000    0.000    4.415    0.014 /home/pprice/LSST/afw/python/lsst/afw/math/mathLib.py:5336(makeStatistics)
                  323    4.415    0.014    4.415    0.014 {_mathLib.makeStatistics}
                    1    0.001    0.001    3.226    3.226 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/measurePsf.py:243(run)
                    1    0.059    0.059    3.099    3.099 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/pcaPsfDeterminer.py:199(determinePsf)
                    1    0.000    0.000    3.004    3.004 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processCcd.py:23(<module>)
            

            Show
            price Paul Price added a comment - Here's the profile I made demonstrating that reading the astrometry.net index files is the tall pole: >>> s = pstats.Stats("profile.dat").sort_stats("cumulative").print_stats(30) Tue Jul 14 17:09:31 2015 profile.dat   5681574 function calls (5658542 primitive calls) in 177.547 seconds   Ordered by: cumulative time List reduced from 7271 to 30 due to restriction <30>   ncalls tottime percall cumtime percall filename:lineno(function) 1 0.001 0.001 177.548 177.548 /home/pprice/LSST/pipe/tasks/bin/processCcd.py:23(<module>) 1 0.000 0.000 174.498 174.498 /tigress/HSC/LSST/stack10_1/Linux64/pipe_base/10.1+4/python/lsst/pipe/base/cmdLineTask.py:401(parseAndRun) 1 0.000 0.000 171.937 171.937 /tigress/HSC/LSST/stack10_1/Linux64/pipe_base/10.1+4/python/lsst/pipe/base/cmdLineTask.py:172(run) 2456/256 0.101 0.000 171.695 0.671 {map} 1 0.000 0.000 171.672 171.672 /tigress/HSC/LSST/stack10_1/Linux64/pipe_base/10.1+4/python/lsst/pipe/base/cmdLineTask.py:289(__call__) 34/1 0.004 0.000 171.613 171.613 /tigress/HSC/LSST/stack10_1/Linux64/pipe_base/10.1+4/python/lsst/pipe/base/timer.py:114(wrapper) 1 0.002 0.002 171.612 171.612 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processCcd.py:62(run) 1 0.002 0.002 166.789 166.789 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processImage.py:128(process) 1 0.001 0.001 155.934 155.934 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/calibrate.py:376(run) 2 0.000 0.000 146.176 73.088 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/anetAstrometry.py:152(run) 2 0.000 0.000 146.014 73.007 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/anetAstrometry.py:266(astrometry) 31/11 0.000 0.000 144.478 13.134 /tigress/HSC/LSST/stack10_1/Linux64/pipe_base/10.1+4/python/lsst/pipe/base/task.py:242(makeSubtask) 31/11 0.000 0.000 144.478 13.134 /tigress/HSC/LSST/stack10_1/Linux64/pex_config/10.1+2/python/lsst/pex/config/configurableField.py:72(apply) 1 0.000 0.000 144.364 144.364 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/anetBasicAstrometry.py:203(__init__) 4 0.143 0.036 144.364 36.091 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/loadAstrometryNetObjects.py:148(_readIndexFiles) 1440 0.007 0.000 128.098 0.089 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/astrometry_net.py:1651(multiindex_new) 1440 128.091 0.089 128.091 0.089 {_astrometry_net.multiindex_new} 1440 0.009 0.000 14.885 0.010 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/astrometry_net.py:1655(multiindex_add_index) 1440 14.877 0.010 14.877 0.010 {_astrometry_net.multiindex_add_index} 3 0.055 0.018 7.171 2.390 /home/pprice/LSST/meas/base/python/lsst/meas/base/sfm.py:250(run) 2158 0.033 0.000 5.191 0.002 /home/pprice/LSST/meas/base/python/lsst/meas/base/base.py:394(callMeasure) 17264 0.023 0.000 4.936 0.000 /home/pprice/LSST/meas/base/python/lsst/meas/base/wrappers.py:16(measure) 1 0.002 0.002 4.821 4.821 /home/pprice/LSST/obs/subaru/python/lsst/obs/subaru/isr.py:150(runDataRef) 2 0.000 0.000 4.814 2.407 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/detection.py:303(run) 2 0.002 0.001 4.810 2.405 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/detection.py:346(detectFootprints) 323 0.000 0.000 4.415 0.014 /home/pprice/LSST/afw/python/lsst/afw/math/mathLib.py:5336(makeStatistics) 323 4.415 0.014 4.415 0.014 {_mathLib.makeStatistics} 1 0.001 0.001 3.226 3.226 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/measurePsf.py:243(run) 1 0.059 0.059 3.099 3.099 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/pcaPsfDeterminer.py:199(determinePsf) 1 0.000 0.000 3.004 3.004 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processCcd.py:23(<module>)
            Hide
            rowen Russell Owen added a comment -

            We are hoping to move away from astrometry.net so it is worth asking how much effort we want to expend in fixing its problems. That said, I don't know the state of the catalog code we wanted to use to replace it, and the new catalog code will not be compatible with the astrometry.net solver, so index files will still be needed for that solver.

            Show
            rowen Russell Owen added a comment - We are hoping to move away from astrometry.net so it is worth asking how much effort we want to expend in fixing its problems. That said, I don't know the state of the catalog code we wanted to use to replace it, and the new catalog code will not be compatible with the astrometry.net solver, so index files will still be needed for that solver.
            Hide
            price Paul Price added a comment -

            I am using the astrometry code to debug the distortion correction, and don't like waiting 2 minutes every time I run something to check if the distortion is working. Since the new astrometry catalog code isn't available now and this will only improve things, I think it's worth doing and worth doing now.

            Show
            price Paul Price added a comment - I am using the astrometry code to debug the distortion correction, and don't like waiting 2 minutes every time I run something to check if the distortion is working. Since the new astrometry catalog code isn't available now and this will only improve things, I think it's worth doing and worth doing now.
            Hide
            ktl Kian-Tat Lim added a comment -

            Skimming this quickly, I see a reference to a join in the cache loading code. Should we perhaps consider a SQLite file instead of a FITS table to store this data?

            Show
            ktl Kian-Tat Lim added a comment - Skimming this quickly, I see a reference to a join in the cache loading code. Should we perhaps consider a SQLite file instead of a FITS table to store this data?
            Hide
            price Paul Price added a comment -

            Maybe a SQLite file instead of FITS would have been a better choice for the original implementation, but now that it's implemented I would like to pull this over as is (i.e., using the FITS tables) to avoid the extra development and debugging that would be required to convert to SQLite.

            Show
            price Paul Price added a comment - Maybe a SQLite file instead of FITS would have been a better choice for the original implementation, but now that it's implemented I would like to pull this over as is (i.e., using the FITS tables) to avoid the extra development and debugging that would be required to convert to SQLite.
            Hide
            ktl Kian-Tat Lim added a comment -

            I was under the impression from various comments that a possibly-suboptimal implementation was chosen because all of this is intended to be rewritten on the LSST side. If that is not happening now, please file a story to do this before too long.

            Show
            ktl Kian-Tat Lim added a comment - I was under the impression from various comments that a possibly-suboptimal implementation was chosen because all of this is intended to be rewritten on the LSST side. If that is not happening now, please file a story to do this before too long.
            Hide
            price Paul Price added a comment -

            The commit log message you read is referring to the sub-optimal implementation of astrometry.net, and that LSST is moving to a new astrometry solver.

            Show
            price Paul Price added a comment - The commit log message you read is referring to the sub-optimal implementation of astrometry.net, and that LSST is moving to a new astrometry solver.
            Hide
            ktl Kian-Tat Lim added a comment -

            I see. My mistake.

            Show
            ktl Kian-Tat Lim added a comment - I see. My mistake.
            Hide
            price Paul Price added a comment -

            Here's the profile after the port. Note the reduction in runtime (177 sec --> 45 sec) and the absence of catalog reading in the top 30.

            I'm going to clean up (had to adjust some things due to reorganisation on the LSST side) and then submit for review.

            >>> s = pstats.Stats("updated.dat").sort_stats("cumulative").print_stats(30)
            Thu Jul 16 13:04:09 2015    updated.dat
             
                     5586126 function calls (5563266 primitive calls) in 45.036 seconds
             
               Ordered by: cumulative time
               List reduced from 6984 to 30 due to restriction <30>
             
               ncalls  tottime  percall  cumtime  percall filename:lineno(function)
                    1    0.004    0.004   45.037   45.037 /home/pprice/LSST/pipe/tasks/bin/processCcd.py:23(<module>)
                    1    0.000    0.000   35.143   35.143 /home/pprice/LSST/pipe/base/python/lsst/pipe/base/cmdLineTask.py:405(parseAndRun)
                    1    0.000    0.000   31.911   31.911 /home/pprice/LSST/pipe/base/python/lsst/pipe/base/cmdLineTask.py:172(run)
              2236/36    0.103    0.000   28.746    0.798 {map}
                    1    0.000    0.000   28.725   28.725 /home/pprice/LSST/pipe/base/python/lsst/pipe/base/cmdLineTask.py:293(__call__)
                 34/1    0.005    0.000   28.627   28.627 /home/pprice/LSST/pipe/base/python/lsst/pipe/base/timer.py:114(wrapper)
                    1    0.003    0.003   28.627   28.627 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processCcd.py:62(run)
                    1    0.002    0.002   23.197   23.197 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processImage.py:128(process)
                    1    0.001    0.001   12.125   12.125 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/calibrate.py:376(run)
                    1    0.001    0.001    9.536    9.536 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processCcd.py:23(<module>)
                    1    0.077    0.077    9.534    9.534 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processImage.py:23(<module>)
                    3    0.056    0.019    7.132    2.377 /home/pprice/LSST/meas/base/python/lsst/meas/base/sfm.py:250(run)
                    1    0.003    0.003    5.425    5.425 /home/pprice/LSST/obs/subaru/python/lsst/obs/subaru/isr.py:150(runDataRef)
                 2158    0.033    0.000    5.163    0.002 /home/pprice/LSST/meas/base/python/lsst/meas/base/base.py:394(callMeasure)
                17264    0.027    0.000    4.912    0.000 /home/pprice/LSST/meas/base/python/lsst/meas/base/wrappers.py:16(measure)
                    2    0.000    0.000    4.689    2.344 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/detection.py:303(run)
                    2    0.002    0.001    4.684    2.342 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/detection.py:346(detectFootprints)
                  323    0.000    0.000    4.366    0.014 /home/pprice/LSST/afw/python/lsst/afw/math/mathLib.py:5336(makeStatistics)
                  323    4.365    0.014    4.365    0.014 {_mathLib.makeStatistics}
                    1    0.010    0.010    4.300    4.300 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/__init__.py:23(<module>)
                    1    0.001    0.001    3.857    3.857 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/psfDeterminerRegistry.py:22(<module>)
                    1    0.001    0.001    3.856    3.856 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/pcaPsfDeterminer.py:22(<module>)
                    1    0.064    0.064    3.852    3.852 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/utils.py:23(<module>)
                   10    0.001    0.000    3.700    0.370 /tigress/HSC/LSST/stack10_1/Linux64/daf_persistence/10.1+1/python/lsst/daf/persistence/butler.py:247(put)
                    1    0.240    0.240    3.421    3.421 /tigress/HSC/LSST/stack10_1/Linux64/pex_config/10.1+2/python/lsst/pex/config/__init__.py:22(<module>)
                    1    0.000    0.000    3.176    3.176 /home/pprice/LSST/pipe/base/python/lsst/pipe/base/cmdLineTask.py:267(precall)
                    1    0.001    0.001    3.087    3.087 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/measurePsf.py:243(run)
                    3    0.004    0.001    3.051    1.017 /home/pprice/LSST/daf/butlerUtils/python/lsst/daf/butlerUtils/cameraMapper.py:436(backup)
                    1    0.096    0.096    3.004    3.004 /tigress/HSC/LSST/stack10_1/Linux64/anaconda/2.1.0-4-g35ca374/lib/python2.7/site-packages/matplotlib/pyplot.py:17(<module>)
                    3    0.338    0.113    2.970    0.990 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/anetBasicAstrometry.py:271(useKnownWcs)
            

            Show
            price Paul Price added a comment - Here's the profile after the port. Note the reduction in runtime (177 sec --> 45 sec) and the absence of catalog reading in the top 30. I'm going to clean up (had to adjust some things due to reorganisation on the LSST side) and then submit for review. >>> s = pstats.Stats("updated.dat").sort_stats("cumulative").print_stats(30) Thu Jul 16 13:04:09 2015 updated.dat   5586126 function calls (5563266 primitive calls) in 45.036 seconds   Ordered by: cumulative time List reduced from 6984 to 30 due to restriction <30>   ncalls tottime percall cumtime percall filename:lineno(function) 1 0.004 0.004 45.037 45.037 /home/pprice/LSST/pipe/tasks/bin/processCcd.py:23(<module>) 1 0.000 0.000 35.143 35.143 /home/pprice/LSST/pipe/base/python/lsst/pipe/base/cmdLineTask.py:405(parseAndRun) 1 0.000 0.000 31.911 31.911 /home/pprice/LSST/pipe/base/python/lsst/pipe/base/cmdLineTask.py:172(run) 2236/36 0.103 0.000 28.746 0.798 {map} 1 0.000 0.000 28.725 28.725 /home/pprice/LSST/pipe/base/python/lsst/pipe/base/cmdLineTask.py:293(__call__) 34/1 0.005 0.000 28.627 28.627 /home/pprice/LSST/pipe/base/python/lsst/pipe/base/timer.py:114(wrapper) 1 0.003 0.003 28.627 28.627 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processCcd.py:62(run) 1 0.002 0.002 23.197 23.197 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processImage.py:128(process) 1 0.001 0.001 12.125 12.125 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/calibrate.py:376(run) 1 0.001 0.001 9.536 9.536 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processCcd.py:23(<module>) 1 0.077 0.077 9.534 9.534 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processImage.py:23(<module>) 3 0.056 0.019 7.132 2.377 /home/pprice/LSST/meas/base/python/lsst/meas/base/sfm.py:250(run) 1 0.003 0.003 5.425 5.425 /home/pprice/LSST/obs/subaru/python/lsst/obs/subaru/isr.py:150(runDataRef) 2158 0.033 0.000 5.163 0.002 /home/pprice/LSST/meas/base/python/lsst/meas/base/base.py:394(callMeasure) 17264 0.027 0.000 4.912 0.000 /home/pprice/LSST/meas/base/python/lsst/meas/base/wrappers.py:16(measure) 2 0.000 0.000 4.689 2.344 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/detection.py:303(run) 2 0.002 0.001 4.684 2.342 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/detection.py:346(detectFootprints) 323 0.000 0.000 4.366 0.014 /home/pprice/LSST/afw/python/lsst/afw/math/mathLib.py:5336(makeStatistics) 323 4.365 0.014 4.365 0.014 {_mathLib.makeStatistics} 1 0.010 0.010 4.300 4.300 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/__init__.py:23(<module>) 1 0.001 0.001 3.857 3.857 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/psfDeterminerRegistry.py:22(<module>) 1 0.001 0.001 3.856 3.856 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/pcaPsfDeterminer.py:22(<module>) 1 0.064 0.064 3.852 3.852 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/utils.py:23(<module>) 10 0.001 0.000 3.700 0.370 /tigress/HSC/LSST/stack10_1/Linux64/daf_persistence/10.1+1/python/lsst/daf/persistence/butler.py:247(put) 1 0.240 0.240 3.421 3.421 /tigress/HSC/LSST/stack10_1/Linux64/pex_config/10.1+2/python/lsst/pex/config/__init__.py:22(<module>) 1 0.000 0.000 3.176 3.176 /home/pprice/LSST/pipe/base/python/lsst/pipe/base/cmdLineTask.py:267(precall) 1 0.001 0.001 3.087 3.087 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/measurePsf.py:243(run) 3 0.004 0.001 3.051 1.017 /home/pprice/LSST/daf/butlerUtils/python/lsst/daf/butlerUtils/cameraMapper.py:436(backup) 1 0.096 0.096 3.004 3.004 /tigress/HSC/LSST/stack10_1/Linux64/anaconda/2.1.0-4-g35ca374/lib/python2.7/site-packages/matplotlib/pyplot.py:17(<module>) 3 0.338 0.113 2.970 0.990 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/anetBasicAstrometry.py:271(useKnownWcs)
            Hide
            price Paul Price added a comment -

            Russell, since astrometry is fresh in your mind, would you have a look at this please?

            Besides pulling code over from HSC, I've fixed it up to deal with the few changes that have been made on the LSST side, and I've added a test as well. I feel that a lot more could be done to clean all the astrometry.net usage up, but this adds the useful functionality, the astrometry.net usage is hopefully on the way out, and the issues are larger than just this functionality so I'm leaving the more widespread cleanup (or deletion!) to a future ticket.

            This patchset includes a script, andCache.py, that can be used to generate a cache file for an existing astrometry_net_data installation. If the cache is there (and use is permitted in the andConfig.py), it's used; otherwise the catalog information is read from the individual index files (which requires reading all of the files) as is done currently. With the information from the cache, only the required files are read to get reference stars; note that this is only useful if the index files have been prepared on healpixes (as I believe is usual for wide-field astrometry.net indices, via its hpsplit program and the -H flag with build-astrometry-index). There would be no great gain for narrow-field catalogs, but for the wide-field catalogs we're using for HSC (SDSS, PS1) it makes a huge difference.

            pprice@tiger-sumire:~/LSST/meas/astrom (tickets/DM-3142=) $ git --no-pager log --stat --reverse origin/master..
            commit 8eb7e3cf2770f59b426e4d9c411926f6ffcdae4d
            Author: Paul Price <price@astro.princeton.edu>
            Date:   Tue Dec 2 22:27:51 2014 -0500
             
                Add interface for astrometry_net's healpix_distance_to_radec
                
                This will be useful for optimising the loading of multiindex files
                through a cache.
             
             python/lsst/meas/astrom/astrometry_net.i | 16 ++++++++++++++++
             1 file changed, 16 insertions(+)
             
            commit aa8af09984f0e8a0222778b3fd66d37c0de0ef13
            Author: Paul Price <price@astro.princeton.edu>
            Date:   Mon Dec 8 13:06:55 2014 -0500
             
                Add wrapper for astrometry.net catalog to cache loading
                
                Loading the entire catalog currently requires reading all the index files,
                just to get the metadata (the healpix and nside), which is slow.
                Here, the multiindex_t and list of multiindex_t are wrapped by the new
                MultiIndexCache and AstrometryNetCatalog classes, respectively.  An
                AstrometryNetCatalog can have a cache written for faster loading of the
                metadata.  The cache can be generated with a new bin script, andCache.py.
                
                More could be moved from the LoadAstrometryNetObjectsTask into the
                AstrometryNetCatalog class, but leaving that for a more comprehensive
                cleanup.
             
             bin/andCache.py                                    |   3 +
             python/lsst/meas/astrom/astrometryNetDataConfig.py |   2 +
             .../lsst/meas/astrom/loadAstrometryNetObjects.py   |  85 +------
             python/lsst/meas/astrom/multiindex.py              | 258 +++++++++++++++++++++
             tests/astrometry_net_data/photocal/andConfig.py    |   1 +
             tests/astrometry_net_data/photocal/andConfig2.py   |   1 +
             tests/astrometry_net_data/photocal/andConfig3.py   |   2 +-
             tests/astrometry_net_data/photocal/andConfig4.py   |   2 +-
             tests/astrometry_net_data/photocal/andConfig5.py   |   2 +-
             tests/astrometry_net_data/photocal/andConfig6.py   |   3 +-
             .../photocal/andConfigOpenFiles.py                 |   1 +
             tests/testMultiIndex.py                            |  17 ++
             ups/meas_astrom.table                              |   1 +
             13 files changed, 292 insertions(+), 86 deletions(-)
             
            commit 882fe8c9b6c0baf9796295f1bf16d662dba9413d
            Author: Paul Price <price@astro.princeton.edu>
            Date:   Thu Mar 5 15:26:21 2015 -0500
             
                MultiIndexCache: faster parsing of cache file
                
                Parsing the PS1 PV2 andCache.fits file was taking 30 sec.
                This simpler code for the table join is much faster (about
                1 sec).
             
             python/lsst/meas/astrom/multiindex.py | 15 ++++++---------
             1 file changed, 6 insertions(+), 9 deletions(-)
            

            Show
            price Paul Price added a comment - Russell, since astrometry is fresh in your mind, would you have a look at this please? Besides pulling code over from HSC, I've fixed it up to deal with the few changes that have been made on the LSST side, and I've added a test as well. I feel that a lot more could be done to clean all the astrometry.net usage up, but this adds the useful functionality, the astrometry.net usage is hopefully on the way out, and the issues are larger than just this functionality so I'm leaving the more widespread cleanup (or deletion!) to a future ticket. This patchset includes a script, andCache.py, that can be used to generate a cache file for an existing astrometry_net_data installation. If the cache is there (and use is permitted in the andConfig.py), it's used; otherwise the catalog information is read from the individual index files (which requires reading all of the files) as is done currently. With the information from the cache, only the required files are read to get reference stars; note that this is only useful if the index files have been prepared on healpixes (as I believe is usual for wide-field astrometry.net indices, via its hpsplit program and the -H flag with build-astrometry-index ). There would be no great gain for narrow-field catalogs, but for the wide-field catalogs we're using for HSC (SDSS, PS1) it makes a huge difference. pprice@tiger-sumire:~/LSST/meas/astrom (tickets/DM-3142=) $ git --no-pager log --stat --reverse origin/master.. commit 8eb7e3cf2770f59b426e4d9c411926f6ffcdae4d Author: Paul Price <price@astro.princeton.edu> Date: Tue Dec 2 22:27:51 2014 -0500   Add interface for astrometry_net's healpix_distance_to_radec This will be useful for optimising the loading of multiindex files through a cache.   python/lsst/meas/astrom/astrometry_net.i | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)   commit aa8af09984f0e8a0222778b3fd66d37c0de0ef13 Author: Paul Price <price@astro.princeton.edu> Date: Mon Dec 8 13:06:55 2014 -0500   Add wrapper for astrometry.net catalog to cache loading Loading the entire catalog currently requires reading all the index files, just to get the metadata (the healpix and nside), which is slow. Here, the multiindex_t and list of multiindex_t are wrapped by the new MultiIndexCache and AstrometryNetCatalog classes, respectively. An AstrometryNetCatalog can have a cache written for faster loading of the metadata. The cache can be generated with a new bin script, andCache.py. More could be moved from the LoadAstrometryNetObjectsTask into the AstrometryNetCatalog class, but leaving that for a more comprehensive cleanup.   bin/andCache.py | 3 + python/lsst/meas/astrom/astrometryNetDataConfig.py | 2 + .../lsst/meas/astrom/loadAstrometryNetObjects.py | 85 +------ python/lsst/meas/astrom/multiindex.py | 258 +++++++++++++++++++++ tests/astrometry_net_data/photocal/andConfig.py | 1 + tests/astrometry_net_data/photocal/andConfig2.py | 1 + tests/astrometry_net_data/photocal/andConfig3.py | 2 +- tests/astrometry_net_data/photocal/andConfig4.py | 2 +- tests/astrometry_net_data/photocal/andConfig5.py | 2 +- tests/astrometry_net_data/photocal/andConfig6.py | 3 +- .../photocal/andConfigOpenFiles.py | 1 + tests/testMultiIndex.py | 17 ++ ups/meas_astrom.table | 1 + 13 files changed, 292 insertions(+), 86 deletions(-)   commit 882fe8c9b6c0baf9796295f1bf16d662dba9413d Author: Paul Price <price@astro.princeton.edu> Date: Thu Mar 5 15:26:21 2015 -0500   MultiIndexCache: faster parsing of cache file Parsing the PS1 PV2 andCache.fits file was taking 30 sec. This simpler code for the table join is much faster (about 1 sec).   python/lsst/meas/astrom/multiindex.py | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-)
            Hide
            rowen Russell Owen added a comment - - edited

            astrometry_net.i

            You added function healpixDistance:

                lsst::afw::geom::Angle healpixDistance(int hp, int Nside, lsst::afw::coord::Coord const& coord) {
                    return lsst::afw::geom::Angle(healpix_distance_to_radec(hp, Nside, coord.getLongitude().asDegrees(),
                                                                            coord.getLatitude().asDegrees(), NULL),
                                                  lsst::afw::geom::degrees);
                }
            

            This does not enforce that "coord" is the same coordinate system and epoch as was used to make the index. That might be a bit tricky (though I hope it's solvable) but it doesn't even require that "coord" be an RA/Dec-based coordinate system and that would be trivial to solve. As it stands it can silently give nonsense answers.

            Perhaps we can assume that all index files are ICRS and thus that you can convert the RA/Dec to ICRS?

            For the ra/dec version please use Angle instead of float unless you expect this to be called so frequently that extracting the value in degrees will be onorous. If you choose ICRS to solve the above problem then please add that information to the doc string.

            These functions are only called by isWithinRange, as far as I can see, and I suggest a similar transformation there (below). To the extent that you can make this funciton safe, isWithinRange already becomes safer. And maybe you can just ditch the ra/dec version altogether, which would be nice.

            bin/andCache.py

            The name seems unclear for a function that generates a cache of a.net load data. A clear name is helpful because it will be on the $PATH. I suggest generateANetLoadCache.py.

            multiindex.py

            getIndexPath appears to have many logic errors, and a unit test would be helpful! I have included a possible alternative implementation below. Here are the issues I see:

            • If fn is an absolute path then it returns it without testing if it exists (since os.path.isabs does not test for existence).
            • lsst.utils.getPackageDir raises an exception if the package is not setup.
            • The final return statement appears to be incorrect; I think getIndexPath will fail with "fn2 undefined" if anddir is None.
            • If os.path.exists(fn2) then the function returns non-absolute path

            getConfigFromEnvironment raises an exception if astrometry_net_data is not setup.

            MultiIndexCache:

            • you say that fromFilenameList is not very efficient. If possible I suggest ditching it until we have a demonstrated need for it. I can't find any use of it (not even in tests).
            • If you retain it, please call healpix = set(...) something else, e.g. healpixSet, and similarly for nside.
            • In the interest of catching errors early... in __init__ I suggest you enforce int-ness for healpix and nside, e.g.: self._healpix = int(healpix) and similarly for nside. Also, I suggest you raise an exception if len(filenameList) < 2 (if at least one index file is required, or whatever value makes the most sense); in particular this confirms that filenameList is iterable.
            • isWithinRange: I suggest that ra, dec be a Coord (which the method should convert to IcrsCoord or however you handle healpixDistance, above) and distance be an Angle. That would clean up code I see using it, and that code acts one index at a time, so assuming you don't have many thousands of them, the extra time spent extracting the data as float values in degrees should not be an issue. If you're really worried about it then you could make the list of indices be an additional argument and have this method iterate over them. I realize that makes it less flexible, but it does satisfy the current use cases.
            • writeCache: I echo K-T's remark that using a FITS table seems sub-optimal as opposed to an SQLite database, but I take your point about an existing implementation.
            • _initFromCache: very picky, but would you mind making using present tense for the description (e.g. "Ingets the cache file...and use that...")?
            • generateCache: should the reload loop be inside the try block? That would help if a reload failed and you wanted to be sure to unload any indices that had been reloaded. But that only helps if index.unload() is always safe to call, even if an index has not been [re]loaded.
            • AstrometryNetCatalog:
            • It would help if the description said this behaved like an ordered collection of ...(whatever it is).
            • The argument name "index" for _getitem_ is a bit confusing given what I think is contained are a.net index files. Perhaps "i" would be clearer?

            itertools is not used (as shows by pyflakes linter)

            Any doc string that contains Doxygen markup should have an exclamation mark after the opening triple quotes. I suggest always using the exclamation mark because somebody may add Doxygen markup commands later.

            Here is a possible alternative implementation of getIndexPath:

            def getIndexPath(fn):
                """!Get the path to the specified astrometry.net index file
             
                @param[in] fn  path to index file; if relative, then relative to astrometry_net_data
                    if that product is setup, else relative to the current working directory
                @return the absolute path to the index file, or None if the file was not found
                """
                try:
                    tryPath = os.path.join(lsst.utils.getPackageDir('astrometry_net_data'), fn)
                except Exception:
                    tryPath = fn
                if os.path.exists(tryPath):
                    return os.path.abspath(tryPath)
                return None
            

            Show
            rowen Russell Owen added a comment - - edited astrometry_net.i You added function healpixDistance: lsst::afw::geom::Angle healpixDistance(int hp, int Nside, lsst::afw::coord::Coord const& coord) { return lsst::afw::geom::Angle(healpix_distance_to_radec(hp, Nside, coord.getLongitude().asDegrees(), coord.getLatitude().asDegrees(), NULL), lsst::afw::geom::degrees); } This does not enforce that "coord" is the same coordinate system and epoch as was used to make the index. That might be a bit tricky (though I hope it's solvable) but it doesn't even require that "coord" be an RA/Dec-based coordinate system and that would be trivial to solve. As it stands it can silently give nonsense answers. Perhaps we can assume that all index files are ICRS and thus that you can convert the RA/Dec to ICRS? For the ra/dec version please use Angle instead of float unless you expect this to be called so frequently that extracting the value in degrees will be onorous. If you choose ICRS to solve the above problem then please add that information to the doc string. These functions are only called by isWithinRange, as far as I can see, and I suggest a similar transformation there (below). To the extent that you can make this funciton safe, isWithinRange already becomes safer. And maybe you can just ditch the ra/dec version altogether, which would be nice. bin/andCache.py The name seems unclear for a function that generates a cache of a.net load data. A clear name is helpful because it will be on the $PATH. I suggest generateANetLoadCache.py. multiindex.py getIndexPath appears to have many logic errors, and a unit test would be helpful! I have included a possible alternative implementation below. Here are the issues I see: If fn is an absolute path then it returns it without testing if it exists (since os.path.isabs does not test for existence). lsst.utils.getPackageDir raises an exception if the package is not setup. The final return statement appears to be incorrect; I think getIndexPath will fail with "fn2 undefined" if anddir is None. If os.path.exists(fn2) then the function returns non-absolute path getConfigFromEnvironment raises an exception if astrometry_net_data is not setup. MultiIndexCache: you say that fromFilenameList is not very efficient. If possible I suggest ditching it until we have a demonstrated need for it. I can't find any use of it (not even in tests). If you retain it, please call healpix = set(...) something else, e.g. healpixSet, and similarly for nside. In the interest of catching errors early... in __init__ I suggest you enforce int-ness for healpix and nside, e.g.: self._healpix = int(healpix) and similarly for nside. Also, I suggest you raise an exception if len(filenameList) < 2 (if at least one index file is required, or whatever value makes the most sense); in particular this confirms that filenameList is iterable. isWithinRange: I suggest that ra, dec be a Coord (which the method should convert to IcrsCoord or however you handle healpixDistance, above) and distance be an Angle. That would clean up code I see using it, and that code acts one index at a time, so assuming you don't have many thousands of them, the extra time spent extracting the data as float values in degrees should not be an issue. If you're really worried about it then you could make the list of indices be an additional argument and have this method iterate over them. I realize that makes it less flexible, but it does satisfy the current use cases. writeCache: I echo K-T's remark that using a FITS table seems sub-optimal as opposed to an SQLite database, but I take your point about an existing implementation. _initFromCache: very picky, but would you mind making using present tense for the description (e.g. "Ingets the cache file...and use that...")? generateCache: should the reload loop be inside the try block? That would help if a reload failed and you wanted to be sure to unload any indices that had been reloaded. But that only helps if index.unload() is always safe to call, even if an index has not been [re]loaded. AstrometryNetCatalog: It would help if the description said this behaved like an ordered collection of ...(whatever it is). The argument name "index" for _ getitem _ is a bit confusing given what I think is contained are a.net index files. Perhaps "i" would be clearer? itertools is not used (as shows by pyflakes linter) Any doc string that contains Doxygen markup should have an exclamation mark after the opening triple quotes. I suggest always using the exclamation mark because somebody may add Doxygen markup commands later. Here is a possible alternative implementation of getIndexPath: def getIndexPath(fn): """!Get the path to the specified astrometry.net index file   @param[in] fn path to index file; if relative, then relative to astrometry_net_data if that product is setup, else relative to the current working directory @return the absolute path to the index file, or None if the file was not found """ try: tryPath = os.path.join(lsst.utils.getPackageDir('astrometry_net_data'), fn) except Exception: tryPath = fn if os.path.exists(tryPath): return os.path.abspath(tryPath) return None
            Hide
            price Paul Price added a comment -

            Tim Jenness, I replied to one comment on github. Your other two comments corresponded to comments made by Russell, and I've taken them into account in my revision.

            Thank you, Russell Owen, for your helpful review comments. I've made almost all the changes you requested, and pushed several fixup commits for you to review if you desire. I've not squashed them in yet so that it's easy for you to see what I've changed.

            I'm especially grateful to you for catching my misimplementations of getIndexPath and getConfigFromEnvironment. I think they should now be better, but please let me know what you think.

            I left MultiIndexCache.fromFilenameList because it's used by AstrometryNetCatalog._initFromIndexFiles — that's the only way to create a MultiIndexCache if there's no cache, and you need them to create a cache.

            I didn't move the reload into the try block in generateCache because I think we should fail if we're unable to read everything when generating a cache — otherwise the cache and the original are out of sync, and you'd hit an assertion.

            Show
            price Paul Price added a comment - Tim Jenness , I replied to one comment on github. Your other two comments corresponded to comments made by Russell, and I've taken them into account in my revision. Thank you, Russell Owen , for your helpful review comments. I've made almost all the changes you requested, and pushed several fixup commits for you to review if you desire. I've not squashed them in yet so that it's easy for you to see what I've changed. I'm especially grateful to you for catching my misimplementations of getIndexPath and getConfigFromEnvironment . I think they should now be better, but please let me know what you think. I left MultiIndexCache.fromFilenameList because it's used by AstrometryNetCatalog._initFromIndexFiles — that's the only way to create a MultiIndexCache if there's no cache, and you need them to create a cache. I didn't move the reload into the try block in generateCache because I think we should fail if we're unable to read everything when generating a cache — otherwise the cache and the original are out of sync, and you'd hit an assertion.
            Hide
            rowen Russell Owen added a comment - - edited

            I'll ask again about generateCache (below) but all your other changes look great. Thank you for being so tolerant of all my suggested changes. Leaving your changes as fixup commits so others could look at them was clever. I'll try to remember to do that, rebasing after the final review is done.

            Regarding generateCache: I agree that you should raise an exception if a reload fails, but surely that will happen if you move the reload loop into the try/finally block? If an error on loading occurs then the finally block runs, which unloads the indices, and the error is raised. Naively that seems ideal to me (as long as it is safe to call unload even if an index has never been loaded) because it leaves the cache in a known and self-constent state after a failure.

            Show
            rowen Russell Owen added a comment - - edited I'll ask again about generateCache (below) but all your other changes look great. Thank you for being so tolerant of all my suggested changes. Leaving your changes as fixup commits so others could look at them was clever. I'll try to remember to do that, rebasing after the final review is done. Regarding generateCache: I agree that you should raise an exception if a reload fails, but surely that will happen if you move the reload loop into the try/finally block? If an error on loading occurs then the finally block runs, which unloads the indices, and the error is raised. Naively that seems ideal to me (as long as it is safe to call unload even if an index has never been loaded) because it leaves the cache in a known and self-constent state after a failure.
            Hide
            price Paul Price added a comment -

            Ah, I understand. OK, I'll do that.

            Going to squash now and then merge after a final Jenkins build.

            Show
            price Paul Price added a comment - Ah, I understand. OK, I'll do that. Going to squash now and then merge after a final Jenkins build.
            Hide
            price Paul Price added a comment -

            Squashed, build succeeded and now merged to master.

            Thanks!

            Show
            price Paul Price added a comment - Squashed, build succeeded and now merged to master. Thanks!

              People

              Assignee:
              price Paul Price
              Reporter:
              price Paul Price
              Reviewers:
              Russell Owen
              Watchers:
              Dominique Boutigny, Kian-Tat Lim, Lauren MacArthur, Paul Price, Russell Owen, Simon Krughoff
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins Builds

                  No builds found.