# Port HSC optimisations for reading astrometry.net catalog

XMLWordPrintable

#### Details

• Type: Improvement
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
None
• Story Points:
3
• Sprint:
Science Pipelines DM-S15-5
• Team:
Data Release Production

#### Description

Some astrometry.net catalogs used in production can be quite large, and currently all of the catalog must be read in order to determine bounds for each component. This can make the loading of the catalog quite slow (e.g., 144 sec out of 177 sec to process an HSC image, using an SDSS DR9 catalog). We have HSC code that caches the required information, making the catalog load much faster. The code is from the following HSC issues:

While there have been some changes to the LSST astrometry code that will mean we can't directly cherry-pick the HSC code, yet I think the main structure remains, so the approach can be copied without much effort.

#### Activity

No builds found.
Paul Price created issue -
Field Original Value New Value
Description Some astrometry.net catalogs used in production can be quite large, and currently all of the catalog must be read in order to determine bounds for each component. This can make the loading of the catalog quite slow (e.g., 144 sec out of 177 sec to process an HSC image, using an SDSS DR9 catalog). We have HSC code that caches the required information, making the catalog load much faster. The code is from the following HSC issues:

* [https://hsc-jira.astro.princeton.edu/jira/browse/HSC-1178|HSC-1178: Faster construction of Astrometry.net catalog]
* [https://hsc-jira.astro.princeton.edu/jira/browse/HSC-1178|HSC-1179: Assertion failure in astrometry.net]

While there have been some changes to the LSST astrometry code that will mean we can't directly cherry-pick the HSC code, yet I think the main structure remains, so the approach can be copied without much effort.
Some astrometry.net catalogs used in production can be quite large, and currently all of the catalog must be read in order to determine bounds for each component. This can make the loading of the catalog quite slow (e.g., 144 sec out of 177 sec to process an HSC image, using an SDSS DR9 catalog). We have HSC code that caches the required information, making the catalog load much faster. The code is from the following HSC issues:

* [HSC-1178: Faster construction of Astrometry.net catalog|https://hsc-jira.astro.princeton.edu/jira/browse/HSC-1178]
* [HSC-1179: Assertion failure in astrometry.net|https://hsc-jira.astro.princeton.edu/jira/browse/HSC-1178]

While there have been some changes to the LSST astrometry code that will mean we can't directly cherry-pick the HSC code, yet I think the main structure remains, so the approach can be copied without much effort.
Hide
Paul Price added a comment -

Here's the profile I made demonstrating that reading the astrometry.net index files is the tall pole:

 >>> s = pstats.Stats("profile.dat").sort_stats("cumulative").print_stats(30) Tue Jul 14 17:09:31 2015 profile.dat    5681574 function calls (5658542 primitive calls) in 177.547 seconds    Ordered by: cumulative time  List reduced from 7271 to 30 due to restriction <30>    ncalls tottime percall cumtime percall filename:lineno(function)  1 0.001 0.001 177.548 177.548 /home/pprice/LSST/pipe/tasks/bin/processCcd.py:23()  1 0.000 0.000 174.498 174.498 /tigress/HSC/LSST/stack10_1/Linux64/pipe_base/10.1+4/python/lsst/pipe/base/cmdLineTask.py:401(parseAndRun)  1 0.000 0.000 171.937 171.937 /tigress/HSC/LSST/stack10_1/Linux64/pipe_base/10.1+4/python/lsst/pipe/base/cmdLineTask.py:172(run) 2456/256 0.101 0.000 171.695 0.671 {map}  1 0.000 0.000 171.672 171.672 /tigress/HSC/LSST/stack10_1/Linux64/pipe_base/10.1+4/python/lsst/pipe/base/cmdLineTask.py:289(__call__)  34/1 0.004 0.000 171.613 171.613 /tigress/HSC/LSST/stack10_1/Linux64/pipe_base/10.1+4/python/lsst/pipe/base/timer.py:114(wrapper)  1 0.002 0.002 171.612 171.612 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processCcd.py:62(run)  1 0.002 0.002 166.789 166.789 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processImage.py:128(process)  1 0.001 0.001 155.934 155.934 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/calibrate.py:376(run)  2 0.000 0.000 146.176 73.088 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/anetAstrometry.py:152(run)  2 0.000 0.000 146.014 73.007 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/anetAstrometry.py:266(astrometry)  31/11 0.000 0.000 144.478 13.134 /tigress/HSC/LSST/stack10_1/Linux64/pipe_base/10.1+4/python/lsst/pipe/base/task.py:242(makeSubtask)  31/11 0.000 0.000 144.478 13.134 /tigress/HSC/LSST/stack10_1/Linux64/pex_config/10.1+2/python/lsst/pex/config/configurableField.py:72(apply)  1 0.000 0.000 144.364 144.364 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/anetBasicAstrometry.py:203(__init__)  4 0.143 0.036 144.364 36.091 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/loadAstrometryNetObjects.py:148(_readIndexFiles)  1440 0.007 0.000 128.098 0.089 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/astrometry_net.py:1651(multiindex_new)  1440 128.091 0.089 128.091 0.089 {_astrometry_net.multiindex_new}  1440 0.009 0.000 14.885 0.010 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/astrometry_net.py:1655(multiindex_add_index)  1440 14.877 0.010 14.877 0.010 {_astrometry_net.multiindex_add_index}  3 0.055 0.018 7.171 2.390 /home/pprice/LSST/meas/base/python/lsst/meas/base/sfm.py:250(run)  2158 0.033 0.000 5.191 0.002 /home/pprice/LSST/meas/base/python/lsst/meas/base/base.py:394(callMeasure)  17264 0.023 0.000 4.936 0.000 /home/pprice/LSST/meas/base/python/lsst/meas/base/wrappers.py:16(measure)  1 0.002 0.002 4.821 4.821 /home/pprice/LSST/obs/subaru/python/lsst/obs/subaru/isr.py:150(runDataRef)  2 0.000 0.000 4.814 2.407 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/detection.py:303(run)  2 0.002 0.001 4.810 2.405 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/detection.py:346(detectFootprints)  323 0.000 0.000 4.415 0.014 /home/pprice/LSST/afw/python/lsst/afw/math/mathLib.py:5336(makeStatistics)  323 4.415 0.014 4.415 0.014 {_mathLib.makeStatistics}  1 0.001 0.001 3.226 3.226 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/measurePsf.py:243(run)  1 0.059 0.059 3.099 3.099 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/pcaPsfDeterminer.py:199(determinePsf)  1 0.000 0.000 3.004 3.004 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processCcd.py:23() 

Show
 Description Some astrometry.net catalogs used in production can be quite large, and currently all of the catalog must be read in order to determine bounds for each component. This can make the loading of the catalog quite slow (e.g., 144 sec out of 177 sec to process an HSC image, using an SDSS DR9 catalog). We have HSC code that caches the required information, making the catalog load much faster. The code is from the following HSC issues: * [HSC-1178: Faster construction of Astrometry.net catalog|https://hsc-jira.astro.princeton.edu/jira/browse/HSC-1178] * [HSC-1179: Assertion failure in astrometry.net|https://hsc-jira.astro.princeton.edu/jira/browse/HSC-1178] While there have been some changes to the LSST astrometry code that will mean we can't directly cherry-pick the HSC code, yet I think the main structure remains, so the approach can be copied without much effort. Some astrometry.net catalogs used in production can be quite large, and currently all of the catalog must be read in order to determine bounds for each component. This can make the loading of the catalog quite slow (e.g., 144 sec out of 177 sec to process an HSC image, using an SDSS DR9 catalog). We have HSC code that caches the required information, making the catalog load much faster. The code is from the following HSC issues: * [HSC-1087: Make astrometry faster|https://hsc-jira.astro.princeton.edu/jira/browse/HSC-1087] * [HSC-1143: Floating point exception in astrometry|https://hsc-jira.astro.princeton.edu/jira/browse/HSC-1143] * [HSC-1178: Faster construction of Astrometry.net catalog|https://hsc-jira.astro.princeton.edu/jira/browse/HSC-1178] * [HSC-1179: Assertion failure in astrometry.net|https://hsc-jira.astro.princeton.edu/jira/browse/HSC-1178] While there have been some changes to the LSST astrometry code that will mean we can't directly cherry-pick the HSC code, yet I think the main structure remains, so the approach can be copied without much effort.
Hide
Russell Owen added a comment -

We are hoping to move away from astrometry.net so it is worth asking how much effort we want to expend in fixing its problems. That said, I don't know the state of the catalog code we wanted to use to replace it, and the new catalog code will not be compatible with the astrometry.net solver, so index files will still be needed for that solver.

Show
Russell Owen added a comment - We are hoping to move away from astrometry.net so it is worth asking how much effort we want to expend in fixing its problems. That said, I don't know the state of the catalog code we wanted to use to replace it, and the new catalog code will not be compatible with the astrometry.net solver, so index files will still be needed for that solver.
Hide
Paul Price added a comment -

I am using the astrometry code to debug the distortion correction, and don't like waiting 2 minutes every time I run something to check if the distortion is working. Since the new astrometry catalog code isn't available now and this will only improve things, I think it's worth doing and worth doing now.

Show
Paul Price added a comment - I am using the astrometry code to debug the distortion correction, and don't like waiting 2 minutes every time I run something to check if the distortion is working. Since the new astrometry catalog code isn't available now and this will only improve things, I think it's worth doing and worth doing now.
 Epic Link DM-1912 [ 15945 ]
 Sprint Science Pipelines DM-S15-5 [ 162 ] Issue Type Story [ 10001 ] Improvement [ 4 ]
 Labels HSC
Hide
Kian-Tat Lim added a comment -

Skimming this quickly, I see a reference to a join in the cache loading code. Should we perhaps consider a SQLite file instead of a FITS table to store this data?

Show
Kian-Tat Lim added a comment - Skimming this quickly, I see a reference to a join in the cache loading code. Should we perhaps consider a SQLite file instead of a FITS table to store this data?
Hide
Paul Price added a comment -

Maybe a SQLite file instead of FITS would have been a better choice for the original implementation, but now that it's implemented I would like to pull this over as is (i.e., using the FITS tables) to avoid the extra development and debugging that would be required to convert to SQLite.

Show
Paul Price added a comment - Maybe a SQLite file instead of FITS would have been a better choice for the original implementation, but now that it's implemented I would like to pull this over as is (i.e., using the FITS tables) to avoid the extra development and debugging that would be required to convert to SQLite.
Hide
Kian-Tat Lim added a comment -

I was under the impression from various comments that a possibly-suboptimal implementation was chosen because all of this is intended to be rewritten on the LSST side. If that is not happening now, please file a story to do this before too long.

Show
Kian-Tat Lim added a comment - I was under the impression from various comments that a possibly-suboptimal implementation was chosen because all of this is intended to be rewritten on the LSST side. If that is not happening now, please file a story to do this before too long.
Hide
Paul Price added a comment -

The commit log message you read is referring to the sub-optimal implementation of astrometry.net, and that LSST is moving to a new astrometry solver.

Show
Paul Price added a comment - The commit log message you read is referring to the sub-optimal implementation of astrometry.net, and that LSST is moving to a new astrometry solver.
Hide
Kian-Tat Lim added a comment -

I see. My mistake.

Show
Kian-Tat Lim added a comment - I see. My mistake.
Hide
Paul Price added a comment -

Here's the profile after the port. Note the reduction in runtime (177 sec --> 45 sec) and the absence of catalog reading in the top 30.

I'm going to clean up (had to adjust some things due to reorganisation on the LSST side) and then submit for review.

 >>> s = pstats.Stats("updated.dat").sort_stats("cumulative").print_stats(30) Thu Jul 16 13:04:09 2015 updated.dat    5586126 function calls (5563266 primitive calls) in 45.036 seconds    Ordered by: cumulative time  List reduced from 6984 to 30 due to restriction <30>    ncalls tottime percall cumtime percall filename:lineno(function)  1 0.004 0.004 45.037 45.037 /home/pprice/LSST/pipe/tasks/bin/processCcd.py:23()  1 0.000 0.000 35.143 35.143 /home/pprice/LSST/pipe/base/python/lsst/pipe/base/cmdLineTask.py:405(parseAndRun)  1 0.000 0.000 31.911 31.911 /home/pprice/LSST/pipe/base/python/lsst/pipe/base/cmdLineTask.py:172(run)  2236/36 0.103 0.000 28.746 0.798 {map}  1 0.000 0.000 28.725 28.725 /home/pprice/LSST/pipe/base/python/lsst/pipe/base/cmdLineTask.py:293(__call__)  34/1 0.005 0.000 28.627 28.627 /home/pprice/LSST/pipe/base/python/lsst/pipe/base/timer.py:114(wrapper)  1 0.003 0.003 28.627 28.627 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processCcd.py:62(run)  1 0.002 0.002 23.197 23.197 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processImage.py:128(process)  1 0.001 0.001 12.125 12.125 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/calibrate.py:376(run)  1 0.001 0.001 9.536 9.536 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processCcd.py:23()  1 0.077 0.077 9.534 9.534 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/processImage.py:23()  3 0.056 0.019 7.132 2.377 /home/pprice/LSST/meas/base/python/lsst/meas/base/sfm.py:250(run)  1 0.003 0.003 5.425 5.425 /home/pprice/LSST/obs/subaru/python/lsst/obs/subaru/isr.py:150(runDataRef)  2158 0.033 0.000 5.163 0.002 /home/pprice/LSST/meas/base/python/lsst/meas/base/base.py:394(callMeasure)  17264 0.027 0.000 4.912 0.000 /home/pprice/LSST/meas/base/python/lsst/meas/base/wrappers.py:16(measure)  2 0.000 0.000 4.689 2.344 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/detection.py:303(run)  2 0.002 0.001 4.684 2.342 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/detection.py:346(detectFootprints)  323 0.000 0.000 4.366 0.014 /home/pprice/LSST/afw/python/lsst/afw/math/mathLib.py:5336(makeStatistics)  323 4.365 0.014 4.365 0.014 {_mathLib.makeStatistics}  1 0.010 0.010 4.300 4.300 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/__init__.py:23()  1 0.001 0.001 3.857 3.857 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/psfDeterminerRegistry.py:22()  1 0.001 0.001 3.856 3.856 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/pcaPsfDeterminer.py:22()  1 0.064 0.064 3.852 3.852 /home/pprice/LSST/meas/algorithms/python/lsst/meas/algorithms/utils.py:23()  10 0.001 0.000 3.700 0.370 /tigress/HSC/LSST/stack10_1/Linux64/daf_persistence/10.1+1/python/lsst/daf/persistence/butler.py:247(put)  1 0.240 0.240 3.421 3.421 /tigress/HSC/LSST/stack10_1/Linux64/pex_config/10.1+2/python/lsst/pex/config/__init__.py:22()  1 0.000 0.000 3.176 3.176 /home/pprice/LSST/pipe/base/python/lsst/pipe/base/cmdLineTask.py:267(precall)  1 0.001 0.001 3.087 3.087 /home/pprice/LSST/pipe/tasks/python/lsst/pipe/tasks/measurePsf.py:243(run)  3 0.004 0.001 3.051 1.017 /home/pprice/LSST/daf/butlerUtils/python/lsst/daf/butlerUtils/cameraMapper.py:436(backup)  1 0.096 0.096 3.004 3.004 /tigress/HSC/LSST/stack10_1/Linux64/anaconda/2.1.0-4-g35ca374/lib/python2.7/site-packages/matplotlib/pyplot.py:17()  3 0.338 0.113 2.970 0.990 /home/pprice/LSST/meas/astrom/python/lsst/meas/astrom/anetBasicAstrometry.py:271(useKnownWcs) 

Show
 Link This issue blocks DM-3154 [ DM-3154 ]
Hide
Paul Price added a comment -

Russell, since astrometry is fresh in your mind, would you have a look at this please?

Besides pulling code over from HSC, I've fixed it up to deal with the few changes that have been made on the LSST side, and I've added a test as well. I feel that a lot more could be done to clean all the astrometry.net usage up, but this adds the useful functionality, the astrometry.net usage is hopefully on the way out, and the issues are larger than just this functionality so I'm leaving the more widespread cleanup (or deletion!) to a future ticket.

This patchset includes a script, andCache.py, that can be used to generate a cache file for an existing astrometry_net_data installation. If the cache is there (and use is permitted in the andConfig.py), it's used; otherwise the catalog information is read from the individual index files (which requires reading all of the files) as is done currently. With the information from the cache, only the required files are read to get reference stars; note that this is only useful if the index files have been prepared on healpixes (as I believe is usual for wide-field astrometry.net indices, via its hpsplit program and the -H flag with build-astrometry-index). There would be no great gain for narrow-field catalogs, but for the wide-field catalogs we're using for HSC (SDSS, PS1) it makes a huge difference.

 pprice@tiger-sumire:~/LSST/meas/astrom (tickets/DM-3142=) $git --no-pager log --stat --reverse origin/master.. commit 8eb7e3cf2770f59b426e4d9c411926f6ffcdae4d Author: Paul Price  Date: Tue Dec 2 22:27:51 2014 -0500    Add interface for astrometry_net's healpix_distance_to_radec    This will be useful for optimising the loading of multiindex files  through a cache.    python/lsst/meas/astrom/astrometry_net.i | 16 ++++++++++++++++  1 file changed, 16 insertions(+)   commit aa8af09984f0e8a0222778b3fd66d37c0de0ef13 Author: Paul Price  Date: Mon Dec 8 13:06:55 2014 -0500    Add wrapper for astrometry.net catalog to cache loading    Loading the entire catalog currently requires reading all the index files,  just to get the metadata (the healpix and nside), which is slow.  Here, the multiindex_t and list of multiindex_t are wrapped by the new  MultiIndexCache and AstrometryNetCatalog classes, respectively. An  AstrometryNetCatalog can have a cache written for faster loading of the  metadata. The cache can be generated with a new bin script, andCache.py.    More could be moved from the LoadAstrometryNetObjectsTask into the  AstrometryNetCatalog class, but leaving that for a more comprehensive  cleanup.    bin/andCache.py | 3 +  python/lsst/meas/astrom/astrometryNetDataConfig.py | 2 +  .../lsst/meas/astrom/loadAstrometryNetObjects.py | 85 +------  python/lsst/meas/astrom/multiindex.py | 258 +++++++++++++++++++++  tests/astrometry_net_data/photocal/andConfig.py | 1 +  tests/astrometry_net_data/photocal/andConfig2.py | 1 +  tests/astrometry_net_data/photocal/andConfig3.py | 2 +-  tests/astrometry_net_data/photocal/andConfig4.py | 2 +-  tests/astrometry_net_data/photocal/andConfig5.py | 2 +-  tests/astrometry_net_data/photocal/andConfig6.py | 3 +-  .../photocal/andConfigOpenFiles.py | 1 +  tests/testMultiIndex.py | 17 ++  ups/meas_astrom.table | 1 +  13 files changed, 292 insertions(+), 86 deletions(-)   commit 882fe8c9b6c0baf9796295f1bf16d662dba9413d Author: Paul Price  Date: Thu Mar 5 15:26:21 2015 -0500    MultiIndexCache: faster parsing of cache file    Parsing the PS1 PV2 andCache.fits file was taking 30 sec.  This simpler code for the table join is much faster (about  1 sec).    python/lsst/meas/astrom/multiindex.py | 15 ++++++---------  1 file changed, 6 insertions(+), 9 deletions(-)  Show Paul Price added a comment - Russell, since astrometry is fresh in your mind, would you have a look at this please? Besides pulling code over from HSC, I've fixed it up to deal with the few changes that have been made on the LSST side, and I've added a test as well. I feel that a lot more could be done to clean all the astrometry.net usage up, but this adds the useful functionality, the astrometry.net usage is hopefully on the way out, and the issues are larger than just this functionality so I'm leaving the more widespread cleanup (or deletion!) to a future ticket. This patchset includes a script, andCache.py, that can be used to generate a cache file for an existing astrometry_net_data installation. If the cache is there (and use is permitted in the andConfig.py), it's used; otherwise the catalog information is read from the individual index files (which requires reading all of the files) as is done currently. With the information from the cache, only the required files are read to get reference stars; note that this is only useful if the index files have been prepared on healpixes (as I believe is usual for wide-field astrometry.net indices, via its hpsplit program and the -H flag with build-astrometry-index ). There would be no great gain for narrow-field catalogs, but for the wide-field catalogs we're using for HSC (SDSS, PS1) it makes a huge difference. pprice@tiger-sumire:~/LSST/meas/astrom (tickets/DM-3142=)$ git --no-pager log --stat --reverse origin/master.. commit 8eb7e3cf2770f59b426e4d9c411926f6ffcdae4d Author: Paul Price <price@astro.princeton.edu> Date: Tue Dec 2 22:27:51 2014 -0500   Add interface for astrometry_net's healpix_distance_to_radec This will be useful for optimising the loading of multiindex files through a cache.   python/lsst/meas/astrom/astrometry_net.i | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)   commit aa8af09984f0e8a0222778b3fd66d37c0de0ef13 Author: Paul Price <price@astro.princeton.edu> Date: Mon Dec 8 13:06:55 2014 -0500   Add wrapper for astrometry.net catalog to cache loading Loading the entire catalog currently requires reading all the index files, just to get the metadata (the healpix and nside), which is slow. Here, the multiindex_t and list of multiindex_t are wrapped by the new MultiIndexCache and AstrometryNetCatalog classes, respectively. An AstrometryNetCatalog can have a cache written for faster loading of the metadata. The cache can be generated with a new bin script, andCache.py. More could be moved from the LoadAstrometryNetObjectsTask into the AstrometryNetCatalog class, but leaving that for a more comprehensive cleanup.   bin/andCache.py | 3 + python/lsst/meas/astrom/astrometryNetDataConfig.py | 2 + .../lsst/meas/astrom/loadAstrometryNetObjects.py | 85 +------ python/lsst/meas/astrom/multiindex.py | 258 +++++++++++++++++++++ tests/astrometry_net_data/photocal/andConfig.py | 1 + tests/astrometry_net_data/photocal/andConfig2.py | 1 + tests/astrometry_net_data/photocal/andConfig3.py | 2 +- tests/astrometry_net_data/photocal/andConfig4.py | 2 +- tests/astrometry_net_data/photocal/andConfig5.py | 2 +- tests/astrometry_net_data/photocal/andConfig6.py | 3 +- .../photocal/andConfigOpenFiles.py | 1 + tests/testMultiIndex.py | 17 ++ ups/meas_astrom.table | 1 + 13 files changed, 292 insertions(+), 86 deletions(-)   commit 882fe8c9b6c0baf9796295f1bf16d662dba9413d Author: Paul Price <price@astro.princeton.edu> Date: Thu Mar 5 15:26:21 2015 -0500   MultiIndexCache: faster parsing of cache file Parsing the PS1 PV2 andCache.fits file was taking 30 sec. This simpler code for the table join is much faster (about 1 sec).   python/lsst/meas/astrom/multiindex.py | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-)
 Reviewers Russell Owen [ rowen ] Status To Do [ 10001 ] In Review [ 10004 ]
Hide
Russell Owen added a comment - - edited

## astrometry_net.i

  lsst::afw::geom::Angle healpixDistance(int hp, int Nside, lsst::afw::coord::Coord const& coord) {  return lsst::afw::geom::Angle(healpix_distance_to_radec(hp, Nside, coord.getLongitude().asDegrees(),  coord.getLatitude().asDegrees(), NULL),  lsst::afw::geom::degrees);  } 

This does not enforce that "coord" is the same coordinate system and epoch as was used to make the index. That might be a bit tricky (though I hope it's solvable) but it doesn't even require that "coord" be an RA/Dec-based coordinate system and that would be trivial to solve. As it stands it can silently give nonsense answers.

Perhaps we can assume that all index files are ICRS and thus that you can convert the RA/Dec to ICRS?

For the ra/dec version please use Angle instead of float unless you expect this to be called so frequently that extracting the value in degrees will be onorous. If you choose ICRS to solve the above problem then please add that information to the doc string.

These functions are only called by isWithinRange, as far as I can see, and I suggest a similar transformation there (below). To the extent that you can make this funciton safe, isWithinRange already becomes safer. And maybe you can just ditch the ra/dec version altogether, which would be nice.

The name seems unclear for a function that generates a cache of a.net load data. A clear name is helpful because it will be on the $PATH. I suggest generateANetLoadCache.py. ## multiindex.py getIndexPath appears to have many logic errors, and a unit test would be helpful! I have included a possible alternative implementation below. Here are the issues I see: • If fn is an absolute path then it returns it without testing if it exists (since os.path.isabs does not test for existence). • lsst.utils.getPackageDir raises an exception if the package is not setup. • The final return statement appears to be incorrect; I think getIndexPath will fail with "fn2 undefined" if anddir is None. • If os.path.exists(fn2) then the function returns non-absolute path getConfigFromEnvironment raises an exception if astrometry_net_data is not setup. MultiIndexCache: • you say that fromFilenameList is not very efficient. If possible I suggest ditching it until we have a demonstrated need for it. I can't find any use of it (not even in tests). • If you retain it, please call healpix = set(...) something else, e.g. healpixSet, and similarly for nside. • In the interest of catching errors early... in __init__ I suggest you enforce int-ness for healpix and nside, e.g.: self._healpix = int(healpix) and similarly for nside. Also, I suggest you raise an exception if len(filenameList) < 2 (if at least one index file is required, or whatever value makes the most sense); in particular this confirms that filenameList is iterable. • isWithinRange: I suggest that ra, dec be a Coord (which the method should convert to IcrsCoord or however you handle healpixDistance, above) and distance be an Angle. That would clean up code I see using it, and that code acts one index at a time, so assuming you don't have many thousands of them, the extra time spent extracting the data as float values in degrees should not be an issue. If you're really worried about it then you could make the list of indices be an additional argument and have this method iterate over them. I realize that makes it less flexible, but it does satisfy the current use cases. • writeCache: I echo K-T's remark that using a FITS table seems sub-optimal as opposed to an SQLite database, but I take your point about an existing implementation. • _initFromCache: very picky, but would you mind making using present tense for the description (e.g. "Ingets the cache file...and use that...")? • generateCache: should the reload loop be inside the try block? That would help if a reload failed and you wanted to be sure to unload any indices that had been reloaded. But that only helps if index.unload() is always safe to call, even if an index has not been [re]loaded. • AstrometryNetCatalog: • It would help if the description said this behaved like an ordered collection of ...(whatever it is). • The argument name "index" for _getitem_ is a bit confusing given what I think is contained are a.net index files. Perhaps "i" would be clearer? itertools is not used (as shows by pyflakes linter) Any doc string that contains Doxygen markup should have an exclamation mark after the opening triple quotes. I suggest always using the exclamation mark because somebody may add Doxygen markup commands later. Here is a possible alternative implementation of getIndexPath:  def getIndexPath(fn):  """!Get the path to the specified astrometry.net index file    @param[in] fn path to index file; if relative, then relative to astrometry_net_data  if that product is setup, else relative to the current working directory  @return the absolute path to the index file, or None if the file was not found  """  try:  tryPath = os.path.join(lsst.utils.getPackageDir('astrometry_net_data'), fn)  except Exception:  tryPath = fn  if os.path.exists(tryPath):  return os.path.abspath(tryPath)  return None  Show Russell Owen added a comment - - edited astrometry_net.i You added function healpixDistance: lsst::afw::geom::Angle healpixDistance(int hp, int Nside, lsst::afw::coord::Coord const& coord) { return lsst::afw::geom::Angle(healpix_distance_to_radec(hp, Nside, coord.getLongitude().asDegrees(), coord.getLatitude().asDegrees(), NULL), lsst::afw::geom::degrees); } This does not enforce that "coord" is the same coordinate system and epoch as was used to make the index. That might be a bit tricky (though I hope it's solvable) but it doesn't even require that "coord" be an RA/Dec-based coordinate system and that would be trivial to solve. As it stands it can silently give nonsense answers. Perhaps we can assume that all index files are ICRS and thus that you can convert the RA/Dec to ICRS? For the ra/dec version please use Angle instead of float unless you expect this to be called so frequently that extracting the value in degrees will be onorous. If you choose ICRS to solve the above problem then please add that information to the doc string. These functions are only called by isWithinRange, as far as I can see, and I suggest a similar transformation there (below). To the extent that you can make this funciton safe, isWithinRange already becomes safer. And maybe you can just ditch the ra/dec version altogether, which would be nice. bin/andCache.py The name seems unclear for a function that generates a cache of a.net load data. A clear name is helpful because it will be on the$PATH. I suggest generateANetLoadCache.py. multiindex.py getIndexPath appears to have many logic errors, and a unit test would be helpful! I have included a possible alternative implementation below. Here are the issues I see: If fn is an absolute path then it returns it without testing if it exists (since os.path.isabs does not test for existence). lsst.utils.getPackageDir raises an exception if the package is not setup. The final return statement appears to be incorrect; I think getIndexPath will fail with "fn2 undefined" if anddir is None. If os.path.exists(fn2) then the function returns non-absolute path getConfigFromEnvironment raises an exception if astrometry_net_data is not setup. MultiIndexCache: you say that fromFilenameList is not very efficient. If possible I suggest ditching it until we have a demonstrated need for it. I can't find any use of it (not even in tests). If you retain it, please call healpix = set(...) something else, e.g. healpixSet, and similarly for nside. In the interest of catching errors early... in __init__ I suggest you enforce int-ness for healpix and nside, e.g.: self._healpix = int(healpix) and similarly for nside. Also, I suggest you raise an exception if len(filenameList) < 2 (if at least one index file is required, or whatever value makes the most sense); in particular this confirms that filenameList is iterable. isWithinRange: I suggest that ra, dec be a Coord (which the method should convert to IcrsCoord or however you handle healpixDistance, above) and distance be an Angle. That would clean up code I see using it, and that code acts one index at a time, so assuming you don't have many thousands of them, the extra time spent extracting the data as float values in degrees should not be an issue. If you're really worried about it then you could make the list of indices be an additional argument and have this method iterate over them. I realize that makes it less flexible, but it does satisfy the current use cases. writeCache: I echo K-T's remark that using a FITS table seems sub-optimal as opposed to an SQLite database, but I take your point about an existing implementation. _initFromCache: very picky, but would you mind making using present tense for the description (e.g. "Ingets the cache file...and use that...")? generateCache: should the reload loop be inside the try block? That would help if a reload failed and you wanted to be sure to unload any indices that had been reloaded. But that only helps if index.unload() is always safe to call, even if an index has not been [re]loaded. AstrometryNetCatalog: It would help if the description said this behaved like an ordered collection of ...(whatever it is). The argument name "index" for _ getitem _ is a bit confusing given what I think is contained are a.net index files. Perhaps "i" would be clearer? itertools is not used (as shows by pyflakes linter) Any doc string that contains Doxygen markup should have an exclamation mark after the opening triple quotes. I suggest always using the exclamation mark because somebody may add Doxygen markup commands later. Here is a possible alternative implementation of getIndexPath: def getIndexPath(fn): """!Get the path to the specified astrometry.net index file   @param[in] fn path to index file; if relative, then relative to astrometry_net_data if that product is setup, else relative to the current working directory @return the absolute path to the index file, or None if the file was not found """ try: tryPath = os.path.join(lsst.utils.getPackageDir('astrometry_net_data'), fn) except Exception: tryPath = fn if os.path.exists(tryPath): return os.path.abspath(tryPath) return None
 Status In Review [ 10004 ] Reviewed [ 10101 ]
Hide
Paul Price added a comment -

Tim Jenness, I replied to one comment on github. Your other two comments corresponded to comments made by Russell, and I've taken them into account in my revision.

Thank you, Russell Owen, for your helpful review comments. I've made almost all the changes you requested, and pushed several fixup commits for you to review if you desire. I've not squashed them in yet so that it's easy for you to see what I've changed.

I'm especially grateful to you for catching my misimplementations of getIndexPath and getConfigFromEnvironment. I think they should now be better, but please let me know what you think.

I left MultiIndexCache.fromFilenameList because it's used by AstrometryNetCatalog._initFromIndexFiles — that's the only way to create a MultiIndexCache if there's no cache, and you need them to create a cache.

I didn't move the reload into the try block in generateCache because I think we should fail if we're unable to read everything when generating a cache — otherwise the cache and the original are out of sync, and you'd hit an assertion.

Show
 Status Reviewed [ 10101 ] In Review [ 10004 ]
Hide
Russell Owen added a comment - - edited

I'll ask again about generateCache (below) but all your other changes look great. Thank you for being so tolerant of all my suggested changes. Leaving your changes as fixup commits so others could look at them was clever. I'll try to remember to do that, rebasing after the final review is done.

Regarding generateCache: I agree that you should raise an exception if a reload fails, but surely that will happen if you move the reload loop into the try/finally block? If an error on loading occurs then the finally block runs, which unloads the indices, and the error is raised. Naively that seems ideal to me (as long as it is safe to call unload even if an index has never been loaded) because it leaves the cache in a known and self-constent state after a failure.

Show
 Status In Review [ 10004 ] Reviewed [ 10101 ]
Hide
Paul Price added a comment -

Ah, I understand. OK, I'll do that.

Going to squash now and then merge after a final Jenkins build.

Show
Paul Price added a comment - Ah, I understand. OK, I'll do that. Going to squash now and then merge after a final Jenkins build.
Hide
Paul Price added a comment -

Squashed, build succeeded and now merged to master.

Thanks!

Show
Paul Price added a comment - Squashed, build succeeded and now merged to master. Thanks!
 Resolution Done [ 10000 ] Status Reviewed [ 10101 ] Done [ 10002 ]
 Story Points 1 3
 Labels HSC

#### People

Assignee:
Paul Price
Reporter:
Paul Price
Reviewers:
Russell Owen
Watchers:
Dominique Boutigny, Kian-Tat Lim, Lauren MacArthur, Paul Price, Russell Owen, Simon Krughoff