# Incorporate Price suggestions to make validate_drp faster

## Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels: None
• Story Points: 2
• Team: SQuaRE

## Description

Increase the loading and processing speed of validate_drp, following suggestions by Paul Price:

1. Pass flags=lsst.afw.table.SOURCE_IO_NO_FOOTPRINTS to butler.get.
2. Work on speed of calculation of RMS and other expensive quantities. Current suggestions:
a. calcRmsDistances
b. multiMatch
c. matchVisitComputeDistance
d. Consider boolean indexing in afw's multiMatch.py

Also consider changing
  objById = {record.get(self.objectKey): record for record in self.reference}
to:
  objById = dict(zip(self.reference[self.objectKey], self.reference))
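The objById suggestion can be illustrated without the LSST stack. The sketch below is only an analogy: ToyCatalog and Record are hypothetical stand-ins (not afw classes), and it assumes that indexing the catalog with the id key returns the whole id column in one call, so the per-record get() call disappears from the loop.

```python
from collections import namedtuple

# Hypothetical stand-in for a catalog record; real afw records are not namedtuples.
Record = namedtuple("Record", ["object_id", "ra", "dec"])


class ToyCatalog:
    """Hypothetical catalog: iterating yields records, indexing by key yields a column."""

    def __init__(self, records):
        self._records = list(records)
        # Column stored once up front, mimicking column-oriented access.
        self._id_column = [r.object_id for r in self._records]

    def __getitem__(self, key):
        if key != "object_id":
            raise KeyError(key)
        return self._id_column

    def __iter__(self):
        return iter(self._records)


reference = ToyCatalog([
    Record(101, 10.0, -5.0),
    Record(102, 10.1, -5.1),
    Record(103, 10.2, -5.2),
])

# Original pattern: one attribute/key lookup per record inside the comprehension.
by_id_comprehension = {record.object_id: record for record in reference}

# Suggested pattern: fetch the id column once, then pair it with the records.
by_id_zip = dict(zip(reference["object_id"], reference))
```

Both dictionaries hold the same mapping; any speed difference depends on how cheap the column access is compared with the per-record lookup, which would need to be measured on real catalogs.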

Note that while this ticket will involve work to reduce the memory footprint of the processing, it will not cover work to re-architect things to enable efficient processing beyond the memory on one node.

## Activity

Michael Wood-Vasey added a comment -

Reading the catalogs (tableLib.py) remains the dominant contributor to the run time of validate_drp. I'm going to move toward merging the work done so far on this ticket, which has improved the post-read performance of some calculations from O(N^2) to O(N log N).

But the fundamental issue remains reading the data in the first place. Good performance here is very much tied to general infrastructure access models, and I'll defer that work to a later ticket, for when performance in validate_drp becomes a significant issue.

Michael Wood-Vasey added a comment -

Code now working. Waiting on merge of DM-6328 before rebasing this one and submitting for review.

Michael Wood-Vasey added a comment -

Relatively quick review.

• Minor speedups in the calculation of the RMS distances. Now O(N log N) instead of O(N^2).
• Added basic test case for AMx calculation with matchVisitComputeDistance.
• Did not strip down catalog files, in anticipation of future need for more information, e.g., as in DM-8951.

But most of the time is actually spent loading the catalogs. Will defer such work to future improvements to afw.table or alternate data access modes.
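For reference, the O(N^2) to O(N log N) change in matching two visits by object id can be sketched as a sort followed by a single merge pass. This is a minimal illustration, not the code merged on this ticket: the function names are hypothetical, ids are assumed unique within each visit, coordinates are assumed to be in radians, and the separation uses the haversine formula rather than any afw routine.

```python
import math


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in radians (haversine formula); inputs in radians."""
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return 2.0 * math.asin(min(1.0, math.sqrt(a)))


def match_sorted_ids(ids1, ids2):
    """Return index pairs (i, j) with ids1[i] == ids2[j].

    Sorting costs O(N log N); the merge pass afterwards is linear.
    Assumes ids are unique within each list.
    """
    order1 = sorted(range(len(ids1)), key=ids1.__getitem__)
    order2 = sorted(range(len(ids2)), key=ids2.__getitem__)
    pairs = []
    p = q = 0
    while p < len(order1) and q < len(order2):
        i, j = order1[p], order2[q]
        if ids1[i] < ids2[j]:
            p += 1
        elif ids1[i] > ids2[j]:
            q += 1
        else:
            pairs.append((i, j))
            p += 1
            q += 1
    return pairs


def match_and_compute_distances(ids1, ra1, dec1, ids2, ra2, dec2):
    """Distances between matched objects, skipping non-finite coordinates."""
    distances = []
    for i, j in match_sorted_ids(ids1, ids2):
        coords = (ra1[i], dec1[i], ra2[j], dec2[j])
        if all(math.isfinite(c) for c in coords):
            distances.append(angular_separation(*coords))
    return distances
```

A nested loop that searches the second visit for every object in the first does N^2 comparisons; here each index advances at most N times after the sort.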

Angelo Fausti added a comment -

Michael Wood-Vasey: looks good; sorting the arrays for comparison certainly helps. Nice to see the results from cProfile, and that reading the data dominates the overall execution time.
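A profile of the kind mentioned above can be collected with the standard-library cProfile and pstats modules. The sketch below uses hypothetical stand-in functions (load_catalogs, compute_metrics, run) rather than validate_drp itself, with a short sleep imitating slow catalog reading.

```python
import cProfile
import io
import pstats
import time


def load_catalogs():
    """Hypothetical stand-in for the slow catalog-reading step."""
    time.sleep(0.05)
    return list(range(1000))


def compute_metrics(catalog):
    """Hypothetical stand-in for the post-read calculations."""
    return sum(x * x for x in catalog)


def run():
    catalog = load_catalogs()
    return compute_metrics(catalog)


# Profile a single call to run() and keep its return value.
profiler = cProfile.Profile()
result = profiler.runcall(run)

# Write the ten most expensive entries, by cumulative time, to a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

Sorting by cumulative time makes it easy to see whether the reading step or the calculations account for most of the run.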

Michael Wood-Vasey added a comment -

Merged to master.

## People

• Assignee: Michael Wood-Vasey
• Reporter: Michael Wood-Vasey
• Reviewers: Angelo Fausti
• Watchers: Angelo Fausti, Michael Wood-Vasey, Paul Price