Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: Validation
- Labels: None
- Story Points: 2
- Epic Link:
- Team: SQuaRE
Description
Increase the loading and processing speed of validate_drp, following suggestions by Paul Price:
1. Don't read in footprints: pass flags=lsst.afw.table.SOURCE_IO_NO_FOOTPRINTS to butler.get.
2. Improve the speed of calculating the RMS and other expensive quantities. Current suggestions:
a. calcRmsDistances
b. multiMatch
c. matchVisitComputeDistance
d. Consider boolean indexing in afw's multiMatch.py, changing:

```python
objById = {record.get(self.objectKey): record for record in self.reference}
```

to:

```python
objById = dict(zip(self.reference[self.objectKey], self.reference))
```
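Suggestion 1 amounts to a one-argument change at the butler call site. A non-runnable sketch, assuming the LSST stack is set up; the dataset type "src" and the dataId below are illustrative assumptions, not taken from this ticket:

```python
# Sketch only: requires the LSST stack. "src" and the dataId are
# hypothetical examples of a source-catalog read.
import lsst.afw.table

catalog = butler.get(
    "src",
    dataId={"visit": 12345, "ccd": 42},
    flags=lsst.afw.table.SOURCE_IO_NO_FOOTPRINTS,  # skip footprint I/O
)
```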
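The change in suggestion (d) replaces a per-record Python method call with a single column read plus zip. A minimal stand-in demonstration that the two constructions build the same mapping — the Record class and the data here are hypothetical substitutes for afw table records, not the actual multiMatch.py code:

```python
# Stand-in for the multiMatch.py change: build a dict of records keyed by
# object ID. In real afw catalogs, self.reference[self.objectKey] reads the
# whole ID column at once, which is what makes the zip() form faster.

class Record:
    """Hypothetical stand-in for an afw table record."""
    def __init__(self, obj_id, payload):
        self.obj_id = obj_id
        self.payload = payload

    def get(self, key):
        # Mimic record.get(objectKey) returning the object ID.
        return self.obj_id


reference = [Record(i, payload=i * 10) for i in (3, 1, 2)]

# Original form: one Python-level .get() call per record.
obj_by_id_slow = {record.get("id"): record for record in reference}

# Suggested form: pair an ID column (a bulk column read in afw) with records.
ids = [record.obj_id for record in reference]  # stands in for a column read
obj_by_id_fast = dict(zip(ids, reference))

assert obj_by_id_slow == obj_by_id_fast
```

The win in the real code comes from moving the key extraction out of the Python loop and into a vectorized column access.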
Note that while this ticket includes work to reduce the memory footprint of the processing, it does not cover re-architecting the pipeline to process data sets efficiently beyond the memory of a single node.
The tableLib.py reading remains the dominant contributor to validate_drp's run time. I'm going to close out this ticket with the work done so far, which has improved the post-read performance from O(N^2) to O(N log N) for some calculations.
But the fundamental issue remains reading the data in the first place. Good read performance is closely tied to the general infrastructure access models, so I'll defer that work to a later ticket, to be taken up when read performance in validate_drp becomes a significant issue.
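The ticket doesn't show the O(N^2) → O(N log N) changes themselves. As an illustration of the general technique (not the actual calcRmsDistances/multiMatch code), here is a nested-loop nearest-neighbor match replaced by a sort-plus-binary-search match in one dimension; the same idea generalizes to spatial tree structures:

```python
# Illustration only: replacing an O(N*M) nested-loop match with an
# O((N+M) log M) sort-and-binary-search match. Positions are 1-D floats;
# real catalog matching would use a spatial index instead.
import bisect


def match_nearest_quadratic(sources, references):
    """O(N*M): compare every source against every reference."""
    return [min(references, key=lambda r: abs(r - s)) for s in sources]


def match_nearest_sorted(sources, references):
    """O((N+M) log M): sort the references once, binary-search per source."""
    refs = sorted(references)
    out = []
    for s in sources:
        i = bisect.bisect_left(refs, s)
        # The nearest reference is one of the two insertion-point neighbors.
        candidates = refs[max(i - 1, 0):i + 1]
        out.append(min(candidates, key=lambda r: abs(r - s)))
    return out


sources = [0.1, 2.7, 5.2]
references = [5.0, 0.0, 3.0]
assert match_nearest_sorted(sources, references) == \
       match_nearest_quadratic(sources, references)
```

Both functions return the same matches; only the asymptotic cost differs, which is what moved the post-read calculations from O(N^2) toward O(N log N).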