Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: daf_butler
- Story Points: 3
- Team: Architecture
- Urgent?: No
Description
Large submissions are taking hours in the post-QuantumGraph-generation steps. After profiling and removing some of the obvious bps issues, I'm currently left with config lines (or functions called by config, e.g., isinstance) in the top 20. BpsConfig inherits from lsst.daf.butler.Config, and most of the top items in the profile are lsst.daf.butler.Config related (BpsConfig mostly just adds more calls per lookup, e.g., is the key in this section, if not what about that section; look up and replace all the variables in the string).
This is not a blocker. I've found some places to cache results to reduce the large number of lookups and, where easily possible, to use the hierarchy lookup (.pipetask.makeWarp.requestMemory).
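The caching idea can be sketched as memoizing dotted-key lookups so each key is resolved against the hierarchy only once. This is a hypothetical helper for illustration, not the actual bps code; the names (cached_lookup, CONFIG) are made up here:

#!/usr/bin/env python

from functools import lru_cache

# Stand-in for the parsed configuration hierarchy.
CONFIG = {"pipetask": {"makeWarp": {"requestMemory": 2048}}}

@lru_cache(maxsize=None)
def cached_lookup(dotted_key):
    """Resolve '.pipetask.makeWarp.requestMemory'-style keys, caching results."""
    node = CONFIG
    for part in dotted_key.strip(".").split("."):
        node = node[part]
    return node

# Repeated calls for the same key hit the cache instead of re-walking
# the hierarchy (and, in the real code, re-doing variable substitution).
value = cached_lookup(".pipetask.makeWarp.requestMemory")

The cache only pays off if the config is not mutated between lookups, which is the situation after QuantumGraph generation.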
Just to make sure I could actually see the timing difference, I wrote a tiny program comparing a large number of lookups using a builtin dict vs lsst.daf.butler.Config. (One of the steps in the DC2 reprocessing had around 325,000 quanta, so I used that as a starting point and faked looking up 10 values each, then 100 values each.)
#!/usr/bin/env python

import time

from lsst.daf.butler import Config

def query_config_1(config, n):
    stime = time.time()
    for i in range(0, n):
        x = config['pipetask']
        y = x['makeWarp']
        z = y['requestMemory']
    print("query_config_1", time.time() - stime)

def query_config_2(config, n):
    stime = time.time()
    for i in range(0, n):
        z = config['pipetask']['makeWarp']['requestMemory']
    print("query_config_2", time.time() - stime)

def query_config_3(config, n):
    stime = time.time()
    for i in range(0, n):
        z = config['.pipetask.makeWarp.requestMemory']
    print("query_config_3", time.time() - stime)

def run_tests(n):
    print(f"===== Testing n = {n}")

    print("builtin dict")
    dict_config = {'pipetask': {'makeWarp': {'requestMemory': 2048}}}
    query_config_1(dict_config, n)
    query_config_2(dict_config, n)

    print("lsst.daf.butler.config")
    config = Config(dict_config)
    query_config_1(config, n)
    query_config_2(config, n)
    query_config_3(config, n)

run_tests(325000 * 10)
run_tests(325000 * 100)
I ran it with w_2021_43 to get these timings:
./time_config.py
===== Testing n = 3250000
builtin dict
query_config_1 0.20920681953430176
query_config_2 0.18520069122314453
lsst.daf.butler.config
query_config_1 35.91612529754639
query_config_2 35.17692303657532
query_config_3 10.292534828186035
===== Testing n = 32500000
builtin dict
query_config_1 2.091094493865967
query_config_2 1.9995372295379639
lsst.daf.butler.config
query_config_1 352.6537432670593
query_config_2 352.4010479450226
query_config_3 103.25376081466675
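For scale, these timings work out to the following approximate per-lookup costs and slowdowns (simple arithmetic on the reported n = 3,250,000 numbers):

# Per-lookup cost implied by the reported timings (n = 3,250,000 iterations).
n = 3_250_000
dict_chained = 0.18520069122314453    # builtin dict, query_config_2
config_chained = 35.17692303657532    # Config, query_config_2
config_dotted = 10.292534828186035    # Config, query_config_3

print(f"dict chained:   {dict_chained / n * 1e6:.2f} us per 3-level lookup")
print(f"Config chained: {config_chained / n * 1e6:.2f} us per 3-level lookup")
print(f"Config dotted:  {config_dotted / n * 1e6:.2f} us per 3-level lookup")
print(f"slowdown, chained form: {config_chained / dict_chained:.0f}x")
print(f"slowdown, dotted form:  {config_dotted / dict_chained:.0f}x")

So chained Config access is roughly 190x slower than a builtin dict, and even the dotted form is about 56x slower.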
Thanks for the example code. The million isinstance checks are a problem. The Config class is optimized for the c["x", "y", "z"] form, since that is all done internally, whereas the c["x"]["y"]["z"] form requires that an entirely new Config object be returned at each level of the hierarchy. Your code seems to think it has a dict and so treats it like a dict; it would be faster to force the Config to a dict and then use that, if you aren't actually using any Config features after the YAML parse phase.

The other problem is that Config supports the c["x", "4", "y"] form, where the integer can index into a list, so it needs to do an instance check for a sequence at every level of the hierarchy. The ["x"]["y"] form can be sped up a little by tweaking how the intermediate Config is created (so far about a third faster). It would also help, I think, if we had a read-only FrozenConfig, since FrozenConfig could then cache some lookups.

I need to try and profile things a little, but for the record, for the Config intermediates I go from 40 sec to 25 sec with a better Config constructor, and then down to 15 sec if I don't do the sequence check (but we have to do that). It's 11 seconds using the tuple form directly.
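The "force the Config to a dict" suggestion amounts to paying the conversion cost once and doing all hot-loop lookups against a builtin dict. The sketch below uses a FakeConfig stand-in for lsst.daf.butler.Config (it adds per-access type-check work the way Config's sequence checks do); the real class has a similar plain-dict export (e.g. Config.toDict()), but treat that call name as an assumption here:

# FakeConfig mimics a Config-like mapping: every __getitem__ does extra
# type-check work and wraps nested dicts in a new FakeConfig, which is
# what makes chained access expensive.
class FakeConfig(dict):
    def __getitem__(self, key):
        isinstance(key, (str, int))  # stand-in for Config's sequence checks
        value = dict.__getitem__(self, key)
        return FakeConfig(value) if isinstance(value, dict) else value

    def to_plain_dict(self):
        # One-time export to a builtin dict (nested values are already
        # stored as plain dicts, so no deep conversion is needed here).
        return {k: dict.__getitem__(self, k) for k in self}

config = FakeConfig({"pipetask": {"makeWarp": {"requestMemory": 2048}}})

# Convert once, then all subsequent lookups are plain dict speed.
plain = config.to_plain_dict()
value = plain["pipetask"]["makeWarp"]["requestMemory"]

This only applies when the post-parse code really needs no Config features (variable interpolation, dotted-key access, write-back), which is the condition stated above.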