# lsst.daf.butler.Config slowness with large numbers of lookups


#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points:
3
• Team:
Architecture
• Urgent?:
No

#### Description

Large submissions are taking hours in the post-QuantumGraph generation steps. After profiling and removing some of the obvious bps issues, I'm currently left with Config-related lines (or functions called by Config, e.g., isinstance) in the top 20. BpsConfig inherits from lsst.daf.butler.Config, and most of the top items in the profile are lsst.daf.butler.Config related (because BpsConfig mostly just adds more calls to the lookup, e.g., is it in this section, if not what about this section; look up and replace all the variables in the string).

This is not a blocker. I've found some places to cache results to reduce the large numbers of lookups and, where easily possible, to use the hierarchy lookup form (`.pipetask.makeWarp.requestMemory`).
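To illustrate the caching idea, here is a minimal sketch (not the actual bps code; the `lookup` function and `CONFIG` structure are hypothetical) of memoizing repeated dotted-path lookups against a nested mapping:

```python
# Illustrative sketch only: cache repeated ".a.b.c" lookups against a nested
# mapping so each unique key is traversed once. Not the actual bps code.
from functools import lru_cache

CONFIG = {"pipetask": {"makeWarp": {"requestMemory": 2048}}}

@lru_cache(maxsize=None)
def lookup(dotted_key):
    """Resolve a ".a.b.c" key against CONFIG, caching the result."""
    node = CONFIG
    for part in dotted_key.strip(".").split("."):
        node = node[part]
    return node

value = lookup(".pipetask.makeWarp.requestMemory")  # traverses the hierarchy
value = lookup(".pipetask.makeWarp.requestMemory")  # served from the cache
```

Caching like this only makes sense when the config is not mutated between lookups, which is the usual case after QuantumGraph generation.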

Just to make sure I could actually see the timing difference, I wrote a tiny program to compare a ton of lookups using a builtin dict vs lsst.daf.butler.Config. (One of the steps in the DC2 reprocessing had around 325,000 Quanta, so I used that number as a starting point to fake looking up 10 values each and then 100 values each.)

```python
#!/usr/bin/env python
import time

from lsst.daf.butler import Config


def query_config_1(config, n):
    stime = time.time()
    for i in range(0, n):
        x = config['pipetask']
        y = x['makeWarp']
        z = y['requestMemory']
    print("query_config_1", time.time() - stime)


def query_config_2(config, n):
    stime = time.time()
    for i in range(0, n):
        z = config['pipetask']['makeWarp']['requestMemory']
    print("query_config_2", time.time() - stime)


def query_config_3(config, n):
    stime = time.time()
    for i in range(0, n):
        z = config['.pipetask.makeWarp.requestMemory']
    print("query_config_3", time.time() - stime)


def run_tests(n):
    print(f"===== Testing n = {n}")

    print("builtin dict")
    dict_config = {'pipetask': {'makeWarp': {'requestMemory': 2048}}}
    query_config_1(dict_config, n)
    query_config_2(dict_config, n)

    print("lsst.daf.butler.config")
    config = Config(dict_config)
    query_config_1(config, n)
    query_config_2(config, n)
    query_config_3(config, n)


run_tests(325000 * 10)
run_tests(325000 * 100)
```

I ran it with w_2021_43 to get these timings:

```
./time_config.py
===== Testing n = 3250000
builtin dict
query_config_1 0.20920681953430176
query_config_2 0.18520069122314453
lsst.daf.butler.config
query_config_1 35.91612529754639
query_config_2 35.17692303657532
query_config_3 10.292534828186035
===== Testing n = 32500000
builtin dict
query_config_1 2.091094493865967
query_config_2 1.9995372295379639
lsst.daf.butler.config
query_config_1 352.6537432670593
query_config_2 352.4010479450226
query_config_3 103.25376081466675
```

#### Activity

Tim Jenness added a comment -

Thanks for the example code. The million isinstance checks are a problem. The Config class is optimized for the `c["x", "y", "z"]` form, since that is all done internally, whereas the `c["x"]["y"]["z"]` form requires that an entirely new Config object is returned at each level of the hierarchy. Your code seems to think it has a dict so treats it like a dict – it would be faster to force the Config to a dict and then use that if you aren't actually using any Config features after the YAML parse phase.

The other problem is that Config supports the `c["x", "4", "y"]` form, where the integer can index into a list, so it needs to do an instance check for a sequence at every level in the hierarchy. The `["x"]["y"]` form can be sped up a little by tweaking how the intermediate Config is created (so far about a third faster). It would also help, I think, if we had a FrozenConfig that would be read only, since FrozenConfig could then cache some lookups.

I need to try and profile things a little, but for the record, for the Config intermediates I go from 40 sec to 25 sec with a better Config constructor, and then down to 15 sec if I don't do the sequence check (but we have to do that). It's 11 seconds using the tuple form directly.
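The semantics described above can be sketched with a toy class (this is an illustration of the behavior as described in this comment, not the real lsst.daf.butler.Config implementation): a tuple key is walked internally with no copies, while single-key indexing hands back a detached copy at each level.

```python
# Toy sketch of the two access forms discussed above. Not the real Config
# class: it only mimics the semantics described (tuple keys traversed
# internally; single-key indexing returning a detached copy of sub-dicts).
import copy

class ToyConfig:
    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        if isinstance(key, tuple):
            # Tuple form: walk the hierarchy internally, no intermediate copies.
            node = self._data
            for k in key:
                node = node[k]
            return node
        value = self._data[key]
        if isinstance(value, dict):
            # Chained form: each level hands back a detached deep copy,
            # which is where the per-level cost comes from.
            return ToyConfig(copy.deepcopy(value))
        return value

c = ToyConfig({"pipetask": {"makeWarp": {"requestMemory": 2048}}})
assert c["pipetask", "makeWarp", "requestMemory"] == 2048
sub = c["pipetask"]["makeWarp"]
sub._data["requestMemory"] = 0  # mutate the detached copy
assert c["pipetask", "makeWarp", "requestMemory"] == 2048  # parent unchanged
```

The final assertion shows the detachment behavior: mutating the chained-access result leaves the parent untouched, which is exactly why a deep copy (and its cost) is needed at every level.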

Tim Jenness added a comment -

I've made some changes to speed things up a little.

For reference, with these changes my timing for the dict-like accessor improved from 33 seconds to 22 seconds. I can get it down to 5 seconds, but only if I break compatibility. The problem is that when you ask for a subset of a Config that is itself a dict, you are returned a new Config that is completely detached from the parent, such that if you modify a value in the new Config it won't change anything in the original config – this means that at every level in the hierarchy a whole new deepcopy is done, and those really add up. This is fundamentally different from how a normal dict hierarchy works, and I'm not sure how we ended up in this situation or whether it was like this in daf_persistence originally. It's clear that if we had a FrozenConfig then this could be made much faster.

I'll send what I have out for review though since I don't really want to go down a Frozen Config rabbit hole on this ticket.
I think though that your best bet is to call toDict() on the config and then access it.
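The suggestion above can be sketched as follows (using a plain nested dict to stand in for the parsed Config contents, since the real conversion is a single `toDict()` call): convert once up front, then do all hot-loop lookups against builtin dicts, which the timings earlier in this ticket show are roughly two orders of magnitude faster.

```python
# Sketch of the "convert once, then use plain dicts" approach. The nested
# dict below stands in for the parsed Config contents; with a real Config
# the first line would be: plain = config.toDict()
plain = {"pipetask": {"makeWarp": {"requestMemory": 2048}}}

# Hot loop: every lookup is now a builtin dict access, no Config overhead.
for _ in range(1000):
    value = plain["pipetask"]["makeWarp"]["requestMemory"]
```

The trade-off is that any Config-specific features (dotted-key lookup, hierarchical overrides) are no longer available after the conversion.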

Michelle Gower added a comment -

I reran the tests in DM-32241 using w44 + this ticket branch. As you expected, it doesn't speed up bps by much:

• single quantum clustering:  117 -> 109
• transform: 9341 -> 8869
• prepare: 826 -> 757

I'm not quite sure what to make of the comment about bps using Config like a dict. That's true now, but I don't know for sure that it won't use other Config features in the future (I didn't want to limit it just because). bps does use the config[".x.y.z"] notation in some places, so that would have to be changed in the code. I don't know if any users are using that style of notation in their config values (presumably not, but we didn't do anything to prevent it). I've commented in and linked this ticket from DM-31245, which is the generic rewrite-BpsConfig ticket. For bps, I'm waiting to see how the value caching works in real use (available in the last weekly) before doing more work explicitly just for speed.

Michelle Gower added a comment -

A couple of comments about code comments and one code question in the review. I did a pipelines_check submission with bps just to double-check and it ran fine. Changes approved for merging.


#### People

Assignee:
Tim Jenness
Reporter:
Michelle Gower
Reviewers:
Michelle Gower
Watchers:
Michelle Gower, Mikolaj Kowalik, Tim Jenness