# lsst.daf.butler.Config slowness with large numbers of lookups


#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points:
3
• Team:
Architecture
• Urgent?:
No

#### Description

Large submissions are taking hours in the post-QuantumGraph generation steps. After profiling and removing some of the obvious bps issues, I'm currently left with Config-related lines (or functions called by Config, e.g., isinstance) in the top 20. BpsConfig inherits from lsst.daf.butler.Config, and most of the top items in the profile are lsst.daf.butler.Config related (because BpsConfig mostly just adds more calls to the lookup, e.g., is it in this section, if not what about this section; look up and replace all the variables in the string).

This is not a blocker. I've found some places to cache results to reduce the large numbers of lookups and, where easily possible, to use the hierarchy lookup form (`.pipetask.makeWarp.requestMemory`).
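To illustrate the caching idea, here is a minimal sketch (not the actual bps code; the `lookup` function and `CONFIG` structure are hypothetical) of memoizing repeated dotted-path lookups against a nested mapping:

```python
# Illustrative sketch only: cache repeated ".a.b.c" lookups against a nested
# mapping so each unique key is traversed once. Not the actual bps code.
from functools import lru_cache

CONFIG = {"pipetask": {"makeWarp": {"requestMemory": 2048}}}

@lru_cache(maxsize=None)
def lookup(dotted_key):
    """Resolve a ".a.b.c" key against CONFIG, caching the result."""
    node = CONFIG
    for part in dotted_key.strip(".").split("."):
        node = node[part]
    return node

value = lookup(".pipetask.makeWarp.requestMemory")  # traverses the hierarchy
value = lookup(".pipetask.makeWarp.requestMemory")  # served from the cache
```

Caching like this only makes sense when the config is not mutated between lookups, which is the usual case after QuantumGraph generation.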

Just to make sure I could actually see the timing difference, I wrote a tiny program to compare a ton of lookups using a builtin dict vs lsst.daf.butler.Config. (One of the steps in the DC2 reprocessing had around 325,000 Quanta, so I used that number as a starting point to fake looking up 10 values each and then 100 values each.)

```python
#!/usr/bin/env python
import time

from lsst.daf.butler import Config


def query_config_1(config, n):
    stime = time.time()
    for i in range(0, n):
        x = config['pipetask']
        y = x['makeWarp']
        z = y['requestMemory']
    print("query_config_1", time.time() - stime)


def query_config_2(config, n):
    stime = time.time()
    for i in range(0, n):
        z = config['pipetask']['makeWarp']['requestMemory']
    print("query_config_2", time.time() - stime)


def query_config_3(config, n):
    stime = time.time()
    for i in range(0, n):
        z = config['.pipetask.makeWarp.requestMemory']
    print("query_config_3", time.time() - stime)


def run_tests(n):
    print(f"===== Testing n = {n}")

    print("builtin dict")
    dict_config = {'pipetask': {'makeWarp': {'requestMemory': 2048}}}
    query_config_1(dict_config, n)
    query_config_2(dict_config, n)

    print("lsst.daf.butler.config")
    config = Config(dict_config)
    query_config_1(config, n)
    query_config_2(config, n)
    query_config_3(config, n)


run_tests(325000 * 10)
run_tests(325000 * 100)
```

I ran it with w_2021_43 to get these timings:

```
./time_config.py
===== Testing n = 3250000
builtin dict
query_config_1 0.20920681953430176
query_config_2 0.18520069122314453
lsst.daf.butler.config
query_config_1 35.91612529754639
query_config_2 35.17692303657532
query_config_3 10.292534828186035
===== Testing n = 32500000
builtin dict
query_config_1 2.091094493865967
query_config_2 1.9995372295379639
lsst.daf.butler.config
query_config_1 352.6537432670593
query_config_2 352.4010479450226
query_config_3 103.25376081466675
```

#### Activity

Tim Jenness added a comment -

Thanks for the example code. The million isinstance checks are a problem. The Config class is optimized for the `c["x", "y", "z"]` form, since that is all done internally, whereas the `c["x"]["y"]["z"]` form requires that an entirely new Config object is returned at each level of the hierarchy. Your code seems to think it has a dict so treats it like a dict – it would be faster to force the Config to a dict and then use that if you aren't actually using any Config features after the YAML parse phase.

The other problem is that Config supports the `c["x", "4", "y"]` form, where the integer can index into a list, so it needs to do an instance check for a sequence at every level in the hierarchy. The `["x"]["y"]` form can be sped up a little by tweaking how the intermediate Config is created (so far about a third faster). It would also help, I think, if we had a FrozenConfig that would be read only, since FrozenConfig could then cache some lookups.

I need to try and profile things a little, but for the record, for the Config intermediates I go from 40 sec to 25 sec with a better Config constructor, and then down to 15 sec if I don't do the sequence check (but we have to do that). It's 11 seconds using the tuple form directly.
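The semantics described above can be sketched with a toy class (this is an illustration of the behavior as described in this comment, not the real lsst.daf.butler.Config implementation): a tuple key is walked internally with no copies, while single-key indexing hands back a detached copy at each level.

```python
# Toy sketch of the two access forms discussed above. Not the real Config
# class: it only mimics the semantics described (tuple keys traversed
# internally; single-key indexing returning a detached copy of sub-dicts).
import copy

class ToyConfig:
    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        if isinstance(key, tuple):
            # Tuple form: walk the hierarchy internally, no intermediate copies.
            node = self._data
            for k in key:
                node = node[k]
            return node
        value = self._data[key]
        if isinstance(value, dict):
            # Chained form: each level hands back a detached deep copy,
            # which is where the per-level cost comes from.
            return ToyConfig(copy.deepcopy(value))
        return value

c = ToyConfig({"pipetask": {"makeWarp": {"requestMemory": 2048}}})
assert c["pipetask", "makeWarp", "requestMemory"] == 2048
sub = c["pipetask"]["makeWarp"]
sub._data["requestMemory"] = 0  # mutate the detached copy
assert c["pipetask", "makeWarp", "requestMemory"] == 2048  # parent unchanged
```

The final assertion shows the detachment behavior: mutating the chained-access result leaves the parent untouched, which is exactly why a deep copy (and its cost) is needed at every level.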

Tim Jenness added a comment -

I've made some changes to speed things up a little.

For reference, with these changes my timing for the dict-like accessor improved from 33 seconds to 22 seconds. I can get it down to 5 seconds, but only if I break compatibility. The problem is that when you ask for a subset of a Config that is itself a dict, you are returned a new Config that is completely detached from the parent, such that if you modify a value in the new Config it won't change anything in the original config – this means that at every level in the hierarchy a whole new deepcopy is done, and those really add up. This is fundamentally different from how a normal dict hierarchy works, and I'm not sure how we ended up in this situation or whether it was like this in daf_persistence originally. It's clear that if we had a FrozenConfig then this could be made much faster.

I'll send what I have out for review though since I don't really want to go down a Frozen Config rabbit hole on this ticket.
I think though that your best bet is to call toDict() on the config and then access it.
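The suggestion above can be sketched as follows (using a plain nested dict to stand in for the parsed Config contents, since the real conversion is a single `toDict()` call): convert once up front, then do all hot-loop lookups against builtin dicts, which the timings earlier in this ticket show are roughly two orders of magnitude faster.

```python
# Sketch of the "convert once, then use plain dicts" approach. The nested
# dict below stands in for the parsed Config contents; with a real Config
# the first line would be: plain = config.toDict()
plain = {"pipetask": {"makeWarp": {"requestMemory": 2048}}}

# Hot loop: every lookup is now a builtin dict access, no Config overhead.
for _ in range(1000):
    value = plain["pipetask"]["makeWarp"]["requestMemory"]
```

The trade-off is that any Config-specific features (dotted-key lookup, hierarchical overrides) are no longer available after the conversion.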

Michelle Gower added a comment -

I reran the tests in DM-32241 using w44 + this ticket branch. As you expected, it doesn't speed up bps by much:

• single quantum clustering:  117 -> 109
• transform: 9341 -> 8869
• prepare: 826 -> 757

I'm not quite sure what to make of the comment about bps using Config like a dict. That's true now, but I don't know for sure that it won't use other Config features in the future (I didn't want to limit it just because). bps does use the config[".x.y.z"] notation in some places, so that would have to be changed in the code. I don't know if any users are using that style of notation in their config values (presumably not, but we didn't do anything to prevent it). I've commented in and linked this ticket from DM-31245, which is the generic rewrite-BpsConfig ticket. For bps, I'm waiting to see how the value caching works in real use (available in the last weekly) before doing more work explicitly just for speed.

Michelle Gower added a comment -

A couple of comments about code comments and one code question in the review. I did a pipelines_check submission with bps just to double-check and it ran fine. Changes approved for merging.


#### People

Assignee:
Tim Jenness
Reporter:
Michelle Gower
Reviewers:
Michelle Gower
Watchers:
Michelle Gower, Mikolaj Kowalik, Tim Jenness