Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-19851

Improve multi-collection query in QuantumGraph generator

    XMLWordPrintable

    Details

      Description

      Christopher Stephens [X] suggested that there is a better way to write a query that selects datasets when there are multiple collections defined. Presently we do a JOIN of two sub-queries where one of them does a GROUP BY to determine "best" collection for the dataset. The same can be rewritten using the window/anlytics functions instead.

        Attachments

          Issue Links

            Activity

            Hide
            salnikov Andy Salnikov added a comment -

            Jim Bosch, if things work OK now on this branch I presume it should be OK to merge it now?

            Show
            salnikov Andy Salnikov added a comment - Jim Bosch , if things work OK now on this branch I presume it should be OK to merge it now?
            Hide
            jbosch Jim Bosch added a comment -

            Actually, it might be best to hold up a bit - I'm just now seeing some evidence that this might be an overall performance regression rather than an improvement, but at least some of the results are contradictory.  More tests ongoing.

            Show
            jbosch Jim Bosch added a comment - Actually, it might be best to hold up a bit - I'm just now seeing some evidence that this might be an overall performance regression rather than an improvement, but at least some of the results are contradictory.  More tests ongoing.
            Hide
            jbosch Jim Bosch added a comment -

            So, even with new indexes, this is holding steady at about 5min slower (on a single-tract query) than without the changes on this branch.  I think the previous benchmarks here must have been user error, because it's been pretty consistent since: ~25 minutes without these changes, ~29 minutes with.

            I think we should wait for Christopher Stephens [X] to get back and have a chance to analyze further before giving up on it, though; there may be some small change that would push it out in front.

            One quick thought on this query: if I were writing out the algorithm I wanted for this subquery in an imperative programming language (considering the fact that it's always joined to other tables on (instrument, exposure, detector) - I think that's got to be important in the right query plan here), I'd want to step through the ordered list of collections and short circuit as soon as I found a match, instead of trying all of them and then ranking them.  Is that actually what (either form of) this query is doing?  If not, is that possible?

            Show
            jbosch Jim Bosch added a comment - So, even with new indexes, this is holding steady at about 5min slower (on a single-tract query) than without the changes on this branch.  I think the previous benchmarks here must have been user error, because it's been pretty consistent since: ~25 minutes without these changes, ~29 minutes with. I think we should wait for Christopher Stephens [X] to get back and have a chance to analyze further before giving up on it, though; there may be some small change that would push it out in front. One quick thought on this query: if I were writing out the algorithm I wanted for this subquery in an imperative programming language (considering the fact that it's always joined to other tables on (instrument, exposure, detector) - I think that's got to be important in the right query plan here), I'd want to step through the ordered list of collections and short circuit as soon as I found a match, instead of trying all of them and then ranking them.  Is that actually what (either form of) this query is doing?  If not, is that possible?
            Hide
            salnikov Andy Salnikov added a comment -

            I'm not sure how it can be slower, it is supposed to be faster (basically GROUP BY should need two passes and window function a single pass). Chris should probably be able to figure it out from optimizer plan.
            If collections were ordered in database it could work better I guess (basically doing what you describe), the trouble in our case is that order of collections is undefined (not fixed), can be different in different queries so it probably has to go through all of them and "re-order" every time. Again, there may be something else entirely, we do need Chris' help.

            Show
            salnikov Andy Salnikov added a comment - I'm not sure how it can be slower, it is supposed to be faster (basically GROUP BY should need two passes and window function a single pass). Chris should probably be able to figure it out from optimizer plan. If collections were ordered in database it could work better I guess (basically doing what you describe), the trouble in our case is that order of collections is undefined (not fixed), can be different in different queries so it probably has to go through all of them and "re-order" every time. Again, there may be something else entirely, we do need Chris' help.
            Hide
            jbosch Jim Bosch added a comment -

            DM-17023 has made the ideas being tried here mostly obsolete and the branches totally unmergeable, though it's probably worth keeping the branches in case we decide we want some bits from them later.

            Show
            jbosch Jim Bosch added a comment - DM-17023 has made the ideas being tried here mostly obsolete and the branches totally unmergeable, though it's probably worth keeping the branches in case we decide we want some bits from them later.

              People

              Assignee:
              salnikov Andy Salnikov
              Reporter:
              salnikov Andy Salnikov
              Reviewers:
              Jim Bosch
              Watchers:
              Andy Salnikov, Christopher Stephens [X] (Inactive), Jim Bosch
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.