  1. Data Management
  2. DM-22286

Remove duplicates from the output of queryDatasets

    Details

    • Type: Story
    • Status: Won't Fix
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Story Points:
      1
    • Epic Link:
    • Sprint:
      DRP F19-6 (Nov)
    • Team:
      Data Release Production

      Description

      This is a child of DM-22178, providing a quick fix to remove duplicate records from the output of queryDatasets.

            Activity

            jbosch Jim Bosch added a comment -

            DM-21448 is now in review; that adds hashability to DatasetRef, and hence may make this ticket even easier.
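To illustrate why hashability helps here, the following is a minimal sketch of order-preserving de-duplication over hashable records. `FakeRef` is a hypothetical stand-in written for this example; it is not the real DatasetRef, and its fields are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeRef:
    """Hypothetical stand-in for a hashable DatasetRef (not the real class)."""

    dataset_type: str
    data_id: tuple  # assumed form: tuple of (dimension, value) pairs


def drop_duplicates(refs):
    """Return refs with repeats removed, keeping first-seen order."""
    # dict keys are unique and preserve insertion order, so this only
    # works because the records are hashable and compare by value.
    return list(dict.fromkeys(refs))


refs = [
    FakeRef("calexp", (("visit", 1), ("detector", 2))),
    FakeRef("calexp", (("visit", 1), ("detector", 2))),  # duplicate record
    FakeRef("calexp", (("visit", 1), ("detector", 3))),
]
unique_refs = drop_duplicates(refs)
```

This approach holds every distinct record in memory, which is acceptable for a quick fix but may be costly for very large result sets.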

            kannawad Arun Kannawadi added a comment -

            Thanks for the update, Jim Bosch. In addition to rejecting duplicates, I tried to identify their cause and traced them to the `execute` method of the `Query` class. So this issue might well affect more than DatasetRef, perhaps?

            jbosch Jim Bosch added a comment -

            Yes, definitely; in particular, I'm pretty sure it also affects queryDataIds.  The root problem goes further back - it's in the definition of the Query before we call execute (which pretty much just blindly executes the SQL we've given it).  That code is very complex, and I don't recommend trying to trace the problem all the way back through that, at least not directly.  I think the way to debug it would be to see what kinds of query options (and database content) do and don't yield duplicates, but that's still a pretty big phase space to explore, and it's not something I'd recommend as part of what's supposed to be a quick ticket.
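The exploratory approach described above could be sketched roughly as follows: run the same query under different option sets and record how many rows come back duplicated. Every name here (`count_duplicates`, `survey`, `fake_query`, and the `with_join` option) is hypothetical and used only for illustration; a real survey would wrap the actual query call instead of `fake_query`.

```python
from collections import Counter


def count_duplicates(rows):
    """Return {row: occurrences} for every row seen more than once."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


def survey(run_query, option_sets):
    """Map each option-set label to the number of distinct duplicated rows."""
    report = {}
    for label, options in option_sets.items():
        rows = list(run_query(**options))
        report[label] = len(count_duplicates(rows))
    return report


def fake_query(collections, with_join=False):
    """Hypothetical query stand-in: the 'join' option repeats one row."""
    rows = [("calexp", 1), ("calexp", 2)]
    return rows + rows[:1] if with_join else rows


report = survey(
    fake_query,
    {
        "plain": {"collections": ["a"]},
        "joined": {"collections": ["a"], "with_join": True},
    },
)
```

A report like this only narrows down which options are involved; it does not by itself locate the cause in the query construction.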

            tjenness Tim Jenness added a comment -

            Jim Bosch, has this now been fixed by other work?

            jbosch Jim Bosch added a comment -

            DM-24938 won't fix this automatically, but it clarifies that fixing it directly is probably too expensive in general to make the default, and it provides an easy way to get a unique version of the results if desired (queryDatasets(...).subset(unique=True)).


              People

              • Assignee:
                kannawad Arun Kannawadi
                Reporter:
                kannawad Arun Kannawadi
                Watchers:
                Arun Kannawadi, Jim Bosch, Tim Jenness