Status: Won't Fix
Fix Version/s: None
This is a child of DM-22178, providing a quick fix to remove replicated records from the output of queryDatasets.
Thanks for the update, Jim Bosch. In addition to rejecting duplicates, I've tried to identify the cause, and I've traced the duplicates to the `execute` method of the `Query` class. So this issue might well affect more than just DatasetRef, perhaps?
Yes, definitely; in particular, I'm pretty sure it also affects queryDataIds. The root problem goes further back - it's in the definition of the Query before we call execute (which pretty much just blindly executes the SQL we've given it). That code is very complex, and I don't recommend trying to trace the problem all the way back through that, at least not directly. I think the way to debug it would be to see what kinds of query options (and database content) do and don't yield duplicates, but that's still a pretty big phase space to explore, and it's not something I'd recommend as part of what's supposed to be a quick ticket.
DM-24938 won't fix this automatically, but it clarifies that fixing it directly is probably too expensive in general to make the default, and it provides an easy way to get a unique version of the results if desired (queryDatasets(...).subset(unique=True)).
DM-21448 is now in review; that adds hashability to DatasetRef, and hence may make this ticket even easier.
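To illustrate why hashability helps here: once the records are hashable, rejecting duplicates can be a single order-preserving pass with a set. The following is only a rough sketch under that assumption. `FakeRef` is a hypothetical stand-in for a hashable DatasetRef (not the real class), and `unique_in_order` is an illustrative helper name, not anything that exists in the butler code.

```python
from dataclasses import dataclass
from typing import Hashable, Iterable, Iterator, TypeVar

T = TypeVar("T", bound=Hashable)


@dataclass(frozen=True)
class FakeRef:
    """Hypothetical stand-in for a hashable DatasetRef; NOT the real class."""

    dataset_type: str
    data_id: tuple  # e.g. (("visit", 1),); a tuple so the instance stays hashable


def unique_in_order(records: Iterable[T]) -> Iterator[T]:
    """Yield each record the first time it is seen, preserving input order."""
    seen = set()
    for record in records:
        if record not in seen:
            seen.add(record)
            yield record


# Simulated query output containing one replicated record.
refs = [
    FakeRef("raw", (("visit", 1),)),
    FakeRef("raw", (("visit", 2),)),
    FakeRef("raw", (("visit", 1),)),  # duplicate of the first entry
]
deduped = list(unique_in_order(refs))
```

Because this is a generator, it can wrap the query results lazily, at the cost of keeping every record seen so far in memory.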