Status: Won't Fix
Fix Version/s: None
This is a child of DM-22178, providing a quick fix to remove duplicate records from the output of queryDatasets.
|Field||Original Value||New Value|
|Team||Data Release Production [ 10301 ]|
|Assignee||Arun Kannawadi [ kannawad ]|
Thanks for the update, Jim Bosch. In addition to rejecting duplicates, I tried to identify the cause, and I've traced the duplicates to the `execute` method of the `Query` class. So this issue might well affect more than DatasetRef, perhaps?
Yes, definitely; in particular, I'm pretty sure it also affects queryDataIds. The root problem goes further back - it's in the definition of the Query before we call execute (which pretty much just blindly executes the SQL we've given it). That code is very complex, and I don't recommend trying to trace the problem all the way back through that, at least not directly. I think the way to debug it would be to see what kinds of query options (and database content) do and don't yield duplicates, but that's still a pretty big phase space to explore, and it's not something I'd recommend as part of what's supposed to be a quick ticket.
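As a rough aid for that kind of exploration, one could count repeated rows in whatever a given query returns. This is only a sketch: `find_duplicates` is a hypothetical helper, and the tuples below are made-up stand-ins for hashable result rows, not real output from queryDatasets or queryDataIds.

```python
from collections import Counter


def find_duplicates(records):
    """Return a dict mapping each record seen more than once to its count.

    Assumes the records are hashable (e.g. tuples of data ID values).
    """
    counts = Counter(records)
    return {rec: n for rec, n in counts.items() if n > 1}


# Hypothetical stand-in rows: (dataset type name, visit, detector).
rows = [("calexp", 1, 10), ("calexp", 1, 11), ("calexp", 1, 10)]
dupes = find_duplicates(rows)
```

Running this over the results for different query options would show which combinations produce repeated rows and how many times each one is repeated.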
DM-24938 won't fix this automatically, but it clarifies that fixing it directly is probably too expensive in general to make the default, and it provides an easy way to get a unique version of the results if desired (queryDatasets(...).subset(unique=True)).
|Resolution||Done [ 10000 ]|
|Status||To Do [ 10001 ]||Won't Fix [ 10405 ]|
DM-21448 is now in review; that adds hashability to DatasetRef, and hence may make this ticket even easier.
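To illustrate why hashability helps: once records can be placed in a set, duplicates can be dropped in a single pass while keeping the original order. The sketch below uses a hypothetical `FakeRef` frozen dataclass as a stand-in for a hashable DatasetRef; it is not the real class, and `unique_in_order` is an illustrative helper rather than anything in the codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeRef:
    """Hypothetical stand-in for a hashable DatasetRef (not the real class)."""

    dataset_type: str
    visit: int
    detector: int


def unique_in_order(refs):
    """Yield each ref the first time it is seen, preserving input order."""
    seen = set()
    for ref in refs:
        if ref not in seen:
            seen.add(ref)
            yield ref


refs = [
    FakeRef("calexp", 1, 10),
    FakeRef("calexp", 1, 11),
    FakeRef("calexp", 1, 10),  # duplicate of the first entry
]
deduplicated = list(unique_in_order(refs))
```

Writing it as a generator keeps the filtering lazy, so it can wrap an iterator of results without first materializing the full list.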