Details
Type: Story
Status: Won't Fix
Resolution: Done
Fix Version/s: None
Component/s: daf_butler
Labels:
Story Points: 2
Team: Data Release Production
Description
Running a query like

    list(butler.registry.queryDatasets("calexp", collections=["shared/ci_hsc_output"],
                                       skymap="discrete/ci_hsc", tract=0, patch=70))

produces many duplicate records for unclear reasons. The duplicates do not come from searching multiple collections (only one collection is given), so the deduplicate option doesn't help. Try to fix this, or at least document when duplicate results must be expected.
Since all the DatasetRefs being returned must come from the same butler repository, ref.id must be unique per dataset. A short-term fix is therefore to deduplicate the results by dropping any ref whose ref.id has already been seen; that's a couple of lines of code using a dict.
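A minimal sketch of that dict-based workaround follows. It assumes only that each returned ref has a hashable `id` attribute that is unique per dataset within one repository. The helper name `dedupe_refs_by_id` and the `FakeRef` class are hypothetical, introduced here for illustration; `FakeRef` stands in for a real DatasetRef so the example runs without daf_butler.

```python
from dataclasses import dataclass
from typing import Any, Dict, Hashable, Iterable, List


def dedupe_refs_by_id(refs: Iterable[Any]) -> List[Any]:
    """Drop refs whose ``id`` was already seen, keeping the first occurrence.

    Hypothetical helper: assumes every ref exposes a hashable ``id`` that is
    unique per dataset within a single repository.
    """
    unique: Dict[Hashable, Any] = {}
    for ref in refs:
        # setdefault stores only the first ref seen for each id. Python dicts
        # preserve insertion order, so the original result order is kept.
        unique.setdefault(ref.id, ref)
    return list(unique.values())


@dataclass(frozen=True)
class FakeRef:
    # Hypothetical stand-in for DatasetRef, used only for this demo.
    id: int
    dataId: str


if __name__ == "__main__":
    results = [FakeRef(1, "patch=70"), FakeRef(2, "patch=71"), FakeRef(1, "patch=70")]
    print([ref.id for ref in dedupe_refs_by_id(results)])
```

In the ticket's setting this would wrap the query, e.g. `dedupe_refs_by_id(butler.registry.queryDatasets(...))`. It only hides the duplicates; it does not explain why the query produces them.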
Is it worth doing this quickly as a separate ticket and reserving this ticket for understanding why the query itself produces duplicates?