# Inconsistent hash for DimensionSet/DimensionNameSet

XMLWordPrintable

## Details

• Type: Bug
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points:
1
• Sprint:
BG3_F18_11
• Team:
Data Access and Database

## Description

It looks like recent switch from units to dimensions created issuers in GraphBuilder, I'm getting error when trying to run simple pipeline of two tasks :

 Traceback (most recent call last):  File "/project/salnikov/gen3-middleware/pipe_supertask/bin/stac", line 25, in   sys.exit(CmdLineFwk().parseAndRun())  File "/project/salnikov/gen3-middleware/pipe_supertask/python/lsst/pipe/supertask/cmdLineFwk.py", line 198, in parseAndRun  qgraph = graphBuilder.makeGraph(pipeline, coll, args.data_query)  File "/project/salnikov/gen3-middleware/pipe_supertask/python/lsst/pipe/supertask/graphBuilder.py", line 198, in makeGraph  originInfo, userQuery)  File "/project/salnikov/gen3-middleware/pipe_supertask/python/lsst/pipe/supertask/graphBuilder.py", line 324, in _makeGraph  dataRef = row.datasetRefs[dsType] KeyError: DatasetType(calexp, {Visit, Instrument, Detector}, ExposureF) 

Looks like the dict indexing on DatasetType is broken, and this seems to be related to hash returning different values when dimension is either DimensionGraph or DimensionNameSet.

## Activity

Hide
Andy Salnikov added a comment -

I think that DimensionSetBase hash method is not quite correct:

  def __hash__(self):  # Two possibilities in practice:  try:  # self.names is a frozenset, which is hashable but not sorted.  return hash(self.names)  except TypeError:  # self.names is an OrderedDict keys view, which is not hashable  # but is sorted.  return hash(tuple(self.names)) 

for the same set of names it can return different hash.

For proper hashing we need to use the same container type , it's probably easiest to just make frozenset out of names.

Show
Andy Salnikov added a comment - I think that DimensionSetBase hash method is not quite correct: def __hash__( self ): # Two possibilities in practice: try : # self.names is a frozenset, which is hashable but not sorted. return hash ( self .names) except TypeError: # self.names is an OrderedDict keys view, which is not hashable # but is sorted. return hash ( tuple ( self .names)) for the same set of names it can return different hash. For proper hashing we need to use the same container type , it's probably easiest to just make frozenset out of names.
Hide
Andy Salnikov added a comment - - edited

JIm, could you check that my fix makes sense? PR: https://github.com/lsst/daf_butler/pull/111

Show
Andy Salnikov added a comment - - edited JIm, could you check that my fix makes sense? PR: https://github.com/lsst/daf_butler/pull/111
Hide
Jim Bosch added a comment - - edited

Fix looks good, thanks!

We should probably also try to make sure the activator logic "upgrades" its DatasetTypes to have full DimensionGraphs as soon as it can (this is done automatically by Registry.getDatasetType and Registry.registerDatasetType), and maybe consider making DimensionGraph __hash__ on frozenset(self.names), too.  I am a bit worried that while the ability to compare DimensionNameSet, DimensionSet, and DimensionGraph for equality is very nice, it is very hard to those binary operations consistent with the unary hash operation, and even this ticket doesn't fully achieve that.

Show
Jim Bosch added a comment - - edited Fix looks good, thanks! We should probably also try to make sure the activator logic "upgrades" its DatasetTypes to have full DimensionGraphs as soon as it can (this is done automatically by Registry.getDatasetType and Registry.registerDatasetType), and maybe consider making DimensionGraph __hash__ on frozenset(self.names) , too.  I am a bit worried that while the ability to compare DimensionNameSet, DimensionSet, and DimensionGraph for equality is very nice, it is very hard to those binary operations consistent with the unary hash operation, and even this ticket doesn't fully achieve that.
Hide
Andy Salnikov added a comment -

Thanks for review, indeed we probably need another review it once again, this is just enough to unblock my other tickets.

Show
Andy Salnikov added a comment - Thanks for review, indeed we probably need another review it once again, this is just enough to unblock my other tickets.

## People

• Assignee:
Andy Salnikov
Reporter:
Andy Salnikov
Reviewers:
Jim Bosch
Watchers:
Andy Salnikov, Jim Bosch