Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: daf_butler
- Labels:
- Story Points: 1
- Epic Link:
- Team: Data Release Production
- Urgent?: No
Description
I just had a BPS job die with the following traceback:
File "/software/lsstsw/stack_20210415/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-74-g62d1151e+6c308c38c1/python/lsst/daf/butler/registry/managers.py", line 309, in refresh
|
self.collections.refresh()
|
File "/software/lsstsw/stack_20210415/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-74-g62d1151e+6c308c38c1/python/lsst/daf/butler/registry/collections/_base.py", line 361, in refresh
|
chain.refresh(self)
|
File "/software/lsstsw/stack_20210415/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-74-g62d1151e+6c308c38c1/python/lsst/daf/butler/registry/interfaces/_collections.py", line 204, in refresh
|
self._children = self._load(manager)
|
File "/software/lsstsw/stack_20210415/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-74-g62d1151e+6c308c38c1/python/lsst/daf/butler/registry/collections/_base.py", line 284, in _load
|
[manager[row[self._table.columns.child]].name for row in self._db.query(sql)]
|
File "/software/lsstsw/stack_20210415/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-74-g62d1151e+6c308c38c1/python/lsst/daf/butler/registry/collections/_base.py", line 284, in <listcomp>
|
[manager[row[self._table.columns.child]].name for row in self._db.query(sql)]
|
File "/software/lsstsw/stack_20210415/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-74-g62d1151e+6c308c38c1/python/lsst/daf/butler/registry/collections/_base.py", line 424, in __getitem__
|
raise MissingCollectionError(f"Collection with key '{key}' not found.") from err
|
lsst.daf.butler.registry._exceptions.MissingCollectionError: Collection with key 'u/kbechtol/calib_test/20210427T062718Z' not found.
|
I'm 90% sure that Keith Bechtol happened to delete this completely-unrelated-to-me collection last night (please confirm if you can, Keith) while my Butler was linking up all of its parent and child collection information at startup.
I've always known this super-aggressive up-front fetching wouldn't scale to many, many users, but I hadn't anticipated having to replace it so soon (it'd be nice to get e.g. DM-29585 first). We might be able to work around this in the short term by either wrapping this in a transaction or just catching the exception and retrying.
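As a rough illustration only (this is not code from daf_butler; refresh_with_retry, do_refresh, and max_attempts are invented names), the catch-and-retry option could look something like the following, reusing the MissingCollectionError exception that appears in the traceback above:

# MissingCollectionError is imported from the module named in the traceback;
# it may also be importable from lsst.daf.butler.registry directly.
from lsst.daf.butler.registry._exceptions import MissingCollectionError

def refresh_with_retry(do_refresh, max_attempts=3):
    # do_refresh is any zero-argument callable that re-fetches the collection
    # cache, e.g. the refresh() call at the top of the traceback above.
    for attempt in range(max_attempts):
        try:
            return do_refresh()
        except MissingCollectionError:
            # Another client deleted a collection between the chain query and
            # the lookup of its children; a fresh attempt should see a
            # consistent snapshot once that deletion has committed.
            if attempt == max_attempts - 1:
                raise

The other option mentioned above, wrapping the refresh in a single transaction, would aim for the same thing by having the chain query and the child lookups share one consistent view of the database.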
Andy Salnikov, I've added you as a watcher because I figure it's possible you may be able to get to this before I do (I know we're both super busy, so neither of us is likely to), or you may have some thoughts on the best approach to take on this. And Tim Jenness, this is the kind of thing that might get exacerbated in the client/server butler, depending on where we do the caching.
Attachments
Issue Links
- relates to DM-29973 Gen3 RC2 processing with w_2021_18 stack (Done)
Comments
I am sorry if I inadvertently disrupted a job. Here are the commands I was running. I believe this was at roughly 12:30pm Project time (PDT) 26 April 2021.
import lsst.daf.butler as dafButler

config = '/repo/main/butler.yaml'
butler = dafButler.Butler(config=config,
                          collections='HSC/runs/RC2/w_2021_14/DM-29528',
                          writeable=True)
registry = butler.registry
registry.removeDatasetType('metricvalue_info_nsrcMeasDetector')
registry.removeDatasetType('nsrcMeasDetector_metadata')
I think I also attempted to run these commands at ~10 pm Project time (PDT) 26 April 2021.