Details
Type: Story
Status: Done
Resolution: Done
Fix Version/s: None
Component/s: daf_persistence, pipe_tasks
Labels: None
Story Points: 15
Epic Link:
Sprint: DB_S17_5, DB_S17_6
Team: Data Access and Database
Description
When a user runs processCcd.py from the command line for ISR and outputrepo was previously the destination of another processCcd run, an opaque butler error that sneakily originates in daf_persistence results ("No locations for get").
I have only tried this with DECam images, so it may be a special case affecting only images ingested with obs_decam. I don't generally have a problem with requiring an empty destination directory for processCcd output, but the traceback should give the user a clue as to the source of the problem.
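One possible shape for a friendlier failure, purely as illustration (the exception class, function, and message below are assumptions, not current daf_persistence code):

class NoLocationsError(RuntimeError):
    """Hypothetical, more descriptive replacement for the bare RuntimeError."""

def findLocations(locations, datasetType, dataId):
    # Hypothetical resolution step: if nothing matches, explain the most
    # likely cause instead of only saying "No locations for get".
    if not locations:
        raise NoLocationsError(
            "No locations for get: datasetType:%s dataId:%s. If --output "
            "points at a repo written by a previous run, its persisted "
            "configuration may shadow the inputs; try an empty output "
            "directory." % (datasetType, dataId))
    return locations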
For example, consider the case where repo, calibrepo, and outputrepo are directories containing ingested images, ingested calibration products, and past output from processCcd, respectively:
processCcd.py repo --id visit=410878 ccdnum=54 --output outputrepo --calib calibrepo -C $OBS_DECAM_DIR/config/processCcdCpIsr.py --config calibrate.doAstrometry=False calibrate.doPhotoCal=False
will return an error that appears to indicate a problem with accessing the bias:
processCcd INFO: Processing {u'hdu': 55, 'object': 'Blind15A_39', 'visit': 410878, u'filter': 'r', 'ccdnum': 54, u'date': '2015-02-17'}
processCcd.isr INFO: Performing ISR on sensor {u'hdu': 55, 'object': 'Blind15A_39', 'visit': 410878, u'filter': 'r', 'ccdnum': 54, u'date': '2015-02-17'}
makeWcs WARN: Stripping PVi_j keys from projection RA---TPV/DEC--TPV
processCcd FATAL: Failed on dataId={u'hdu': 55, 'object': 'Blind15A_39', 'visit': 410878, u'filter': 'r', 'ccdnum': 54, u'date': '2015-02-17'}: Unable to retrieve cpBias for {u'hdu': 55, 'object': 'Blind15A_39', 'visit': 410878, u'filter': 'r', 'ccdnum': 54, u'date': '2015-02-17'}: No locations for get: datasetType:cpBias dataId:DataId(initialdata={u'filter': 'r', u'hdu': 55, 'ccdnum': 54, u'date': '2015-02-17', 'object': 'Blind15A_39', 'visit': 410878}, tag=set([]))
Traceback (most recent call last):
  File "/ssd/lsstsw/stack_20170409/Linux64/pipe_base/13.0-3-g7fa07e0/python/lsst/pipe/base/cmdLineTask.py", line 347, in __call__
    result = task.run(dataRef, **kwargs)
  File "/ssd/lsstsw/stack_20170409/Linux64/pipe_base/13.0-3-g7fa07e0/python/lsst/pipe/base/timer.py", line 121, in wrapper
    res = func(self, *args, **keyArgs)
  File "/home/mrawls/pipe_tasks/python/lsst/pipe/tasks/processCcd.py", line 183, in run
    exposure = self.isr.runDataRef(sensorRef).exposure
  File "/ssd/lsstsw/stack_20170409/Linux64/pipe_base/13.0-3-g7fa07e0/python/lsst/pipe/base/timer.py", line 121, in wrapper
    res = func(self, *args, **keyArgs)
  File "/ssd/lsstsw/stack_20170409/Linux64/ip_isr/13.0-9-g733b75c+2/python/lsst/ip/isr/isrTask.py", line 484, in runDataRef
    isrData = self.readIsrData(sensorRef, ccdExposure)
  File "/home/mrawls/obs_decam/python/lsst/obs/decam/decamCpIsr.py", line 77, in readIsrData
    biasExposure = self.getIsrExposure(dataRef, "cpBias") if self.config.doBias else None
  File "/ssd/lsstsw/stack_20170409/Linux64/ip_isr/13.0-9-g733b75c+2/python/lsst/ip/isr/isrTask.py", line 586, in getIsrExposure
    raise RuntimeError("Unable to retrieve %s for %s: %s" % (datasetType, dataRef.dataId, exc1))
RuntimeError: Unable to retrieve cpBias for {u'hdu': 55, 'object': 'Blind15A_39', 'visit': 410878, u'filter': 'r', 'ccdnum': 54, u'date': '2015-02-17'}: No locations for get: datasetType:cpBias dataId:DataId(initialdata={u'filter': 'r', u'hdu': 55, 'ccdnum': 54, u'date': '2015-02-17', 'object': 'Blind15A_39', 'visit': 410878}, tag=set([]))
The same command, however, runs successfully when outputrepo is an empty directory.
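For completeness, the failure can also be reproduced outside the command-line framework; a minimal sketch, assuming the Gen2 daf_persistence Butler inputs/outputs constructor (the calib-repo wiring is simplified relative to what processCcd.py actually sets up):

from lsst.daf.persistence import Butler

# 'repo' and 'outputrepo' are the directories from the example above;
# outputrepo has already served as an output for an earlier run.
butler = Butler(inputs="repo", outputs="outputrepo")

try:
    butler.get("cpBias", dataId={"visit": 410878, "ccdnum": 54})
except Exception as exc:
    # Produces the opaque "No locations for get" message rather than
    # anything pointing at the reused output repo.
    print(exc)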
The redesign of the repository initialization as described in the docstring is reasonable, except that I believe it still breaks the common pattern of creating an output repo with one task (e.g. generating a skymap for a coadd) and then reusing that output repo, with different input repos, to add more output datasets (e.g. generating warped images). To handle this case, we need to allow new input repos (parents) to be added to an existing output repo. That requires updating the in-repo persisted configuration, which in turn requires locking to serialize access to that configuration. This needs to be added before merge to master to ensure that this use case is not broken.
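To make the locking requirement concrete, here is a rough sketch of serializing configuration updates with an exclusive file lock; the file name, YAML serialization, and addParent helper are assumptions for illustration, not the existing daf_persistence implementation:

import fcntl
import os
import yaml

def addParent(repoRoot, newParentPath):
    # Hypothetical helper: append a new input repo (parent) to an existing
    # output repo's persisted configuration, holding an exclusive lock so
    # that concurrent writers cannot clobber each other's updates.
    cfgPath = os.path.join(repoRoot, "repositoryCfg.yaml")  # assumed name
    with open(cfgPath + ".lock", "w") as lockFile:
        fcntl.flock(lockFile, fcntl.LOCK_EX)  # blocks until the lock is ours
        try:
            with open(cfgPath) as f:
                cfg = yaml.safe_load(f) or {}
            parents = cfg.setdefault("parents", [])
            if newParentPath not in parents:
                parents.append(newParentPath)
                with open(cfgPath, "w") as f:
                    yaml.safe_dump(cfg, f)
        finally:
            fcntl.flock(lockFile, fcntl.LOCK_UN)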
(The use case can also be supported by insisting that repos not be reused; the skymap repo would then be an input/parent of the warp repo, as sketched below. That requires changes to both existing tasks and user habits, which will take longer.)
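Under that alternative, the chaining would look roughly like this (a sketch assuming the Gen2 Butler inputs/outputs constructor; the real tasks construct their butlers through the command-line framework):

from lsst.daf.persistence import Butler

# Step 1: the skymap task writes into its own, fresh output repo.
skymapButler = Butler(inputs="repo", outputs="skymapRepo")
# ... run the skymap-generating task with skymapButler ...

# Step 2: rather than reusing skymapRepo as an output, make it a
# parent/input of a new repo that holds the warps, so each repo is
# written exactly once.
warpButler = Butler(inputs="skymapRepo", outputs="warpRepo")
# ... run the warping task with warpButler ...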
While the multi-step initialization process seems complex, each step is necessary (at least for now).
There are also a few typos, misspellings, and coding issues that should be fixed. For example, in line 302 of butler.py, the extra parentheses should be removed, and the use of the variable alreadyAdded, which is external to the function, should be documented (see the illustration below). A separate Python-only code review is recommended.
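On the alreadyAdded point, a hypothetical illustration of the pattern being flagged (not the actual butler.py code): when a nested helper reads a variable from its enclosing scope, the coupling deserves an explicit note:

def addParentsOnce(repoDataList):
    alreadyAdded = set()  # shared state consumed by the nested helper below

    def addOne(repoData):
        # NOTE: `alreadyAdded` lives in the enclosing addParentsOnce scope;
        # documenting that dependency is what the review asks for.
        if repoData in alreadyAdded:
            return
        alreadyAdded.add(repoData)

    for repoData in repoDataList:
        addOne(repoData)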