
#### Details

• Type: Story
• Status: Invalid
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels: None
• Team: Architecture

#### Description

When I run scons in ci_hsc, it fails as follows:

    partial([".scons/sfm-903334-16"], ["DATA/registry.sqlite3", "DATA/CALIB", ".scons/sfm"])
    : Validating dataset processCcd_config for {'ccd': 16, 'visit': 903334}
    CameraMapper: Loading registry registry from /raid/swinbank/src/ci_hsc/DATA/registry.sqlite3
    CameraMapper: Loading calibRegistry registry from /raid/swinbank/src/ci_hsc/DATA/CALIB/calibRegistry.sqlite3
    : processCcd_config exists: PASS
    : processCcd_config readable (): PASS
    : Validating dataset processCcd_metadata for {'ccd': 16, 'visit': 903334}
    : processCcd_metadata exists: PASS
    scons: *** [.scons/sfm-903334-16] Exception : input stream error
    Traceback (most recent call last):
      File "/nfs/home/lsstsw/stack/Linux64/scons/2.3.0+1/lib/scons/SCons/Action.py", line 1062, in execute
        result = self.execfunction(target=target, source=rsources, env=env)
      File "/raid/swinbank/src/ci_hsc/python/lsst/ci/hsc/validate.py", line 116, in scons
        return self.run(*args, **kwargs)
      File "/raid/swinbank/src/ci_hsc/python/lsst/ci/hsc/validate.py", line 92, in run
        self.validateDataset(dataId, ds)
      File "/raid/swinbank/src/ci_hsc/python/lsst/ci/hsc/validate.py", line 67, in validateDataset
        print(data.__class__)
      File "/home/lsstsw/stack/Linux64/daf_persistence/2015_10.0-6-g5a5f333+5/python/lsst/daf/persistence/readProxy.py", line 41, in __getattribute__
        subject = oga(self, '__subject__')
      File "/home/lsstsw/stack/Linux64/daf_persistence/2015_10.0-6-g5a5f333+5/python/lsst/daf/persistence/readProxy.py", line 136, in __subject__
        set_cache(self, get_callback(self)())
      File "/home/lsstsw/stack/Linux64/daf_persistence/2015_10.0-6-g5a5f333+5/python/lsst/daf/persistence/butler.py", line 279, in <lambda>
        callback = lambda: self._read(pythonType, location)
      File "/home/lsstsw/stack/Linux64/daf_persistence/2015_10.0-6-g5a5f333+5/python/lsst/daf/persistence/butler.py", line 452, in _read
        location.getCppType(), storageList, additionalData)
      File "/home/lsstsw/stack/Linux64/daf_persistence/2015_10.0-6-g5a5f333+5/python/lsst/daf/persistence/persistenceLib.py", line 1430, in unsafeRetrieve
        return _persistenceLib.Persistence_unsafeRetrieve(self, *args)
    Exception: input stream error
    scons: building terminated because of errors.

#### Attachments

1. 0903334-016.boost (56 kB)
2. 0903334-016.xml (129 kB)
3. testPersist.py (0.6 kB)

#### Activity

John Swinbank added a comment -

The metadata does appear to have been written to a plausible location:

    $ ls -l ./DATA/00533/HSC-R/processCcd_metadata/0903334-016.boost
    -rw-rw-r-- 1 swinbank lsst 57146 Jan 27 12:01 ./DATA/00533/HSC-R/processCcd_metadata/0903334-016.boost

But is not retrievable with the Butler directly:

    In [1]: import lsst.daf.persistence as dafPersist

    In [2]: b = dafPersist.Butler("./DATA")
    CameraMapper: Loading registry registry from ./DATA/registry.sqlite3
    CameraMapper: Loading calibRegistry registry from ./DATA/CALIB/calibRegistry.sqlite3

    In [3]: dataId = {'ccd': 16, 'visit': 903334}

    In [4]: b.get("processCcd_metadata", dataId)
    [...]
       1432 def getPersistence(*args):

    Exception: input stream error

John Swinbank added a comment -

Paul Price – Is this something you recognize? I'm happy to spend a bit longer poking around, but if you happen to know a quick fix that would be great.

Paul Price added a comment - - edited

It doesn't look familiar. This all worked for me with the LSST stack when I delivered it.

Some ideas:

• Check that the filename provided by butler.get("processCcd_metadata_filename", dataId)[0] exists and is what you expect.
• Try butler.get("processCcd_metadata", dataId, immediate=True).
• Are you trying to read as a PropertyList when written as PropertySet, or similar? (Check that the type in HscMapper.paf corresponds to what's in lsst.pipe.base.Task.)

I suggest filing a ticket asking for support for reading these boost files directly (without the butler or several lines of daf_persistence setup).

John Swinbank added a comment -

Thanks Paul.

This works with no problem on my (Mac) laptop using stack w_2016_03. I copied the 0903334-016.boost file from my laptop and dropped it into the Butler directory on lsst-dev: exactly the same error as above. Copied the file generated on lsst-dev to my laptop and dropped it into the repository there: loads fine. Intriguing.

John Swinbank added a comment -

For the record, I also tried all Paul's helpful suggestions, but am still none the wiser – all looks fine.

Kian-Tat Lim added a comment -

Could you please attach the .boost file? Do the versions on your laptop and lsst-dev compare identical?

John Swinbank added a comment - - edited

I have attached 0903334-016.boost and a simple test script I wrote (testPersist.py).

Example of running on different machines:

    [jds@magpie ~]$ uname -a
    Darwin magpie 15.3.0 Darwin Kernel Version 15.3.0: Thu Dec 10 18:40:58 PST 2015; root:xnu-3248.30.4~1/RELEASE_X86_64 x86_64 i386 MacBookPro11,3 Darwin
    [jds@magpie ~]$ eups list -s lsst_apps
       11.0+76 current w_2016_03 w_latest b1856 setup
    [jds@magpie ~]$ md5sum testPersist.py 0903334-016.boost
    1fc9f21c54acd674990dadab2addb4b7  testPersist.py
    5d8713f610f1e51d9ba08a6a8609207e  0903334-016.boost
    [jds@magpie ~]$ python testPersist.py 0903334-016.boost
    Ok.

    [swinbank@lsst-dev ~]$ uname -a
    Linux lsst-dev.ncsa.illinois.edu 2.6.32-573.12.1.el6.x86_64 #1 SMP Tue Dec 15 21:19:08 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
    [swinbank@lsst-dev ~]$ eups list -s lsst_apps
       11.0+76 current b1856 setup
    [swinbank@lsst-dev ~]$ md5sum testPersist.py 0903334-016.boost
    1fc9f21c54acd674990dadab2addb4b7  testPersist.py
    5d8713f610f1e51d9ba08a6a8609207e  0903334-016.boost
    [swinbank@lsst-dev ~]$ python testPersist.py 0903334-016.boost
    Traceback (most recent call last):
      File "testPersist.py", line 12, in <module>
        dafBase.PropertySet()))
      File "/home/lsstsw/stack/Linux64/daf_persistence/2015_10.0-6-g5a5f333+5/python/lsst/daf/persistence/persistenceLib.py", line 1430, in unsafeRetrieve
        return _persistenceLib.Persistence_unsafeRetrieve(self, *args)
    Exception: input stream error

Running on Linux tiger-sumire 2.6.32-573.12.1.el6.x86_64 #1 SMP Tue Dec 15 16:24:53 EST 2015 x86_64 x86_64 x86_64 GNU/Linux, which has the same version of the stack installed through eups distrib but otherwise shares no infrastructure with lsst-dev, I see exactly the same error.

We have a number of versions of the stack installed on tiger-sumire, dating back to v10. I see the same error with all of them. However, I am able to load metadata which was persisted in Summer 2015 with no problems.

I also see the same symptoms if using the XmlStorage backend. I attach the generated XML file for reference. This is entertaining, as it's possible to eliminate the error by deleting parts of the file. I spent a few minutes experimenting with this, but have not been able to narrow it down to a single field or set of fields which is causing the problem.
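Rather than deleting parts of the file by hand, the search for the offending field could be made systematic by testing each entry in isolation. The following is only a sketch under stated assumptions: `find_failing_keys` and `loads_ok` are hypothetical names, the metadata is modelled as a flat dict, and the predicate is a toy stand-in. In practice the predicate would write the subset out and attempt to read it back through the real persistence layer.

```python
def find_failing_keys(entries, loads_ok):
    """Return the keys whose single-entry subset fails the loads_ok check.

    Assumes each failure is caused by an individual entry rather than by
    a combination of entries.
    """
    return [key for key, value in entries.items() if not loads_ok({key: value})]


def loads_ok(subset):
    # Toy stand-in predicate (an assumption for this sketch only):
    # pretend the reader rejects any entry whose value is None.
    return all(value is not None for value in subset.values())


# Hypothetical entries, not taken from the attached files.
entries = {"a": 1.25, "b": None, "c": "text"}
failing = find_failing_keys(entries, loads_ok)
```

If failures depend on combinations of fields, a per-entry test like this would miss them, and a bisection over subsets would be needed instead.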

John Swinbank added a comment -

I note the comment on clo that persistence for PropertySet and PropertyList could become pure Python, which would make this problem go away. Kian-Tat Lim, Nate Pease: is that just a possibility, or is it work that's likely to be scheduled?

Kian-Tat Lim added a comment -

I have some code to do it, but it's yet another hack to daf_persistence and it will require changes to obs_* policies.

Kian-Tat Lim added a comment -

Oh, and there's a problem with persisting lsst.daf.base.DateTime objects that are part of PropertySet/Lists that may require changes to that package.

Jim Bosch added a comment -

Any objection to me removing the bits that read metadata from ci_hsc as a temporary workaround? It doesn't look like a better fix is coming in the short term. I could easily do that work on DM-5084, where I'm going to be adding some tests to ci_hsc, and I've already had to make those changes locally anyway.

John Swinbank added a comment -

Jim Bosch I actually started doing that (currently on tickets/DM-4927) but got distracted. If you have time to do it properly, go ahead.

Jim Bosch added a comment -

Workaround added on tickets/DM-5084. It now continues to try to read metadata, but merely warns (with this issue number) when that raises an exception (more precisely, it catches and warns on any dataset whose "persistable" begins with "Property").

John Swinbank added a comment -

It seems like this is a problem with persistence, rather than Science Pipelines. Further, given the workaround established, it's not actually blocking us from proceeding. I'm reassigning to the Architecture team and dropping it from the HSC porting epic.

Michael Wood-Vasey added a comment -

John Swinbank
Is the mappable in the following test solely to step around this present issue of DM-4927?

DM-4683 removes this access model to getting a mapper from the Butler object and breaks ci_hsc solely because of this test.

https://github.com/lsst/ci_hsc/blob/038e28e608794c53bcbdee6bbd60e733de8a3801/python/lsst/ci/hsc/validate.py#L74

    def validateDataset(self, dataId, dataset):
        self.assertTrue("%s exists" % dataset, self.butler.datasetExists(datasetType=dataset, dataId=dataId))
        # Just warn if we can't load a PropertySet or PropertyList; there's a known issue
        # (DM-4927) that prevents these from being loaded on Linux, with no imminent resolution.
        mappable = self.butler.repository._mapper.datasets.get(dataset, None)
        if mappable is not None and mappable.persistable.startswith("Property"):
            try:
                data = self.butler.get(dataset, dataId)
                self.assertTrue("%s readable (%s)" % (dataset, data.__class__), data is not None)
            except:
                self.log.warn("Unable to load '%s'; this is likely DM-4927." % dataset)
            return
        data = self.butler.get(dataset, dataId)
        self.assertTrue("%s readable (%s)" % (dataset, data.__class__), data is not None)

John Swinbank added a comment -

> Is the mappable in the following test solely to step around this present issue of DM-4927?

Yes, I believe so: see https://github.com/lsst/ci_hsc/commit/189b9010aff39527a3f1ff1650c8092a364f0c4b for the details.

Given that, do you plan to take care of it on DM-5372, or would you like further input from me/Science Pipelines?

Michael Wood-Vasey added a comment -

Nate Pease
This is the code in question for replacing the call to mappable.

Michael Wood-Vasey added a comment -

John Swinbank

I updated the awkward test/workaround in ci_hsc as part of DM-5372.

I took no action to actually resolve the problem in DM-4927.

Colin Slater added a comment -

I got caught by this same problem when trying to look up timing information in the metadata. The problem is that measurePsf.spatialFitChi2 sometimes reports a NaN, which boost can't read properly (same with infinity). Overwriting the NaN with a number allows the file to load. Interestingly, this is a reappearance of Trac #791 that was filed 7 years ago.
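To check whether other metadata entries are affected, one could scan for non-finite values before attempting to persist or analyze them. This is a minimal sketch, assuming the metadata has already been converted to a nested Python dict; `find_nonfinite` is a hypothetical helper, and every key other than `measurePsf.spatialFitChi2` is invented for illustration.

```python
import math


def find_nonfinite(metadata, prefix=""):
    """Yield (dotted_name, value) for every NaN or infinite float in a nested dict."""
    for key, value in metadata.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections, extending the dotted name.
            yield from find_nonfinite(value, name)
        elif isinstance(value, float) and not math.isfinite(value):
            yield name, value


# Hypothetical metadata; only measurePsf.spatialFitChi2 comes from this issue.
metadata = {
    "measurePsf": {"spatialFitChi2": float("nan"), "numGoodStars": 42},
    "runtime": 1.5,
}
bad = [name for name, _ in find_nonfinite(metadata)]
```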

John Swinbank added a comment -

Nice catch!

Colin Slater added a comment -

This bug is blocking any analysis of the metadata for process timing or other purposes. Since I think this is a daf_persistence issue, Kian-Tat Lim how would you like to handle this? We could remove the offending NaN from psfex, but that would seem to invite another reappearance sometime later. Other options I can think of would be replacing NaNs with zeros or throwing an exception on attempting to write NaNs.
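The last two options (replacing NaNs or raising on write) can be sketched as a guard applied before metadata is written. This is an illustration only, not daf_persistence code: `check_finite`, its `on_nonfinite` and `fill` parameters, and the flat-dict representation are all assumptions made for the example.

```python
import math


def check_finite(metadata, on_nonfinite="raise", fill=0.0):
    """Return a copy of a flat dict with non-finite floats handled.

    on_nonfinite="raise" raises ValueError on the first NaN or infinity;
    on_nonfinite="replace" substitutes `fill` for each one.
    """
    if on_nonfinite not in ("raise", "replace"):
        raise ValueError(f"unknown policy: {on_nonfinite!r}")
    cleaned = {}
    for key, value in metadata.items():
        if isinstance(value, float) and not math.isfinite(value):
            if on_nonfinite == "raise":
                raise ValueError(f"non-finite value for {key!r}: {value!r}")
            value = fill
        cleaned[key] = value
    return cleaned


# Hypothetical input; "n" is an invented key.
replaced = check_finite(
    {"spatialFitChi2": float("nan"), "n": 3}, on_nonfinite="replace"
)
```

Replacing with zero silently changes the meaning of the value, which is one argument for raising instead.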

Kian-Tat Lim added a comment - - edited

I'm very sorry; I missed the prior conversation on this even though I'm supposed to be a watcher.

As the ultimate solution, I'd like to see if https://github.com/lsst/daf_persistence/commit/6ed767db014fbcec7478ff7d616de3b0fc691b14 can be finished. It looks like NaNs round-trip through YAML without difficulty. I don't think it's that far away, maybe needing an hour or two of work, although the earliest I can probably find that time would be tonight. Perhaps Nate Pease can look at it in the meantime? One downside is that all the obs_* package policies defining "*_metadata" dataset types will need to be changed, and old metadata would become unreadable with new versions of the stack. If we had already moved all of the "*_metadata" policies to daf_butlerUtils using Paul Price's new mechanism, then only that one place would need to be changed.
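The claim that NaNs survive a YAML round trip can be checked in isolation. The sketch below assumes the third-party PyYAML package and a plain dict; it does not reflect how the linked daf_persistence commit represents a PropertySet, and the `upperLimit` key is invented for the example.

```python
import math

import yaml  # PyYAML, a third-party package

# Hypothetical metadata containing both kinds of non-finite value.
metadata = {"spatialFitChi2": float("nan"), "upperLimit": float("inf")}

text = yaml.safe_dump(metadata)   # serialize to a YAML string
restored = yaml.safe_load(text)   # parse it back into Python objects
```

NaN never compares equal to itself, so the restored value has to be checked with `math.isnan(restored["spatialFitChi2"])` rather than with `==`.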

Colin Slater added a comment -

+1 on switching to YAML and centralizing the metadata policies. That would be a great solution.

Jim Bosch added a comment -

Linking DM-7049, which is moving what definitions we can from obs_* to daf_butlerUtils.

Note that it's not actually possible to move all of the *_metadata policies to daf_butlerUtils, since the templates for the single-frame processing metadata are different for each camera.

Michael Wood-Vasey added a comment - - edited

Can we keep a compatibility layer that falls back to the old-style metadata while emitting a deprecation warning? After, say, ~1 year this layer could be removed.
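Such a compatibility layer might look roughly like the following. This is a sketch, not Butler code: `read_with_fallback`, `read_new` and `read_old` are hypothetical names, and treating a missing new-style file as `FileNotFoundError` is an assumption.

```python
import warnings


def read_with_fallback(read_new, read_old, data_id):
    """Try the new-style reader first; fall back to the old one with a warning."""
    try:
        return read_new(data_id)
    except FileNotFoundError:
        warnings.warn(
            "Old-style metadata is deprecated; please regenerate it.",
            DeprecationWarning,
            stacklevel=2,
        )
        return read_old(data_id)


def read_new(data_id):
    # Hypothetical reader: simulate a repository with no new-style file.
    raise FileNotFoundError("no new-style metadata file")


def read_old(data_id):
    # Hypothetical reader: pretend the old-style file loads successfully.
    return {"source": "old", "visit": data_id["visit"]}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = read_with_fallback(read_new, read_old, {"visit": 903334, "ccd": 16})
```

Because the fallback goes through the standard `warnings` machinery, callers can silence it or escalate it to an error with the usual warning filters.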

This suggestion relates to the request for a deprecation implementation policy of https://jira.lsstcorp.org/browse/RFC-213.

Jim Bosch added a comment -

If we want backwards compatibility (I'm ambivalent about that myself), I think we'd want to:

• Define new datasetTypes for YAML metadata files.
• Switch new tasks to write the new YAML metadata files instead of the boost-serialized ones.
• Remove write support for boost-serialized metadata, but keep read support.

Jim Bosch added a comment -

This has been obsoleted by DM-15082.


#### People

Assignee: Unassigned
Reporter: John Swinbank
Watchers: Colin Slater, Jim Bosch, John Swinbank, Kian-Tat Lim, Michael Wood-Vasey, Paul Price, Vishal Kasliwal [X] (Inactive)