# QuantumGraph pickle files seem to carry environment info or try to import packages


## Details

• Type: Story
• Status: To Do
• Resolution: Unresolved
• Fix Version/s: None
• Component/s: None
• Labels:
• Team: Data Access and Database

## Description

The shared stack on lsst-dev has more packages than the standard Conda environment (e.g. dask). Somehow a Gen3 QuantumGraph pickle file seems to carry information that causes all of those packages to be imported. Taking a QuantumGraph pickle file made with the shared stack on lsst-dev and using it in another environment without those extra packages (e.g. on a "clean" computer with only the stack installed and nothing more) resulted in errors like the one below:

      File "/opt/lsst/software/stack/stack/miniconda3-4.5.12-1172c30/Linux64/pex_config/17.0.1-1-g703d48b+6/python/lsst/pex/config/config.py", line 990, in loadFromStream
        exec(stream, {}, local)
      File "<string>", line 37, in <module>
    ModuleNotFoundError: No module named 'PIL'
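The failure mode can be reproduced without the LSST stack. The sketch below is not pex_config code: `load_from_stream`, `saved_stream`, and the module name `module_absent_on_this_machine` are hypothetical stand-ins, and the only detail taken from the traceback is the `exec(stream, {}, local)` call. It assumes the persisted config is Python source that carries its own `import` lines.

```python
# Hypothetical stand-in for the Python source stored with a persisted config.
# The import line names a module that is assumed not to be installed here.
saved_stream = (
    "import module_absent_on_this_machine\n"
    "config['threshold'] = 5\n"
)


def load_from_stream(stream, config):
    """Simplified loader: execute the saved source against a config object.

    Only the exec call mirrors the traceback; the rest is illustrative.
    """
    local = {"config": config}
    exec(stream, {}, local)


error = None
try:
    load_from_stream(saved_stream, {})
except ModuleNotFoundError as exc:
    # The import line fails before any config value is applied.
    error = exc

print(type(error).__name__, error)
```

Because the import is executed eagerly, the load aborts even if nothing in the config would ever use the missing module.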

## Activity

Hsin-Fang Chiang added a comment -

From the Gen2 era, a persisted config file would carry all imports. That may be relevant here.

John Swinbank added a comment -

Since this is a middleware issue, setting team to DAX and hoping Fritz Mueller will ensure it gets handled.

Andy Salnikov added a comment - - edited

This is definitely a pex_config issue. Is pex_config a DAX responsibility now? I can look at it, but it will take time to learn all the intricacies (and I will likely break many things along the way). If there are pex_config experts out there, maybe we should ask them first?

Jim Bosch added a comment -

Nate Lust may be able to comment on this; this may be fallout from the fix for another bug that he worked on.

Part of me thinks we should just try to avoid all of this by moving away from pickle for QuantumGraph (and Pipeline) sooner rather than later (I think we need to do that eventually anyway).

Andy Salnikov added a comment -

I don't think the QuantumGraph pickle is the problem, but rather the persistence of pex_config as Python code (pickling is based on that). I'm OK with looking for a better replacement for pickle; I thought about that when I implemented pickling but decided that pickle was better than JSON or YAML, mainly because of pex_config.

Tim Jenness added a comment -

Pex_config pickle files are the output of config.saveToStream stored as a big string, so it's definitely possible to reproduce that in YAML, and there is a helper routine for converting the string back to the Config.

I'm not really sure how pex_config decides which Python imports it needs to include in the output stream, so I'm not sure why PIL was added in this case. I did a quick check of serializing a config and it only included this at the top:

    import __main__
    assert type(config)==__main__.Complex, 'config is of type %s.%s instead of __main__.Complex' % (type(config).__module__, type(config).__name__)

Maybe Kian-Tat Lim has some insight into what triggers the imports being included, and whether we could go through the saved stream one line at a time, evaluating each line until we hit the assert line. That way we could at least import everything that can be imported without breaking, and defer breakage until something tries to use a missing import.
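A rough sketch of that line-by-line idea follows. It is not an existing pex_config routine: `exec_skipping_missing_imports` and `DemoConfig` are hypothetical names, and the sketch assumes the saved stream opens with plain one-per-line `import x.y` statements followed by the `assert` line.

```python
import importlib


def exec_skipping_missing_imports(stream, namespace):
    """Execute a saved config stream, dropping header imports that fail.

    Hypothetical helper. Import lines before the first ``assert`` are tried
    one at a time; those that fail are left out, and the rest of the stream
    is executed unchanged. Returns the names of the modules that were missing.
    """
    missing = []
    kept = []
    in_header = True
    for line in stream.splitlines():
        stripped = line.strip()
        if in_header and stripped.startswith("assert"):
            in_header = False
        if in_header and stripped.startswith("import "):
            module_name = stripped[len("import "):].strip()
            try:
                importlib.import_module(module_name)
            except ImportError:
                # Defer breakage: skip the import instead of aborting the load.
                missing.append(module_name)
                continue
        kept.append(line)
    exec("\n".join(kept), {}, namespace)
    return missing


class DemoConfig:
    """Stand-in for a real Config subclass (illustration only)."""


demo = DemoConfig()
demo_stream = (
    "import json\n"
    "import module_absent_on_this_machine\n"
    "assert config is not None\n"
    "config.value = 3\n"
)
missing = exec_skipping_missing_imports(demo_stream, {"config": demo})
print("skipped imports:", missing)
```

With this approach, a later line that actually references a skipped module would fail with a `NameError` at that point, which is the deferred breakage described above.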


## People

• Assignee: Unassigned
• Reporter: Hsin-Fang Chiang
• Watchers: Andy Salnikov, Fritz Mueller, Hsin-Fang Chiang, Jim Bosch, John Swinbank, Tim Jenness