Steve Pietrowicz wrote:
I read through the CLO posting and I'm still not clear on what exactly is going to be implemented. Is this entirely in the CmdLineTask code?
I propose putting together a module that will be run by the CmdLineTask code. This module will determine the versions of the various components of the stack in multiple ways:
- For our python packages, using the version.py files we write as part of building (this gets set from git);
- For external python packages, using __version__; and
- For select C++ modules, using whatever mechanism they supply.
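A minimal sketch of the Python side of this collection step might look like the following. The names `get_package_version` and `collect_versions` are hypothetical, and the sketch assumes each package of interest exposes a `__version__` attribute on import; the C++ case is not covered, since it depends on whatever mechanism each library supplies.

```python
import importlib
import sys


def get_package_version(name):
    """Return the ``__version__`` of an importable package as a string.

    Returns None if the package cannot be imported or has no
    ``__version__`` attribute.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    version = getattr(module, "__version__", None)
    return None if version is None else str(version)


def collect_versions(package_names):
    """Build a dict mapping each package name to its version string.

    Packages whose version cannot be determined are recorded as
    "unknown" rather than silently dropped, so that a later comparison
    still notices them.
    """
    versions = {"python": "%d.%d.%d" % sys.version_info[:3]}
    for name in package_names:
        versions[name] = get_package_version(name) or "unknown"
    return versions
```

Recording "unknown" instead of skipping a package is a design choice in this sketch: it keeps the set of keys stable between runs, so a package that later gains or loses a version shows up as a difference.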
If the butler is aware of a persisted set of versions, we will retrieve that and compare, raising an exception if they differ (this behaviour may be disabled by the user); otherwise the set of versions will be persisted by the butler for future comparisons.
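The compare-or-persist logic could be sketched roughly as below. This is only an illustration: a JSON file stands in for the butler, whose persistence API is deliberately not used here, and `VersionMismatchError`, `find_differences`, `check_or_persist_versions` and the `raise_on_mismatch` flag are all hypothetical names.

```python
import json
import os


class VersionMismatchError(RuntimeError):
    """Raised when current versions differ from the persisted ones."""


def find_differences(current, persisted):
    """Return {name: (persisted_version, current_version)} for mismatches."""
    diffs = {}
    for name in sorted(set(current) | set(persisted)):
        old = persisted.get(name)
        new = current.get(name)
        if old != new:
            diffs[name] = (old, new)
    return diffs


def check_or_persist_versions(current, path, raise_on_mismatch=True):
    """Compare ``current`` with the versions stored at ``path``.

    If nothing has been stored yet, store ``current`` for future
    comparisons. Otherwise return the differences, raising
    VersionMismatchError unless the user has disabled the check.
    """
    if not os.path.exists(path):
        with open(path, "w") as f:
            json.dump(current, f, indent=2, sort_keys=True)
        return {}
    with open(path) as f:
        persisted = json.load(f)
    diffs = find_differences(current, persisted)
    if diffs and raise_on_mismatch:
        details = ", ".join(
            "%s: %s -> %s" % (name, old, new)
            for name, (old, new) in diffs.items()
        )
        raise VersionMismatchError("Version mismatch: " + details)
    return diffs
```

Calling `check_or_persist_versions(versions, path, raise_on_mismatch=False)` corresponds to the user disabling the exception; the differences are still returned so they can be logged.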
How do you envision this would be used by orchestration, if at all?
I hope that the proposed module will be useful when the time comes to pull the implementation into orchestration, either by providing code that can be utilised directly or by providing examples of what worked and what didn't.
Is this recorded before an execution occurs?
It would run after we start python and import a bunch of stuff, but before we call CmdLineTask.run.
(The astute reader will notice that this is therefore subject to a race condition, just as when we validate the configuration. There's not much we can do about that, but the workaround is the same as for the configuration validation — first run a CmdLineTask with no input data, e.g., here.)
I'm concerned about introspecting python modules for their versions because 1) unless versioning is automated, people will forget to bump the version number, 2) this doesn't take into account changes in underlying C++ code, 3) not all underlying libraries we depend on even have a Python component (apr), 4) this does not take into account information about the overall environment where the software is running (OS, machine, etc.).
I admit that we cannot capture the complete state of the entire system. Nevertheless, we can capture a lot more of the state of the system than we currently do, and I think we can capture enough information to be useful much of the time, at least useful enough to keep a user from polluting a production run with results from different versions of the stack. Once this proposal is implemented, we would have a module that captures at least some of the state of the system, and this could be expanded as it is deemed useful and effort becomes available.
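As one example of how the captured state might later be expanded to cover the operating system and machine, here is a small sketch using only the Python standard library. The function name `collect_environment` and the choice of fields are assumptions for illustration, not part of the proposal.

```python
import platform


def collect_environment():
    """Return a dict of strings describing the host and interpreter."""
    return {
        "system": platform.system(),      # e.g. the OS name
        "release": platform.release(),    # OS or kernel release
        "machine": platform.machine(),    # hardware architecture
        "python": platform.python_version(),
    }
```

A dict like this could be merged with the package versions before they are compared or persisted, although whether an OS or machine difference should raise an exception is a separate policy question.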
For orchestrated production runs, there is additional support software that isn't directly related to the CmdLineTask code but will still have provenance recorded.
I'd like to handle this in a uniform way across all software in the stack, and not have one solution for provenance for CmdLineTask and another for orchestration.
I would love to re-use existing code. Could you please point me to what you're using in orchestration?