Details
-
Type:
Story
-
Status: Done
-
Resolution: Done
-
Fix Version/s: None
-
Component/s: ctrl_pool
-
Labels:None
-
Story Points:2
-
Epic Link:
-
Sprint:DRP S17-1
-
Team:Data Release Production
Description
Remove pex_logging dependency in ctrl_pool.
Attachments
Issue Links
Activity
This page in the Developer Guide may be useful: https://developer.lsst.io/coding/logging.html
Paul Price
Can you comment on the changes required in python/lsst/ctrl/pool/log.py?
Since pex_logging is being removed, I will remove the current pickleLog method. Should pickleLsstLog become pickleLog? or remain as it is? I am not sure how this method is used in ctrl_pool.
Here's what I suggest:
- Put the docstring for pickleLog in pickleLsstLog (it's following the same approach, so need the same disclaimer).
- Remove pickleLog (don't need it if we're not pickling any lsst.pex.logging.Log objects — assuming it's been removed everywhere else).
- Rename pickleLsstLog as pickleLog (clearer).
- Remove the copyreg.pickle calls for the old logger and update the one for the new logger.
- If it's possible, configure the logger to put the log output in a file named after the job rather than the generic jobs.log.
Paul Price
Renaming of the log should be a default action (not requiring a change to the config?)
Can you tell me how to run the demo – that is, from which machine?
I don't know how to accomplish renaming the log file. I'm not sure whether the new log package has a simple python API for changing the filename. Perhaps Hsin-Fang Chiang can advise?
Instructions for the demo are in the file demo.txt.
Oh, I think I remember discussing the filename thing before, and we concluded that there didn't need to be a python API, and we were expected to redirect the output as appropriate.
I thought about clobbering sys.stdout, but I think the output isn't coming from python, so we would need to intercept it at an even higher level. That means putting something in BatchCmdLineTask.batchCommand, but I'm not sure how to do that in a good way when multiple processes are running that one command.
Maybe we should just remove the log file altogether, and rely on the batch system catching the stdout? Then we need to make sure that the host and PID are being prepended to the log messages.
Paul PriceI think I can see how to redirect the output. But I want to test it. demo.txt seems to be specific to a specific machine. And I have no idea what Slurm is. For starters, can I run the demo on any machine: say a machine which just has ci_hsc?
Use the section labelled "Run the demo on a single machine".
Actually --batch-type=smp is the default, so you can just leave it off. You'll need to specify --cores though.
Some previous discussions related to having a new log filename API are in DM-7997.
Without that, I think you may still be able to achieve what's wanted through the existing log configuring APIs.
Andy Salnikov may have more ideas.
For example, besides configure, Log also has configure_prop that takes a string of config properties.
As I mentioned, I can easily do what paul originally asked for – by using the log4cxx.properties file and replacing the File entry "jobs.log" with the name of the job. If more is required, please let me know.
See tickets/DM-8359. I have substituted a name = job + '.log' into the log4cxx.properties file in place of "jobs.log". I could also make separate logs for each machine and pid, but I am not sure what is wanted. See the commented out code in log.py
I can imagine that you might want more flexibility in log naming than this, but it would probably require a config file for lsst.log.Log on top of log4cxx.properties.
Ready for review, but the decision about how to name the log and whether to separate by machine and pid is yours.
Please create the pull request on GitHub.
Reassigning to Perry Gee to create GitHub PR.
log4cxx PropertyConfigurator supports environment (or system property in Java-speak) interpolation, so one way to make file name dynamic is to define some environment variable (if it's not defined already) and specify file name like this:
...
|
log4j.appender.F1.File=${JOBNAME}.${PID}.log
|
...
|
If envvar is not defined it will be replaced with empty string, and you can provide default for the properties in a config file itself:
# if JOBNAME envvar is not defined then value below will be used
|
JOBNAME=job
|
...
|
log4j.appender.F1.File=${JOBNAME}.${PID}.log
|
...
|
I think I prefer this over reading file in a script and making substitutions.
- As pointed out by Tim Jenness, need to remove pex_logging from the EUPS table file.
- I agree with Andy Salnikov that it's rather crude to read the file and modify it. Why can't you simply construct the 'file' in a python string and load that? The envvar approach suggested by Andy is unnecessary because you have the parameters you need are available in python.
In general, I think, we prefer not to use hard-coded logging configuration as it makes it harder to update in case you want to redirect logging to some other destination or change output format. configure_prop() method was actually meant only for self-contained applications like unit tests which sometimes require hard-coded output format or destination, for everything else logging should be configured from a file (or use default configuration).
OK, how about pulling the filename line out of the configuration file, and only constructing that one line in python?
Paul, it's not possible with current log API.
Andy Salnikov: Do you mean that we can't configure the log twice? Why can't we do:
def jobLog(job):
|
"""Add a job-specific log destination"""
|
if job is None or job == "None":
|
return
|
packageDir = getPackageDir("ctrl_pool")
|
lsstLog.configure(os.path.join(packageDir, "config/log4cxx.properties"))
|
lsstLog.configure_prop('log4j.appender.F1.File=%s.log\n' % (job,))
|
lsstLog.MDC("PID", os.getpid())
|
I tried adding a single entry with configure_prop, but it doesn't seem to work. I believe that configure_prop wants to get the entire configuration string. Which is why I chose to read log4cxx.properties. I could be wrong here though. This is not my area of expertise.
I do believe that what is being asked for is a flexible configuration layer for ctrl_pool, and that could be done, but it is really beyond the scope of this ticket.
I will wait to hear from the rest of you, but I do think that the change to jobname + '.log' is really not worth doing at this point if the hard-coded solution is not acceptable.
Well, I'm disappointed that the logger configuration is so inflexible, but we're not going to solve that here. In that case, I think the options are:
1. Use Andy's suggested method (I suggest using something unique, like LSST_LOGNAME); or
2. Form the file in code (Andy has argued against this); or
3. Munge the configuration file in a manner that will be a bit more robust (a common method is to put something like @JOB@ at the right place in the file and use str.replace).
I think the effort to do this is well worth it. Having a hard-coded log filename is asking for trouble.
Paul, you can do that but result is not what you expect in this case. log was really designed to be configured from a configuration file and not by application itself, the main reason for that is again the need to change configuration from outside. Applications are not even supposed to call configure() except in rare cases, for most application configuration file name should be specified via environment variable. Still we have configure() and configure_prop() methods that are useful in some cases but they are not supposed to be called more than once per application. To avoid troubles with multiple inconsistent configuration every call to configure() and configure_prop() resets configuration to initial state before loading it from file or string (so effectively last call determines configuration).
If you do not like idea with setting environment variable I can propose something different which I still do not like, but it's slightly better than pattern substitution. Define your configuration in file to look like:
log4j.rootLogger=INFO, A1, F1
|
log4j.appender.A1=ConsoleAppender
|
log4j.appender.A1.Target=System.err
|
log4j.appender.A1.layout=PatternLayout
|
log4j.appender.A1.layout.ConversionPattern=%X{PID} %-5p %c: %m%n
|
log4j.appender.F1=org.apache.log4j.FileAppender
|
log4j.appender.F1.File=${jobname}.${machine}.${PID}.log
|
log4j.appender.F1.layout=PatternLayout
|
log4j.appender.F1.layout.ConversionPattern=%X{PID} %-5p %d{yyyy-MM-ddThh:mm:ss.sss} %c (%X{LABEL})(%F:%L)- %m%n
|
And in your code prepend "property=replacement" strings to that file:
fin = open(os.path.join(packageDir, "config/log4cxx.properties"), 'r') |
prop = "jobname=" + job |
prop = prop + '\n' + "machine=" + machine |
prop = prop + '\n' + fin.read() |
lsstLog.configure_prop(prop)
|
And I still prefer doing that via environment variables because we do want to avoid messing with logging configuration from inside application code itself.
The reason I did things the way I did originally is because I wanted for the most part avoided needing to know the logic or sequencing of the log4cxx.properties file, at least as a part of a single client configuration. 3 has the same benefit with slightly more robustness, so I am OK with that. But it also means that we are creating a meta-language for the properties file. It is ugly to put such code in a single client for lsst.log, rather than in lsst.log itself – but I will do so if that is the consensus. However, we should rename the original file log4cxx.properties.template or some such thing.
I don't think that the other solutions will work without requiring an understanding of the properties format within ctrl_pool's log.py, and I think we should avoid this at all costs. A configuration API is really what is required (as we all knew all along), but I don't want to do that here.
Perry Gee, there is an "API" already and it's rather simple to use (but it's not a Python API)
# define location of config file
|
export LSST_LOG_CONFIG=/path/to/config.file |
# define substitutions for variables in config file
|
export JOBNAME=job_1234 |
export MACHINE=$HOSTNAME |
# run your application
|
./superDuperTask.py --id ... |
The reason why we prefer to do it this way is that (as you said yourself) application code should not need to know how logging is handled - where logging output goes to and how it is formatted. Another reason is that by the time your application decides to change logging configuration other parts of application may be already logging something to default-configured logger.
Paul Price
Paul: I think that the best solution (consistent with Andy's desires) is to use the environment variables in the properties file. They at least have the advantage that if they don't get set, the variables get set to "" and no particular harm is done.
I can either set JOBNAME variable from the "job" config parameter passed on the cmdline of ctrlPoolDemo.py, or require that the user set it themselves before starting. I'm not that sure how ctrlPool is used, so I'm not sure if making the setting voluntary on the part of the caller is a good thing or not.
If you're going to go that route, I think you should set LSST_JOBNAME from the job argument. Please don't make it the user's responsibility to set it.
Paul Price Are you done reviewing?
For whatever reason, there appear to be two logs set up by the old code – one per machine and pid using pex_logging and one using lsst.log.Log. I wanted to know if I should replicate this.
Yes, I marked the ticket as "Reviewed".
Don't worry about the one per machine+PID using pex_logging. I believe the new logger prefixes the hostname+PID into the log file, so they can be disentangled with grep.
Frossie Economou requests that we tackle this within the first couple of months of S17.