Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-8359

Port ctrl_pool to log package

    XMLWordPrintable

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: ctrl_pool
    • Labels:
      None

      Description

      Remove pex_logging dependency in ctrl_pool.

        Attachments

          Issue Links

            Activity

            No builds found.
            hchiang2 Hsin-Fang Chiang created issue -
            hchiang2 Hsin-Fang Chiang made changes -
            Field Original Value New Value
            Remote Link This issue links to "Page (Confluence)" [ 14403 ]
            hchiang2 Hsin-Fang Chiang made changes -
            Link This issue blocks DM-8365 [ DM-8365 ]
            Hide
            swinbank John Swinbank added a comment -

            Frossie Economou requests that we tackle this within the first couple of months of S17.

            Show
            swinbank John Swinbank added a comment - Frossie Economou requests that we tackle this within the first couple of months of S17.
            swinbank John Swinbank made changes -
            Epic Link DM-8136 [ 27591 ]
            swinbank John Swinbank made changes -
            Story Points 2
            Team Data Release Production [ 10301 ]
            frossie Frossie Economou made changes -
            Link This issue blocks DM-8477 [ DM-8477 ]
            Hide
            hchiang2 Hsin-Fang Chiang added a comment -

            This page in the Developer Guide may be useful: https://developer.lsst.io/coding/logging.html

            Show
            hchiang2 Hsin-Fang Chiang added a comment - This page in the Developer Guide may be useful: https://developer.lsst.io/coding/logging.html
            Hide
            pgee Perry Gee added a comment -

            Paul Price
            Can you comment on the changes required in python/lsst/ctrl/pool/log.py?

            Since pex_logging is being removed, I will remove the current pickleLog method. Should pickleLsstLog become pickleLog? or remain as it is? I am not sure how this method is used in ctrl_pool.

            Show
            pgee Perry Gee added a comment - Paul Price Can you comment on the changes required in python/lsst/ctrl/pool/log.py? Since pex_logging is being removed, I will remove the current pickleLog method. Should pickleLsstLog become pickleLog? or remain as it is? I am not sure how this method is used in ctrl_pool.
            swinbank John Swinbank made changes -
            Assignee Perry Gee [ pgee ]
            swinbank John Swinbank made changes -
            Sprint DRP S17-1 [ 303 ]
            Hide
            price Paul Price added a comment -

            Here's what I suggest:

            • Put the docstring for pickleLog in pickleLsstLog (it's following the same approach, so need the same disclaimer).
            • Remove pickleLog (don't need it if we're not pickling any lsst.pex.logging.Log objects — assuming it's been removed everywhere else).
            • Rename pickleLsstLog as pickleLog (clearer).
            • Remove the copyreg.pickle calls for the old logger and update the one for the new logger.
            • If it's possible, configure the logger to put the log output in a file named after the job rather than the generic jobs.log.
            Show
            price Paul Price added a comment - Here's what I suggest: Put the docstring for pickleLog in pickleLsstLog (it's following the same approach, so need the same disclaimer). Remove pickleLog (don't need it if we're not pickling any lsst.pex.logging.Log objects — assuming it's been removed everywhere else). Rename pickleLsstLog as pickleLog (clearer). Remove the copyreg.pickle calls for the old logger and update the one for the new logger. If it's possible, configure the logger to put the log output in a file named after the job rather than the generic jobs.log .
            pgee Perry Gee made changes -
            Status To Do [ 10001 ] In Progress [ 3 ]
            Hide
            pgee Perry Gee added a comment - - edited

            Paul Price
            Renaming of the log should be a default action (not requiring a change to the config?)

            Can you tell me how to run the demo – that is, from which machine?

            Show
            pgee Perry Gee added a comment - - edited Paul Price Renaming of the log should be a default action (not requiring a change to the config?) Can you tell me how to run the demo – that is, from which machine?
            Hide
            price Paul Price added a comment -

            I don't know how to accomplish renaming the log file. I'm not sure whether the new log package has a simple python API for changing the filename. Perhaps Hsin-Fang Chiang can advise?

            Instructions for the demo are in the file demo.txt.

            Show
            price Paul Price added a comment - I don't know how to accomplish renaming the log file. I'm not sure whether the new log package has a simple python API for changing the filename. Perhaps Hsin-Fang Chiang can advise? Instructions for the demo are in the file demo.txt .
            Hide
            price Paul Price added a comment -

            Oh, I think I remember discussing the filename thing before, and we concluded that there didn't need to be a python API, and we were expected to redirect the output as appropriate.

            I thought about clobbering sys.stdout, but I think the output isn't coming from python, so we would need to intercept it at an even higher level. That means putting something in BatchCmdLineTask.batchCommand, but I'm not sure how to do that in a good way when multiple processes are running that one command.

            Maybe we should just remove the log file altogether, and rely on the batch system catching the stdout? Then we need to make sure that the host and PID are being prepended to the log messages.

            Show
            price Paul Price added a comment - Oh, I think I remember discussing the filename thing before, and we concluded that there didn't need to be a python API, and we were expected to redirect the output as appropriate. I thought about clobbering sys.stdout , but I think the output isn't coming from python, so we would need to intercept it at an even higher level. That means putting something in BatchCmdLineTask.batchCommand , but I'm not sure how to do that in a good way when multiple processes are running that one command. Maybe we should just remove the log file altogether, and rely on the batch system catching the stdout? Then we need to make sure that the host and PID are being prepended to the log messages.
            Hide
            pgee Perry Gee added a comment -

            Paul PriceI think I can see how to redirect the output. But I want to test it. demo.txt seems to be specific to a specific machine. And I have no idea what Slurm is. For starters, can I run the demo on any machine: say a machine which just has ci_hsc?

            Show
            pgee Perry Gee added a comment - Paul Price I think I can see how to redirect the output. But I want to test it. demo.txt seems to be specific to a specific machine. And I have no idea what Slurm is. For starters, can I run the demo on any machine: say a machine which just has ci_hsc?
            Hide
            price Paul Price added a comment -

            Use the section labelled "Run the demo on a single machine".

            Show
            price Paul Price added a comment - Use the section labelled "Run the demo on a single machine".
            Hide
            price Paul Price added a comment -

            Actually --batch-type=smp is the default, so you can just leave it off. You'll need to specify --cores though.

            Show
            price Paul Price added a comment - Actually --batch-type=smp is the default, so you can just leave it off. You'll need to specify --cores though.
            Hide
            hchiang2 Hsin-Fang Chiang added a comment -

            Some previous discussions related to having a new log filename API are in DM-7997.

            Without that, I think you may still be able to achieve what's wanted through the existing log configuring APIs.

            Andy Salnikov may have more ideas.

            Show
            hchiang2 Hsin-Fang Chiang added a comment - Some previous discussions related to having a new log filename API are in DM-7997 . Without that, I think you may still be able to achieve what's wanted through the existing log configuring APIs. Andy Salnikov may have more ideas.
            hchiang2 Hsin-Fang Chiang made changes -
            Watchers Hsin-Fang Chiang, John Swinbank, Paul Price, Perry Gee [ Hsin-Fang Chiang, John Swinbank, Paul Price, Perry Gee ] Andy Salnikov, Hsin-Fang Chiang, John Swinbank, Paul Price, Perry Gee [ Andy Salnikov, Hsin-Fang Chiang, John Swinbank, Paul Price, Perry Gee ]
            Hide
            hchiang2 Hsin-Fang Chiang added a comment -

            For example, besides configure, Log also has configure_prop that takes a string of config properties.

            Show
            hchiang2 Hsin-Fang Chiang added a comment - For example, besides configure , Log also has configure_prop that takes a string of config properties.
            Hide
            pgee Perry Gee added a comment -

            As I mentioned, I can easily do what paul originally asked for – by using the log4cxx.properties file and replacing the File entry "jobs.log" with the name of the job. If more is required, please let me know.

            Show
            pgee Perry Gee added a comment - As I mentioned, I can easily do what paul originally asked for – by using the log4cxx.properties file and replacing the File entry "jobs.log" with the name of the job. If more is required, please let me know.
            Hide
            pgee Perry Gee added a comment -

            Paul Price

            See tickets/DM-8359. I have substituted a name = job + '.log' into the log4cxx.properties file in place of "jobs.log". I could also make separate logs for each machine and pid, but I am not sure what is wanted. See the commented out code in log.py

            I can imagine that you might want more flexibility in log naming than this, but it would probably require a config file for lsst.log.Log on top of log4cxx.properties.

            Show
            pgee Perry Gee added a comment - Paul Price See tickets/ DM-8359 . I have substituted a name = job + '.log' into the log4cxx.properties file in place of "jobs.log". I could also make separate logs for each machine and pid, but I am not sure what is wanted. See the commented out code in log.py I can imagine that you might want more flexibility in log naming than this, but it would probably require a config file for lsst.log.Log on top of log4cxx.properties.
            Hide
            pgee Perry Gee added a comment -

            Ready for review, but the decision about how to name the log and whether to separate by machine and pid is yours.

            Show
            pgee Perry Gee added a comment - Ready for review, but the decision about how to name the log and whether to separate by machine and pid is yours.
            pgee Perry Gee made changes -
            Reviewers Paul Price [ price ]
            Status In Progress [ 3 ] In Review [ 10004 ]
            Hide
            price Paul Price added a comment -

            Please create the pull request on GitHub.

            Show
            price Paul Price added a comment - Please create the pull request on GitHub.
            Hide
            price Paul Price added a comment -

            Reassigning to Perry Gee to create GitHub PR.

            Show
            price Paul Price added a comment - Reassigning to Perry Gee to create GitHub PR.
            price Paul Price made changes -
            Status In Review [ 10004 ] In Progress [ 3 ]
            Hide
            salnikov Andy Salnikov added a comment - - edited

            log4cxx PropertyConfigurator supports environment (or system property in Java-speak) interpolation, so one way to make file name dynamic is to define some environment variable (if it's not defined already) and specify file name like this:

            ...
            log4j.appender.F1.File=${JOBNAME}.${PID}.log
            ...
            

            If envvar is not defined it will be replaced with empty string, and you can provide default for the properties in a config file itself:

            # if JOBNAME envvar is not defined then value below will be used
            JOBNAME=job
            ...
            log4j.appender.F1.File=${JOBNAME}.${PID}.log
            ...
            

            I think I prefer this over reading file in a script and making substitutions.

            Show
            salnikov Andy Salnikov added a comment - - edited log4cxx PropertyConfigurator supports environment (or system property in Java-speak) interpolation, so one way to make file name dynamic is to define some environment variable (if it's not defined already) and specify file name like this: ... log4j.appender.F1.File=${JOBNAME}.${PID}.log ... If envvar is not defined it will be replaced with empty string, and you can provide default for the properties in a config file itself: # if JOBNAME envvar is not defined then value below will be used JOBNAME=job ... log4j.appender.F1.File=${JOBNAME}.${PID}.log ... I think I prefer this over reading file in a script and making substitutions.
            Hide
            price Paul Price added a comment -
            • As pointed out by Tim Jenness, need to remove pex_logging from the EUPS table file.
            • I agree with Andy Salnikov that it's rather crude to read the file and modify it. Why can't you simply construct the 'file' in a python string and load that? The envvar approach suggested by Andy is unnecessary because you have the parameters you need are available in python.
            Show
            price Paul Price added a comment - As pointed out by Tim Jenness , need to remove pex_logging from the EUPS table file. I agree with Andy Salnikov that it's rather crude to read the file and modify it. Why can't you simply construct the 'file' in a python string and load that? The envvar approach suggested by Andy is unnecessary because you have the parameters you need are available in python.
            price Paul Price made changes -
            Status In Progress [ 3 ] Reviewed [ 10101 ]
            Hide
            salnikov Andy Salnikov added a comment -

            In general, I think, we prefer not to use hard-coded logging configuration as it makes it harder to update in case you want to redirect logging to some other destination or change output format. configure_prop() method was actually meant only for self-contained applications like unit tests which sometimes require hard-coded output format or destination, for everything else logging should be configured from a file (or use default configuration).

            Show
            salnikov Andy Salnikov added a comment - In general, I think, we prefer not to use hard-coded logging configuration as it makes it harder to update in case you want to redirect logging to some other destination or change output format. configure_prop() method was actually meant only for self-contained applications like unit tests which sometimes require hard-coded output format or destination, for everything else logging should be configured from a file (or use default configuration).
            Hide
            price Paul Price added a comment -

            OK, how about pulling the filename line out of the configuration file, and only constructing that one line in python?

            Show
            price Paul Price added a comment - OK, how about pulling the filename line out of the configuration file, and only constructing that one line in python?
            Hide
            salnikov Andy Salnikov added a comment -

            Paul, it's not possible with current log API.

            Show
            salnikov Andy Salnikov added a comment - Paul, it's not possible with current log API.
            Hide
            price Paul Price added a comment -

            Andy Salnikov: Do you mean that we can't configure the log twice? Why can't we do:

            def jobLog(job):
                """Add a job-specific log destination"""
                if job is None or job == "None":
                    return
                packageDir = getPackageDir("ctrl_pool")
                lsstLog.configure(os.path.join(packageDir, "config/log4cxx.properties"))
                lsstLog.configure_prop('log4j.appender.F1.File=%s.log\n' % (job,))
                lsstLog.MDC("PID", os.getpid())
            

            Show
            price Paul Price added a comment - Andy Salnikov : Do you mean that we can't configure the log twice? Why can't we do: def jobLog(job): """Add a job-specific log destination""" if job is None or job == "None": return packageDir = getPackageDir("ctrl_pool") lsstLog.configure(os.path.join(packageDir, "config/log4cxx.properties")) lsstLog.configure_prop('log4j.appender.F1.File=%s.log\n' % (job,)) lsstLog.MDC("PID", os.getpid())
            Hide
            pgee Perry Gee added a comment -

            I tried adding a single entry with configure_prop, but it doesn't seem to work. I believe that configure_prop wants to get the entire configuration string. Which is why I chose to read log4cxx.properties. I could be wrong here though. This is not my area of expertise.

            I do believe that what is being asked for is a flexible configuration layer for ctrl_pool, and that could be done, but it is really beyond the scope of this ticket.

            I will wait to hear from the rest of you, but I do think that the change to jobname + '.log' is really not worth doing at this point if the hard-coded solution is not acceptable.

            Show
            pgee Perry Gee added a comment - I tried adding a single entry with configure_prop, but it doesn't seem to work. I believe that configure_prop wants to get the entire configuration string. Which is why I chose to read log4cxx.properties. I could be wrong here though. This is not my area of expertise. I do believe that what is being asked for is a flexible configuration layer for ctrl_pool, and that could be done, but it is really beyond the scope of this ticket. I will wait to hear from the rest of you, but I do think that the change to jobname + '.log' is really not worth doing at this point if the hard-coded solution is not acceptable.
            Hide
            price Paul Price added a comment - - edited

            Well, I'm disappointed that the logger configuration is so inflexible, but we're not going to solve that here. In that case, I think the options are:
            1. Use Andy's suggested method (I suggest using something unique, like LSST_LOGNAME); or
            2. Form the file in code (Andy has argued against this); or
            3. Munge the configuration file in a manner that will be a bit more robust (a common method is to put something like @JOB@ at the right place in the file and use str.replace).

            I think the effort to do this is well worth it. Having a hard-coded log filename is asking for trouble.

            Show
            price Paul Price added a comment - - edited Well, I'm disappointed that the logger configuration is so inflexible, but we're not going to solve that here. In that case, I think the options are: 1. Use Andy's suggested method (I suggest using something unique, like LSST_LOGNAME ); or 2. Form the file in code (Andy has argued against this); or 3. Munge the configuration file in a manner that will be a bit more robust (a common method is to put something like @JOB@ at the right place in the file and use str.replace ). I think the effort to do this is well worth it. Having a hard-coded log filename is asking for trouble.
            Hide
            salnikov Andy Salnikov added a comment - - edited

            Paul, you can do that but result is not what you expect in this case. log was really designed to be configured from a configuration file and not by application itself, the main reason for that is again the need to change configuration from outside. Applications are not even supposed to call configure() except in rare cases, for most application configuration file name should be specified via environment variable. Still we have configure() and configure_prop() methods that are useful in some cases but they are not supposed to be called more than once per application. To avoid troubles with multiple inconsistent configuration every call to configure() and configure_prop() resets configuration to initial state before loading it from file or string (so effectively last call determines configuration).

            If you do not like idea with setting environment variable I can propose something different which I still do not like, but it's slightly better than pattern substitution. Define your configuration in file to look like:

            log4j.rootLogger=INFO, A1, F1
            log4j.appender.A1=ConsoleAppender
            log4j.appender.A1.Target=System.err
            log4j.appender.A1.layout=PatternLayout
            log4j.appender.A1.layout.ConversionPattern=%X{PID} %-5p %c: %m%n
            log4j.appender.F1=org.apache.log4j.FileAppender
            log4j.appender.F1.File=${jobname}.${machine}.${PID}.log
            log4j.appender.F1.layout=PatternLayout
            log4j.appender.F1.layout.ConversionPattern=%X{PID} %-5p %d{yyyy-MM-ddThh:mm:ss.sss} %c (%X{LABEL})(%F:%L)- %m%n
            

            And in your code prepend "property=replacement" strings to that file:

                fin = open(os.path.join(packageDir, "config/log4cxx.properties"), 'r')
                prop = "jobname=" + job
                prop = prop + '\n' + "machine=" + machine
                prop = prop + '\n' + fin.read()
                lsstLog.configure_prop(prop)
            

            And I still prefer doing that via environment variables because we do want to avoid messing with logging configuration from inside application code itself.

            Show
            salnikov Andy Salnikov added a comment - - edited Paul, you can do that but result is not what you expect in this case. log was really designed to be configured from a configuration file and not by application itself, the main reason for that is again the need to change configuration from outside. Applications are not even supposed to call configure() except in rare cases, for most application configuration file name should be specified via environment variable. Still we have configure() and configure_prop() methods that are useful in some cases but they are not supposed to be called more than once per application. To avoid troubles with multiple inconsistent configuration every call to configure() and configure_prop() resets configuration to initial state before loading it from file or string (so effectively last call determines configuration). If you do not like idea with setting environment variable I can propose something different which I still do not like, but it's slightly better than pattern substitution. Define your configuration in file to look like: log4j.rootLogger=INFO, A1, F1 log4j.appender.A1=ConsoleAppender log4j.appender.A1.Target=System.err log4j.appender.A1.layout=PatternLayout log4j.appender.A1.layout.ConversionPattern=%X{PID} %-5p %c: %m%n log4j.appender.F1=org.apache.log4j.FileAppender log4j.appender.F1.File=${jobname}.${machine}.${PID}.log log4j.appender.F1.layout=PatternLayout log4j.appender.F1.layout.ConversionPattern=%X{PID} %-5p %d{yyyy-MM-ddThh:mm:ss.sss} %c (%X{LABEL})(%F:%L)- %m%n And in your code prepend "property=replacement" strings to that file: fin = open (os.path.join(packageDir, "config/log4cxx.properties" ), 'r' ) prop = "jobname=" + job prop = prop + '\n' + "machine=" + machine prop = prop + '\n' + fin.read() lsstLog.configure_prop(prop) And I still prefer doing that via environment variables because we do want to avoid messing with logging configuration from inside application code itself.
            Hide
            pgee Perry Gee added a comment -

            The reason I did things the way I did originally is because I wanted for the most part avoided needing to know the logic or sequencing of the log4cxx.properties file, at least as a part of a single client configuration. 3 has the same benefit with slightly more robustness, so I am OK with that. But it also means that we are creating a meta-language for the properties file. It is ugly to put such code in a single client for lsst.log, rather than in lsst.log itself – but I will do so if that is the consensus. However, we should rename the original file log4cxx.properties.template or some such thing.

            I don't think that the other solutions will work without requiring an understanding of the properties format within ctrl_pool's log.py, and I think we should avoid this at all costs. A configuration API is really what is required (as we all knew all along), but I don't want to do that here.

            Show
            pgee Perry Gee added a comment - The reason I did things the way I did originally is because I wanted for the most part avoided needing to know the logic or sequencing of the log4cxx.properties file, at least as a part of a single client configuration. 3 has the same benefit with slightly more robustness, so I am OK with that. But it also means that we are creating a meta-language for the properties file. It is ugly to put such code in a single client for lsst.log, rather than in lsst.log itself – but I will do so if that is the consensus. However, we should rename the original file log4cxx.properties.template or some such thing. I don't think that the other solutions will work without requiring an understanding of the properties format within ctrl_pool's log.py, and I think we should avoid this at all costs. A configuration API is really what is required (as we all knew all along), but I don't want to do that here.
            Hide
            salnikov Andy Salnikov added a comment -

            Perry Gee, there is an "API" already and it's rather simple to use (but it's not a Python API)

            # define location of config file
            export LSST_LOG_CONFIG=/path/to/config.file
            # define substitutions for variables in config file
            export JOBNAME=job_1234
            export MACHINE=$HOSTNAME
            # run your application
            ./superDuperTask.py --id ...
            

            The reason why we prefer to do it this way is that (as you said yourself) application code should not need to know how logging is handled - where logging output goes to and how it is formatted. Another reason is that by the time your application decides to change logging configuration other parts of application may be already logging something to default-configured logger.

            Show
            salnikov Andy Salnikov added a comment - Perry Gee , there is an "API" already and it's rather simple to use (but it's not a Python API) # define location of config file export LSST_LOG_CONFIG= /path/to/config . file # define substitutions for variables in config file export JOBNAME=job_1234 export MACHINE=$HOSTNAME # run your application . /superDuperTask .py -- id ... The reason why we prefer to do it this way is that (as you said yourself) application code should not need to know how logging is handled - where logging output goes to and how it is formatted. Another reason is that by the time your application decides to change logging configuration other parts of application may be already logging something to default-configured logger.
            Hide
            pgee Perry Gee added a comment -

            Paul Price
            Paul: I think that the best solution (consistent with Andy's desires) is to use the environment variables in the properties file. They at least have the advantage that if they don't get set, the variables get set to "" and no particular harm is done.

            I can either set JOBNAME variable from the "job" config parameter passed on the cmdline of ctrlPoolDemo.py, or require that the user set it themselves before starting. I'm not that sure how ctrlPool is used, so I'm not sure if making the setting voluntary on the part of the caller is a good thing or not.

            Show
            pgee Perry Gee added a comment - Paul Price Paul: I think that the best solution (consistent with Andy's desires) is to use the environment variables in the properties file. They at least have the advantage that if they don't get set, the variables get set to "" and no particular harm is done. I can either set JOBNAME variable from the "job" config parameter passed on the cmdline of ctrlPoolDemo.py, or require that the user set it themselves before starting. I'm not that sure how ctrlPool is used, so I'm not sure if making the setting voluntary on the part of the caller is a good thing or not.
            Hide
            price Paul Price added a comment -

            If you're going to go that route, I think you should set LSST_JOBNAME from the job argument. Please don't make it the user's responsibility to set it.

            Show
            price Paul Price added a comment - If you're going to go that route, I think you should set LSST_JOBNAME from the job argument. Please don't make it the user's responsibility to set it.
            Hide
            pgee Perry Gee added a comment -

            Paul Price Are you done reviewing?

            For whatever reason, there appear to be two logs set up by the old code – one per machine and pid using pex_logging and one using lsst.log.Log. I wanted to know if I should replicate this.

            Show
            pgee Perry Gee added a comment - Paul Price Are you done reviewing? For whatever reason, there appear to be two logs set up by the old code – one per machine and pid using pex_logging and one using lsst.log.Log. I wanted to know if I should replicate this.
            Hide
            price Paul Price added a comment -

            Yes, I marked the ticket as "Reviewed".

            Don't worry about the one per machine+PID using pex_logging. I believe the new logger prefixes the hostname+PID into the log file, so they can be disentangled with grep.

            Show
            price Paul Price added a comment - Yes, I marked the ticket as "Reviewed". Don't worry about the one per machine+PID using pex_logging. I believe the new logger prefixes the hostname+PID into the log file, so they can be disentangled with grep .
            pgee Perry Gee made changes -
            Resolution Done [ 10000 ]
            Status Reviewed [ 10101 ] Done [ 10002 ]

              People

              Assignee:
              pgee Perry Gee
              Reporter:
              hchiang2 Hsin-Fang Chiang
              Reviewers:
              Paul Price
              Watchers:
              Andy Salnikov, Hsin-Fang Chiang, John Swinbank, Kian-Tat Lim, Paul Price, Perry Gee
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.