  Data Management / DM-17450

Cannot distinguish relevant information for Cosmic Ray detections from logs

    Details

    • Type: Story
    • Status: To Do
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: pipe_tasks
    • Labels:
      None
    • Templates:
    • Team:
      Data Release Production

      Description

      There has been recent discussion about (potential issues with) cosmic ray (CR) detection rates per bandpass in the DESC/DC2 processing.  An attempt was made to make a histogram of the number of CRs detected per CCD for each filter.  This figure indicated a strong bias: bluer bands had significantly lower detection rates.  This is not expected on any physical grounds, so the cause (code vs. sims) needs to be tracked down.  However, the only way to get the CR detection rates is by parsing the logs, which cannot currently be done accurately.  The following snippet from the #desc-dm-dc2 discussion on Jan 24, 2019 (https://lsstc.slack.com/archives/C978LTJGN/p1548365899270600) explains the situation:

      "...there are 3 statements printed to the logs giving a CR detection count (for the various iterations/stages of refinement in the PSF and background measurements). For example, running

      processCcd.py /datasets/DC2/repo --rerun private/lauren/testing2 --id visit=738955 raftName=R01 detectorName=S11 -C config.py
      

      where I have set config.charImage.measurePsf.starSelector["objectSize"].fluxMin=2000.0 and allowed the CModel apCorrs to fail so that processCcd.py makes it all the way through, there are the following 3 CR log lines:

      processCcd.charImage.repair INFO: Identified 256 cosmic rays.
      processCcd.charImage.repair INFO: Identified 335 cosmic rays.
      processCcd.charImage.repair INFO: Identified 334 cosmic rays.
      

      The third line is the actual final number of CRs detected, so it is the only one that should factor into your histogram. ADDITIONALLY, if I don't override the `fluxMin` config and the processing fails due to too few PSF candidates, the first line is still printed to the logs. Therefore, if you just parse the logs for lines with "cosmic rays" (or similar), for each CCD that makes it past the initial characterization phase (up to the apCorr phase) you will get three entries (only one of which is truly valid), and for any failed CCD you get only the first line. The above is just one case, but it seems reasonable that the first line will always be the smallest of the three, so if you are including CCDs that failed at the PSF measurement phase in your histograms, any band that has more failures will be biased low.

      From the logs I have (I’m not sure how different they would look for different workflows), I can’t distinguish these three lines, nor can I even distinguish which detector a given line applies to (the “unique” id for the process is at the visit level and these lines do not include the dataId)."
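To illustrate the problem, here is a minimal sketch of the naive approach: a regex scan (Python standard library `re`) over the three log lines quoted above. The function name `naive_cr_counts` is hypothetical; only the message text is taken from the quoted logs. Because nothing in a matched line names the detector or the processing phase, the parser can only return a flat list of counts.

```python
import re

# Matches the repair-task message format quoted in this issue.
CR_PATTERN = re.compile(
    r"processCcd\.charImage\.repair INFO: Identified (\d+) cosmic rays\."
)


def naive_cr_counts(log_lines):
    """Return every CR count found in the log, in order of appearance.

    Hypothetical helper: nothing in the matched line identifies the
    detector or the processing phase, so the counts cannot be grouped
    per CCD or filtered down to the final pass.
    """
    counts = []
    for line in log_lines:
        match = CR_PATTERN.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


# The three lines quoted above for visit=738955, raftName=R01, detectorName=S11.
log = [
    "processCcd.charImage.repair INFO: Identified 256 cosmic rays.",
    "processCcd.charImage.repair INFO: Identified 335 cosmic rays.",
    "processCcd.charImage.repair INFO: Identified 334 cosmic rays.",
]

counts = naive_cr_counts(log)
print(counts)  # [256, 335, 334]: three entries for one CCD; only the last is final
```

If a CCD that failed PSF measurement were processed in the same job, its single first-pass line would simply be appended to this list, with no way to tell it apart from the first-pass line of a CCD that succeeded.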

      To parse CR detection rates from the logs, we must be able to identify both the processing phase AND the detector for every log entry.
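As a sketch of what this requirement would enable, assume a HYPOTHETICAL log format in which each repair message carries a `phase=` tag and the `dataId`. Neither field exists in the current pipeline output, and the phase names (`first`, `second`, `final`) and the failed-CCD line below are invented for illustration only. With such tags, a parser can keep only the final-pass count per detector and separately report detectors that never reached the final pass.

```python
import re

# HYPOTHETICAL log format: the phase and dataId fields do not exist in the
# current pipeline output; they illustrate what this issue asks for.
TAGGED_PATTERN = re.compile(
    r"repair INFO: \[phase=(?P<phase>\w+) dataId=(?P<data_id>[^\]]+)\] "
    r"Identified (?P<count>\d+) cosmic rays\."
)

FINAL_PHASE = "final"  # hypothetical name for the last repair pass


def final_cr_counts(log_lines):
    """Return (final, incomplete) for the hypothetical tagged format.

    final:      dict mapping dataId -> CR count from the final pass only.
    incomplete: set of dataIds that logged a CR count but never reached
                the final pass (e.g. PSF measurement failures).
    """
    seen = set()
    final = {}
    for line in log_lines:
        match = TAGGED_PATTERN.search(line)
        if not match:
            continue
        data_id = match.group("data_id")
        seen.add(data_id)
        if match.group("phase") == FINAL_PHASE:
            final[data_id] = int(match.group("count"))
    incomplete = seen - set(final)
    return final, incomplete


s11 = "visit=738955 raftName=R01 detectorName=S11"
s12 = "visit=738955 raftName=R01 detectorName=S12"  # hypothetical failed CCD
log = [
    f"repair INFO: [phase=first dataId={s11}] Identified 256 cosmic rays.",
    f"repair INFO: [phase=first dataId={s12}] Identified 200 cosmic rays.",
    f"repair INFO: [phase=second dataId={s11}] Identified 335 cosmic rays.",
    f"repair INFO: [phase=final dataId={s11}] Identified 334 cosmic rays.",
]

final, incomplete = final_cr_counts(log)
print(final)       # {'visit=738955 raftName=R01 detectorName=S11': 334}
print(incomplete)  # {'visit=738955 raftName=R01 detectorName=S12'}
```

Reporting the incomplete detectors separately, rather than silently dropping them, makes it possible to check whether per-band failure rates, rather than true CR rates, drive the histogram bias.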

              People

              • Assignee:
                Unassigned
                Reporter:
                Lauren MacArthur
                Watchers:
                James Chiang, Jim Bosch, John Swinbank, Lauren MacArthur, Robert Lupton, Yusra AlSayyad
              • Votes:
                1
                Watcher count:
                6
