  1. Data Management
  2. DM-11321

Support job metadata in verify_ap

    Details

    • Story Points:
      6
    • Epic Link:
    • Sprint:
      Alert Production F17 - 8, Alert Production F17 - 9
    • Team:
      Alert Production

      Description

      When we begin analyzing metrics for verify_ap, those metrics will need to have metadata on the job being run. This could include the dataset/camera/filter IDs (see SQR-019), provenance and pipeline version (see Community post), and information about the server running the pipeline (requested on #dm-squash to control for environmental effects on performance metrics).

      This ticket is to decide on a coherent strategy for delivering metadata to verify_ap's verify.Job object, and then to add the necessary infrastructure to verify.ap.metrics. We may need multiple sources of information; for example, reporting camera IDs could naturally be a feature of the verify.ap.Dataset class, and ultimately be read from config files, while server information should be foolproof, to keep runs by arbitrary users from masquerading as official trial runs.

            Activity

            krzys Krzysztof Findeisen added a comment -

            Job metadata should also give some indication of swappable or otherwise non-standard pipeline components (suggested by Ian Sullivan) used in a run. Perhaps this can be mined from the Task metadata?

            afausti Angelo Fausti added a comment - - edited

            I think we can give it a try and specify some metadata tags for ap_verify.

            Some metadata can be added by the pipeline code itself. For the ap_verify run in CI, I currently see:

            {'pipe_tasks.CalibrateTime.estimator': 'pipe.base.timeMethod',
             'pipe_tasks.ImageDifferenceTime.estimator': 'pipe.base.timeMethod',
             'ap_association.AssociationTime.estimator': 'pipe.base.timeMethod',
             'pipe_tasks.CharacterizeImageTime.estimator': 'pipe.base.timeMethod',
             'pipe_tasks.ProcessCcdTime.estimator': 'pipe.base.timeMethod',
             'ip_diffim.DipoleFitTime.estimator': 'pipe.base.timeMethod',
             'ip_isr.IsrTime.estimator': 'pipe.base.timeMethod',
             'meas_algorithms.SourceDetectionTime.estimator': 'pipe.base.timeMethod'}

            So you are already familiar with the mechanism used to add those.

            There is also metadata that is captured from the execution context, and thus available at run time only. We call those "environment metadata". Since ap_verify is running in CI, we are capturing these environment metadata right after the execution is done, and before sending the verification job to SQuaSH.

            For example:

            {'env_name': 'jenkins',
             'ci_id': 4,
             'ci_name': 'ap_verify',
             'ci_dataset': 'CI-HiTS2015',
             'ci_url': 'https://ci.lsst.codes/job/scipipe/job/ap_verify/4/',
             'date': '08/10/2018 05:30:35',
             'packages': []}

            Following discussions with Krzysztof Findeisen on #ap-prototype-pipeline, I am adding visit and ccdnum to those, which allow us to store the metric values computed per CCD.
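
As a rough illustration of the "environment metadata" idea above, the following sketch builds a similar dictionary from the execution context. It is not the code SQuaSH or the CI job actually runs: the function name collect_env_metadata and its dataset argument are hypothetical, and it assumes the usual Jenkins variables (JENKINS_URL, BUILD_ID, JOB_NAME, BUILD_URL) are set. It also uses an ISO 8601 timestamp rather than the date format shown in the example.

```python
import os
from datetime import datetime, timezone


def collect_env_metadata(environ=None, dataset=None):
    """Build environment metadata from Jenkins-style variables.

    Hypothetical helper: keys mirror the example in this comment, and
    values are read from the environment mapping (os.environ by default).
    """
    environ = os.environ if environ is None else environ
    return {
        # Assumption: the presence of JENKINS_URL identifies a Jenkins run.
        "env_name": "jenkins" if "JENKINS_URL" in environ else "unknown",
        "ci_id": environ.get("BUILD_ID"),
        "ci_name": environ.get("JOB_NAME"),
        "ci_dataset": dataset,
        "ci_url": environ.get("BUILD_URL"),
        # Captured after the run finishes, just before upload.
        "date": datetime.now(timezone.utc).isoformat(),
        # Package versions would be filled in elsewhere; left empty here.
        "packages": [],
    }
```

Passing an explicit mapping instead of relying on os.environ keeps the helper testable outside CI; missing variables simply yield None.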

            krzys Krzysztof Findeisen added a comment - - edited

            For the record, I'm still uncomfortable with visit and ccdnum/ccd being required as properties of a Job. Requiring that all metrics be calculated at such a fine-grained level will break pretty much the entire ap_verify system as currently planned. In particular, most metrics that characterize source association make no sense in the context of a specific visit.

            Perhaps, if you want this information, it could be included as measurement metadata?

            ebellm Eric Bellm added a comment -

            Rather than adding "visit" and "ccdnum", I suggest you simply add a single "dataId" entry. The content can then vary as appropriate for the metric value in question ("visit/ccdnum", "tract/patch", etc.), and it tracks the QA WG recommendation that "Metric values should have Butler dataIds."

            afausti Angelo Fausti added a comment -

            Good point, Eric Bellm. I will add a single dataId tag, and yes, I agree with Krzysztof Findeisen that this should be measurement metadata.
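
To make the job-level versus measurement-level distinction concrete, here is a minimal sketch in plain Python. The classes JobRecord and MeasurementRecord are hypothetical stand-ins, not the lsst.verify API, and the numeric values are placeholders. The point is only that dataId travels with an individual metric value, so a metric with no meaningful visit (such as one characterizing association) can omit it without affecting the job.

```python
from dataclasses import dataclass, field


@dataclass
class MeasurementRecord:
    """Hypothetical stand-in for one metric value plus its own metadata."""
    metric: str
    value: float
    metadata: dict = field(default_factory=dict)


@dataclass
class JobRecord:
    """Hypothetical stand-in for a job: run-wide metadata and measurements."""
    metadata: dict = field(default_factory=dict)
    measurements: list = field(default_factory=list)

    def add(self, metric, value, data_id=None):
        # dataId is attached to the measurement, never to the job itself.
        meta = {} if data_id is None else {"dataId": dict(data_id)}
        self.measurements.append(MeasurementRecord(metric, value, meta))


# Run-wide information stays at the job level.
job = JobRecord(metadata={"ci_dataset": "CI-HiTS2015"})
# A per-CCD metric carries a visit/ccdnum dataId (placeholder values).
job.add("pipe_tasks.ProcessCcdTime", 12.5, data_id={"visit": 1, "ccdnum": 2})
# A metric that is not tied to a single visit carries no dataId.
job.add("ap_association.AssociationTime", 3.0)
```

Because the dataId is an arbitrary mapping, the same field could hold tract/patch keys for metrics computed at that granularity.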

            krzys Krzysztof Findeisen added a comment -

            DM-16333 is blocked by missing metadata, namely filter_name and instrument. It's not yet clear whether these are job- or measurement-level, nor if there are any requirements on the values beyond internal consistency; Joshua Hoblitt said the lack of documentation is being discussed by SQuaRE.

            krughoff Simon Krughoff added a comment -

            FYI, I'm going to fix this next week if I can find the time.  Follow along on DM-16333.

            krzys Krzysztof Findeisen added a comment -

            I think this ticket is obsolete, given DM-16333 and DM-16016. If we want more metadata than is currently provided, we should open tickets for those specific changes.


              People

              Assignee:
              krzys Krzysztof Findeisen
              Reporter:
              krzys Krzysztof Findeisen
              Watchers:
              Angelo Fausti, Eric Bellm, Joshua Hoblitt, Krzysztof Findeisen, Meredith Rawls, Simon Krughoff
              Votes:
              1 Vote for this issue
