  1. Data Management
  2. DM-11321

Support job metadata in verify_ap


Details

    • 6
    • Alert Production F17 - 8, Alert Production F17 - 9
    • Alert Production

    Description

      When we begin analyzing metrics for verify_ap, those metrics will need to have metadata on the job being run. This could include the dataset/camera/filter IDs (see SQR-019), provenance and pipeline version (see Community post), and information about the server running the pipeline (requested on #dm-squash to control for environmental effects on performance metrics).

      This ticket is to decide on a coherent strategy for delivering metadata to verify_ap's verify.Job object, and then to add the necessary infrastructure to verify.ap.metrics. We may need multiple sources of information; for example, reporting camera IDs could naturally be a feature of the verify.ap.Dataset class, and ultimately be read from config files, while server information should be foolproof to keep arbitrary users from passing their runs off as official trial runs.
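The multiple-source idea above could be pictured as a layered lookup in which machine-reported values take precedence over anything supplied through configuration. This is only a minimal sketch, assuming plain dictionaries stand in for job metadata; `dataset_metadata`, `server_metadata`, and `build_job_metadata` are hypothetical names that do not exist in verify_ap, and the camera and filter values are illustrative. It shows precedence only and is not a security mechanism.

```python
import platform
from collections import ChainMap


def dataset_metadata(config):
    """Hypothetical: metadata a dataset class might read from its config file."""
    return dict(config)


def server_metadata():
    """Describe the machine running the pipeline, using only the standard library."""
    return {
        "hostname": platform.node(),
        "machine": platform.machine(),
        "system": platform.system(),
        "python_version": platform.python_version(),
    }


def build_job_metadata(config):
    """Merge both sources; server values are listed first so config cannot override them."""
    return dict(ChainMap(server_metadata(), dataset_metadata(config)))


# Illustrative config; the "hostname" entry imitates a user trying to override server info.
meta = build_job_metadata(
    {"dataset": "CI-HiTS2015", "camera": "ExampleCam", "filter": "g", "hostname": "spoofed"}
)
```

Because `ChainMap` resolves each key from the first mapping that contains it, the `hostname` reported by `platform.node()` wins over the value in the config, while dataset-only keys pass through unchanged.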

      Attachments

        Issue Links

          Activity

            krzys Krzysztof Findeisen created issue -
            krzys Krzysztof Findeisen made changes -
            Field Original Value New Value
            Epic Link DM-9676 [ 30778 ]
            krzys Krzysztof Findeisen made changes -
            Link This issue relates to DM-11118 [ DM-11118 ]
            krzys Krzysztof Findeisen made changes -
            Sprint Alert Production F17 - 8 [ 632 ]
            krzys Krzysztof Findeisen made changes -
            Assignee Krzysztof Findeisen [ krzys ]

            krzys Krzysztof Findeisen added a comment -

            Job metadata should also give some indication of swappable or otherwise non-standard pipeline components (suggested by sullivan) used in a run. Perhaps this can be mined from the Task metadata?
            swinbank John Swinbank made changes -
            Sprint Alert Production F17 - 8 [ 632 ] Alert Production F17 - 8, Alert Production F17 - 9 [ 632, 639 ]
            swinbank John Swinbank made changes -
            Rank Ranked higher
            krzys Krzysztof Findeisen made changes -
            Sprint Alert Production F17 - 8, Alert Production F17 - 9 [ 632, 639 ] Alert Production F17 - 8 [ 632 ]
            krzys Krzysztof Findeisen made changes -
            Rank Ranked lower
            krzys Krzysztof Findeisen made changes -
            Sprint Alert Production F17 - 8 [ 632 ] Alert Production F17 - 8, Alert Production F17 - 9 [ 632, 639 ]
            krzys Krzysztof Findeisen made changes -
            Rank Ranked higher
            krzys Krzysztof Findeisen made changes -
            Rank Ranked lower
            swinbank John Swinbank made changes -
            Sprint Alert Production F17 - 8, Alert Production F17 - 9 [ 632, 639 ] Alert Production F17 - 8, Alert Production F17 - 9, Alert Production F17 - 10 [ 632, 639, 643 ]
            swinbank John Swinbank made changes -
            Rank Ranked higher
            krzys Krzysztof Findeisen made changes -
            Sprint Alert Production F17 - 8, Alert Production F17 - 9, Alert Production F17 - 10 [ 632, 639, 643 ] Alert Production F17 - 8, Alert Production F17 - 9, Alert Production F17 - 11 [ 632, 639, 644 ]
            krzys Krzysztof Findeisen made changes -
            Rank Ranked lower
            krzys Krzysztof Findeisen made changes -
            Sprint Alert Production F17 - 8, Alert Production F17 - 9, Alert Production F17 - 11 [ 632, 639, 644 ] Alert Production F17 - 8, Alert Production F17 - 9 [ 632, 639 ]
            swinbank John Swinbank made changes -
            Epic Link DM-9676 [ 30778 ] DM-12711 [ 36308 ]
            swinbank John Swinbank made changes -
            Rank Ranked lower
            krzys Krzysztof Findeisen made changes -
            Rank Ranked higher
            krzys Krzysztof Findeisen made changes -
            Rank Ranked lower
            krzys Krzysztof Findeisen made changes -
            Risk Score 0
            swinbank John Swinbank made changes -
            Epic Link DM-12711 [ 36308 ] DM-14431 [ 80156 ]
            afausti Angelo Fausti added a comment - - edited

            I think we can give it a try and specify some metadata tags for ap_verify.

            Some metadata can be added by the pipeline code itself. For ap_verify running in CI, I currently see:

            {'pipe_tasks.CalibrateTime.estimator': 'pipe.base.timeMethod',
             'pipe_tasks.ImageDifferenceTime.estimator': 'pipe.base.timeMethod',
             'ap_association.AssociationTime.estimator': 'pipe.base.timeMethod',
             'pipe_tasks.CharacterizeImageTime.estimator': 'pipe.base.timeMethod',
             'pipe_tasks.ProcessCcdTime.estimator': 'pipe.base.timeMethod',
             'ip_diffim.DipoleFitTime.estimator': 'pipe.base.timeMethod',
             'ip_isr.IsrTime.estimator': 'pipe.base.timeMethod',
             'meas_algorithms.SourceDetectionTime.estimator': 'pipe.base.timeMethod'}
            

            So you are already familiar with the mechanism used to add those.

            There is also metadata that is captured from the execution context, and thus available at run time only. We call those "environment metadata". Since ap_verify is running in CI, we are capturing these environment metadata right after the execution is done, and before sending the verification job to SQuaSH.

            For example:

            {'env_name': 'jenkins', 'ci_id': 4, 'ci_name': 'ap_verify', 'ci_dataset': 'CI-HiTS2015', 'ci_url': 'https://ci.lsst.codes/job/scipipe/job/ap_verify/4/', 'date': '08/10/2018 05:30:35', 'packages': []}


            Following discussions with krzys in #ap-prototype-pipeline, I am adding visit and ccdnum to those, which allows us to store the metric values computed per CCD.
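For illustration, environment metadata of the shape shown above could be assembled from the variables Jenkins exports to a build (`BUILD_ID`, `JOB_NAME`, `BUILD_URL`). This is a rough sketch, not the code used to submit jobs to SQuaSH: the function `capture_ci_environment` and the mapping from Jenkins variables to the `ci_*` keys are assumptions, and the timestamp format differs from the example above.

```python
import os
from datetime import datetime, timezone


def capture_ci_environment(environ=None, dataset=None):
    """Hypothetical helper: build an environment-metadata dict after a CI run."""
    environ = os.environ if environ is None else environ
    return {
        "env_name": "jenkins",
        # Jenkins exports these as strings; they are None outside a Jenkins build.
        "ci_id": environ.get("BUILD_ID"),
        "ci_name": environ.get("JOB_NAME"),
        "ci_dataset": dataset,
        "ci_url": environ.get("BUILD_URL"),
        "date": datetime.now(timezone.utc).isoformat(),
        "packages": [],
    }


# A fake environment is passed in so the sketch also runs outside Jenkins.
example = capture_ci_environment(
    {
        "BUILD_ID": "4",
        "JOB_NAME": "ap_verify",
        "BUILD_URL": "https://ci.lsst.codes/job/scipipe/job/ap_verify/4/",
    },
    dataset="CI-HiTS2015",
)
```

Passing the environment in as an argument, rather than reading `os.environ` directly, keeps the capture step testable without a CI server.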

            afausti Angelo Fausti made changes -
            Watchers Angelo Fausti, Eric Bellm, Krzysztof Findeisen, Meredith Rawls [ Angelo Fausti, Eric Bellm, Krzysztof Findeisen, Meredith Rawls ] Angelo Fausti, Eric Bellm, Krzysztof Findeisen, Meredith Rawls, Simon Krughoff [ Angelo Fausti, Eric Bellm, Krzysztof Findeisen, Meredith Rawls, Simon Krughoff ]
            krzys Krzysztof Findeisen added a comment - - edited

            For the record, I'm still uncomfortable with visit and ccdnum/ccd being required as properties of a Job. Requiring that all metrics be calculated at such a fine-grained level will break pretty much the entire ap_verify system as currently planned. In particular, most metrics that characterize source association make no sense in the context of a specific visit.

            Perhaps, if you want this information, it could be included as measurement metadata?

            ebellm Eric Bellm added a comment -

            Rather than adding "visit" and "ccdnum", I suggest you simply add a single "dataId" entry. The content can then vary as appropriate for the metric value in question ("visit/ccdnum", "tract/patch", etc.), and it tracks the QA WG recommendation that "Metric values should have Butler dataIds."


            afausti Angelo Fausti added a comment -

            Good point, ebellm. I will add a single dataId tag, and yes, I agree with krzys that this should be measurement metadata.
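The arrangement agreed on in this exchange, a single dataId entry carried as measurement-level rather than job-level metadata, might look roughly like the following. This is a sketch under stated assumptions: `MeasurementRecord` is a hypothetical stand-in and not the lsst.verify measurement class, `example.CoaddMetric` is an invented metric name, and all values and dataId contents are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MeasurementRecord:
    """Hypothetical stand-in for a metric measurement carrying its own metadata."""
    metric: str
    value: float
    metadata: dict = field(default_factory=dict)


# Job-level metadata stays free of visit/ccdnum.
job_metadata = {"ci_dataset": "CI-HiTS2015"}

measurements = [
    # Per-CCD metric: the dataId names a visit and a CCD.
    MeasurementRecord("ip_isr.IsrTime", 1.5, {"dataId": {"visit": 123, "ccdnum": 5}}),
    # Coadd-style metric: the dataId uses different keys.
    MeasurementRecord("example.CoaddMetric", 0.2, {"dataId": {"tract": 0, "patch": "1,1"}}),
    # Association metric spanning visits: no dataId here, as one possible treatment.
    MeasurementRecord("ap_association.AssociationTime", 3.0),
]


def data_id_kind(record):
    """Return the dataId keys joined by '/', or None when no dataId is attached."""
    data_id = record.metadata.get("dataId", {})
    return "/".join(data_id) if data_id else None


kinds = [data_id_kind(m) for m in measurements]
```

Keeping the dataId on each measurement lets metrics at different granularities coexist in one job, which addresses the concern that association metrics have no meaningful single visit.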
            krzys Krzysztof Findeisen made changes -
            Link This issue blocks DM-16333 [ DM-16333 ]

            krzys Krzysztof Findeisen added a comment -

            DM-16333 is blocked by missing metadata, namely filter_name and instrument. It's not yet clear whether these are job- or measurement-level, nor whether there are any requirements on the values beyond internal consistency; jhoblitt said the lack of documentation is being discussed by SQuaRE.

            krughoff Simon Krughoff (Inactive) added a comment -

            FYI, I'm going to fix this next week if I can find the time. Follow along on DM-16333.
            swinbank John Swinbank made changes -
            Epic Link DM-14431 [ 80156 ] DM-16713 [ 235321 ]

            krzys Krzysztof Findeisen added a comment -

            I think this ticket is obsolete, given DM-16333 and DM-16016. If we want more metadata than is currently provided, we should open tickets for those specific changes.
            krzys Krzysztof Findeisen made changes -
            Resolution Done [ 10000 ]
            Status To Do [ 10001 ] Invalid [ 11005 ]

            People

              krzys Krzysztof Findeisen
              krzys Krzysztof Findeisen
              Angelo Fausti, Eric Bellm, Joshua Hoblitt, Krzysztof Findeisen, Meredith Rawls, Simon Krughoff (Inactive)
