# Script for looking at ap_verify metrics


#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels: None
• Story Points: 6
• Sprint: AP F22-1 (June)
• Team:
• Urgent?: No

#### Description

To help us determine whether our performance changes to the AP pipeline cause unwanted output changes, we need to be able to compare the metric values output from ap_verify runs on ci_hits2015 and ci_cosmos_pdr2 to those in Chronograph. This ticket is to make some simple scripts to read in the metrics and dump them to the console or make them easier to read in a notebook.

#### Activity

Krzysztof Findeisen added a comment (edited)

Feedback on the CLI:

• Why is a collection mandatory? This is different from the behavior of the `butler` tool, where queries that don't specify a collection are understood to search everything.
• `--data_id_restriction` sounds like it would be a useful option, though given the warning I haven't tried testing it.
• I don't understand the `--kind` argument at all. Metrics are metrics; how do you decide whether something is a "value" or a "timing"? The majority of metrics are likely to be neither timing nor memory (and this is already true for faro); do you plan to split up the "value" category in the future?
• It's not clear that omitting `--kind` is equivalent to `--kind value` rather than providing an unfiltered view. This could lead the user to think that timing metrics aren't being measured at all, and cause them to waste time trying to track down a bug that doesn't exist.
• It would be helpful if there were a space between the blocks for each data ID; as it is, it's hard to tell at a glance how many values exist for how many IDs.
• The help text for `--verbose` should probably read "Print extra information when loading metric values or handling errors."
• The log line given with the `--verbose` option is incorrect; it reads `Loaded 18 metrics for 3 dataIds.` when run on 6 data IDs (also, strictly speaking, it's 18 values of 3 different metrics).
• Why is the output format different with and without `--data_id_keys`?

Without `--data_id_keys`:

```
{instrument: 'DECam', detector: 5, visit: 411420, ...}; ap_association.fracUpdatedDiaObjects: 0.17993079584775087
```

With `--data_id_keys`:

```
visit: 411420, detector: 5; ap_association.fracUpdatedDiaObjects: 0.17993079584775087
```

• It's not clear which way the difference is measured, and in fact the order is the opposite of what I'd expect from programs like `diff`. Maybe change the header to read something like "Change in <repo1> relative to <repo2>"?
Krzysztof Findeisen added a comment (edited)

An alternative way of presenting the data ID information might be to provide the data ID in a header; this would split up the output visually, and avoid wrapping problems with `long.package.veryLongAndPreciselyVerboseDescriptiveMetrics`, or with metrics that have both tract and detector info, as might happen when characterizing fakes.

John Parejko added a comment

Collection is mandatory because of the way I look up the data to search. I use `registry.queryDataIds(("detector", "visit"), datasets="calexp")` to get the dataIds of interest, which requires a collection. Suggestions for alternative approaches are welcome (maybe when you review the code itself?).

`data_id_restriction` is available as a kwarg in the print function; I just don't know how to implement it on the command line.

I wrote this for ap_verify, as that's what I needed it for. That has `metricvalue_blah`, `metricvalue_blahTime`, and `metricvalue_blahMemory` datasets. The "value" option is for the former (i.e., everything that isn't Time or Memory); I don't know what else to call it. I don't know what future expansions of this tool might be wanted; I just wanted to make the thing I wrote for ap_verify usable by others.
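The value/timing/memory split described here could be sketched as a plain suffix check on the dataset type name (a hypothetical helper, not the ticket's actual code; the dataset names in the usage note are invented to match the `metricvalue_blah`/`metricvalue_blahTime`/`metricvalue_blahMemory` convention):

```python
def classify_metric_kind(dataset_name: str) -> str:
    """Classify a metric dataset name as 'timing', 'memory', or 'value'.

    Mirrors the naming convention described above: metricvalue_blah,
    metricvalue_blahTime, metricvalue_blahMemory. Anything that is not a
    Time or Memory dataset counts as a plain "value".
    """
    if dataset_name.endswith("Time"):
        return "timing"
    if dataset_name.endswith("Memory"):
        return "memory"
    return "value"
```

For example, `classify_metric_kind("metricvalue_diaPipe_diaPipeTime")` returns `"timing"`, while a name like `metricvalue_fracUpdatedDiaObjects` falls into the `"value"` bucket.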

I've added `default='value'` to the `--kind` option; I had thought the defaults were included in the command-line help, but they aren't.

I refactored the output to be in per-dataId blocks, with the dataId as a "header". I think you're right that it is more interpretable that way.
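A minimal, butler-free sketch of that per-dataId block layout (the record structure, helper name, and metric values are assumptions for illustration; only `ap_association.fracUpdatedDiaObjects` comes from this ticket):

```python
from collections import defaultdict

def format_metric_blocks(records, data_id_keys=None):
    """Group (data_id, metric, value) records into per-dataId text blocks.

    Each block starts with the data ID as a header and blocks are separated
    by a blank line. If data_id_keys is given, the header shows only those
    keys; otherwise the whole data ID dict is shown (the dict-like output).
    """
    grouped = defaultdict(list)
    for data_id, metric, value in records:
        # Use a sorted, hashable form of the data ID as the grouping key.
        grouped[tuple(sorted(data_id.items()))].append((metric, value))
    blocks = []
    for key, metrics in grouped.items():
        data_id = dict(key)
        if data_id_keys:
            header = ", ".join(f"{k}: {data_id[k]}" for k in data_id_keys)
        else:
            header = str(data_id)
        lines = [header] + [f"  {m}: {v}" for m, v in sorted(metrics)]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)
```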

Good catch on the verbose message's number of dataIds: that was a bug in the code. I've added the number of metrics to the message and tweaked its wording.

Without `--data_id_keys`, I just print the `DataCoordinate`s directly, since I don't know what they might contain. That results in the dict-like output. I don't know if it needs to be kept consistent with a user-specified list, and doing so seems like it might be a pain.

I took your wording re: the difference, but I'm not sure whether I should be printing `repo2 - repo1` or `repo1 - repo2` (the latter is what's implemented).
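The implemented direction could be pinned down with a sketch like this (hypothetical helper, not the ticket's actual code): with the `repo1 - repo2` convention, a positive number means the metric is larger in repo1, matching a "Change in <repo1> relative to <repo2>" header.

```python
def metric_differences(repo1_values, repo2_values):
    """Compute repo1 - repo2 for every metric measured in both repos.

    Inputs are dicts mapping metric name -> value; metrics missing from
    either repo are skipped rather than guessed at.
    """
    common = repo1_values.keys() & repo2_values.keys()
    return {name: repo1_values[name] - repo2_values[name] for name in common}
```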

Krzysztof Findeisen added a comment

> Collection is mandatory because of the way I look up the data to search. I use `registry.queryDataIds(("detector", "visit"), datasets="calexp")` to get the dataIds of interest, which requires a collection. Suggestions for alternative approaches are welcome (maybe when you review the code itself?).

Looks like the way `butler` does it is pretty ugly: https://github.com/lsst/daf_butler/blob/main/python/lsst/daf/butler/script/queryDatasets.py#L160-L168. You can always pass a list of all (run?) collections to the Butler constructor.

> I just wanted to make the thing I wrote for ap_verify useable by others.

I still don't understand why it's useful to need three calls to see all the metrics.

John Parejko added a comment

Jenkins: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/36824/pipeline

#### People

Assignee: John Parejko
Reporter: John Parejko
Reviewers: Krzysztof Findeisen
Watchers: Eric Bellm, John Parejko, Krzysztof Findeisen, Meredith Rawls