Type: Technical task
Fix Version/s: None
Sprint: AP S22-6 (May)
Identify a larger dataset from DC2 for bulk timing tests.
Choose these from the 5.5-year-depth DC2 tract 4431 used by Kenneth Herner (e.g. in https://jira.lsstcorp.org/browse/PREOPS-631).
We should focus on many visits instead of many detectors. Simon's work creating the rc2_subset HSC CI package could be a useful reference.
- relates to DM-36026 Reprocess the DC2 AP subset
I created a TAGGED collection u/mrawls/DM-34827/raw/4patch_4431 with the datasets in the attached csv file.
I created a CHAINED collection u/mrawls/DM-34827/defaults/4patch_4431 containing the above as well as calibs, skymaps, and refcats.
The script I used to do this is saved in /project/mrawls/ap_profile/tagged_chained_collections_create.py.
It doesn't really matter now, but I'd have opened the file with `astropy.table.Table.read()`; then you wouldn't have to parse the columns and could access them by name. Why did you use `u/parejkoj/profiling-DM-34825` as the collection to draw the raws from in your queryDatasets call, instead of Ken's `u/kherner/2.2i/runs/tract4431-w40`?
For the record, the idea is "make a list of dataset refs, pass them to butler.registry.associate(collection, refs) to tag them, then call butler collection-chain to chain them with the various calibs etc.".
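For anyone repeating this recipe, a minimal sketch might look like the following. The collection names come from this ticket, but the `where` constraint and the calib/skymap/refcat child collection names are illustrative assumptions, not the exact ones used here (the real script lives in tagged_chained_collections_create.py):

```python
"""Sketch of the tag-then-chain recipe described above.

Assumes an LSST Science Pipelines environment. The query constraint and
the calib/skymap/refcat collection names are illustrative placeholders.
"""


def tag_and_chain(repo="/repo/dc2"):
    # Deferred import so this file can be read without the LSST stack.
    from lsst.daf.butler import Butler, CollectionType

    butler = Butler(repo, writeable=True)
    registry = butler.registry

    tagged = "u/mrawls/DM-34827/raw/4patch_4431"
    chained = "u/mrawls/DM-34827/defaults/4patch_4431"

    # 1. Make a list of dataset refs (here: raws from a source collection;
    #    the 'where' expression is a placeholder for the real selection).
    refs = list(registry.queryDatasets(
        "raw",
        collections="u/kherner/2.2i/runs/tract4431-w40",
        where="detector != 0",  # illustrative constraint only
    ))

    # 2. Tag them into a TAGGED collection.
    registry.registerCollection(tagged, CollectionType.TAGGED)
    registry.associate(tagged, refs)

    # 3. Chain the tagged collection together with calibs, skymaps, and
    #    refcats (child names here are hypothetical).
    registry.registerCollection(chained, CollectionType.CHAINED)
    registry.setCollectionChain(
        chained, [tagged, "2.2i/calib", "skymaps", "refcats"]
    )
    return tagged, chained
```

The chaining step can equivalently be done from the command line with `butler collection-chain`, as described above.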
Want to set off a run in the morning on DM-34828, with the ordering fix Krzysztof gave you, just to see how it goes?
I did basic string parsing since it was what I could remember how to do offhand, but good point re Astropy for next time. I used that collection since it includes Ken's larger collection in its chain, so it was functionally equivalent. Thanks!
Bonus afterthought - because there are zillions of miscellaneous datasets in Ken's run, I made a second TAGGED collection that contains just the goodSeeingCoadds we want to use as templates here. The procedure I followed is also in the tagged_chained_collections_create.py script mentioned previously.
This collection is called u/mrawls/DM-34827/coadd/4patch_4431 and is intentionally NOT chained into the defaults, in case we ever want to build better templates.
The idea is that an ApPipe run can use just "defaults" and "coadd" as the two input collections.
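If that layout holds, the invocation could be as simple as the sketch below. The pipeline path and output collection name are placeholders, not values from this ticket:

```shell
# Hypothetical ApPipe run drawing on just the two input collections.
# Pipeline path and output collection are placeholders.
pipetask run \
    -b /repo/dc2 \
    -i u/mrawls/DM-34827/defaults/4patch_4431,u/mrawls/DM-34827/coadd/4patch_4431 \
    -o u/mrawls/apPipe-timing-test \
    -p $AP_PIPE_DIR/pipelines/LSSTCam-imSim/ApPipe.yaml \
    --register-dataset-types
```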
Excellent, let's adopt those 272 datasets as our larger, uh, dataset. That they all fully reside inside those four patches is ideal, because it gives us good spatial overlap for testing association.
We should think about where we want your "find calexps in patches" script to land, because it's very useful for AP.
Since the 272 visit+detectors are quite the hodgepodge due to all the dithering, we should probably manually create a TAGGED collection in /repo/dc2 once instead of defining a long list of datasets each time. This will require an appropriate and/or clever name. I don't know how to create a TAGGED collection like this offhand, but I know it's possible; for example, Jim created "hits2014" and "hits2015" tagged collections in /repo/main/DECam for me.
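For what it's worth, a hedged sketch of creating such a TAGGED collection from a fixed list of visit+detector pairs might look like this. The collection name is hypothetical, and the assumption that DC2 exposure IDs match visit IDs (one snap per visit) is labeled in the code:

```python
"""Sketch of tagging a fixed visit+detector list in /repo/dc2.

Assumes the LSST stack. The tagged collection name is hypothetical,
and we assume DC2 exposure IDs equal visit IDs (one snap per visit).
"""


def tag_visit_detectors(pairs, repo="/repo/dc2",
                        tagged="u/mrawls/DM-34827/ap-timing-272"):
    # Deferred import so the file can be read without the stack installed.
    from lsst.daf.butler import Butler, CollectionType

    butler = Butler(repo, writeable=True)
    registry = butler.registry
    registry.registerCollection(tagged, CollectionType.TAGGED)

    # Collect the raws for each (visit, detector) pair. For DC2 imSim we
    # assume exposure == visit; per-pair queries are simple and fine for
    # a one-time setup of 272 datasets.
    refs = []
    for visit, detector in pairs:
        refs.extend(registry.queryDatasets(
            "raw",
            collections="u/kherner/2.2i/runs/tract4431-w40",
            where=f"exposure = {visit} AND detector = {detector}",
            instrument="LSSTCam-imSim",
        ))
    registry.associate(tagged, refs)
    return refs
```

Once tagged, the collection only needs to be created once; every later run can name it directly instead of re-deriving the 272-dataset list.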