# Update packaging layout for ap_verify

## Description

Work on DM-11371 illustrated some minor incompatibilities between the directory structure expected for datasets living in /datasets and our packaging scheme defined in DM-11116. This ticket is to harmonize the ap_verify packaging as much as possible with the common datasets format.

## Activity

Eric Bellm created issue -
David Reiss added a comment - - edited

For reference, the datasets format is described here.

The existing format is found in the ap_verify_dataset_template repo. The only real instantiation of this dataset repo is ap_verify_hits2015.

The primary (only?) updates that I see as necessary are:

1. We need an ingested butler repo directory. Our current repo is called data; we will rename data to repo and perform the ingestion in /datasets.
2. Rename ref_cats to refcats and un-tarball them into gaia or ps1 subdirectories. However, I like having them in a tarball for the ap_verify datasets, since that probably works better with git-lfs.
3. No templates directory (or can we keep it?).
4. Currently raw is a flat directory. We will add subdirectories for the survey name(s) (for the HiTS data, either just HiTS or HiTS2014/2015).

Eric Bellm added a comment - - edited

We need to have the templates somewhere; that's what breaks the datasets format. Maybe we could put it under PREPROCESSED/templates?

David Reiss added a comment -

I would think that should be fine.

David Reiss added a comment -

Apparently rerun is the appropriate subdirectory for templates. I will use that here as well.

David Reiss added a comment - - edited

Meredith Rawls, does this plan sound good? We need to coordinate it with ap_pipe. The steps I will take are:

1. rename data to repo
2. rename ref_cats to refcats
3. rename templates to rerun

I will keep the refcats in tarball format for better handling by git-lfs.
I will keep the calib directory for master calibs.

I will add a README for the refcats describing their provenance as suggested in RFC-372.
I will update the README to remove the statement that the raws are calibrated.

I will make corresponding changes to both ap_verify_hits2015 and ap_verify_dataset_template.
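
The three renames above can be sketched as a small script. This is only an illustration, assuming the dataset is checked out as a plain directory tree; the function name `rename_layout` and the `dry_run` flag are hypothetical and not part of any ap_verify package. In a real git-lfs repository one would more likely use `git mv` so that history is preserved.

```python
from pathlib import Path

# Old directory name -> new directory name, as listed in the plan above.
RENAMES = {
    "data": "repo",
    "ref_cats": "refcats",
    "templates": "rerun",
}


def rename_layout(dataset_root, dry_run=True):
    """Rename legacy directories under dataset_root (hypothetical helper).

    Returns the (old, new) pairs that were, or in a dry run would be, renamed.
    """
    root = Path(dataset_root)
    handled = []
    for old, new in RENAMES.items():
        src, dst = root / old, root / new
        if not src.is_dir():
            continue  # legacy directory absent, nothing to do
        if dst.exists():
            raise FileExistsError(f"{dst} already exists")
        if not dry_run:
            src.rename(dst)
        handled.append((old, new))
    return handled
```

Directories not named in the plan, such as calib and raw, are left untouched; the default dry run only reports what would change.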

Meredith Rawls added a comment -

I am fine with (1) and (2), and the associated refcats updates you describe. Please also edit the README to clarify that raw images are not actually photometrically and astrometrically calibrated.

I find (3) very counterintuitive, but it seems like you're trying to comply with our Common Dataset Organization and Policy for the git-lfs "datasets" as much as possible. So it's fine so long as the README clearly explains that directory is the place where the templates (will) live.

David Reiss added a comment -

Oh, (3) was brought on by comments in RFC-372.

I agree with you on (3) and I am not sure that we need to conform to the /datasets specification entirely for our ap_ stuff.

David Reiss added a comment -

Also, we should have a separate directory to contain community-processed exposures, since (as was pointed out) we should not use the LSST Stack to do ISR using community-generated calibs (i.e., we should generate our own).

Any thoughts here?

Meredith Rawls added a comment -

The original intent of ap_pipe and friends was never to use community-pipeline (CP) processed exposures, but rather to start with raws and apply some level of ISR as the first step. I agree that applying CP calibs to raw images with LSST tools is not the best final approach, but the goal right now is a verification system, not new science from HiTS. So I don't think we should worry about including CP exposures in our ap_ git-lfs datasets. I think the ultimate solution should be to use the LSST Stack to create master flats and biases, but that's beyond the scope of ap_pipe (and ap_verify) for now.

David Reiss added a comment -

OK, I agree.

Krzysztof Findeisen added a comment - - edited

David Reiss, the proposed changes need to be reflected in the ap.verify.Dataset class (specifically, the definitions and tests for _stub_input_repo, refcat_location, and template_location).
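
As a rough illustration of what such an update involves, the sketch below maps the three accessors onto the renamed directories. It is a hypothetical stand-in, not the actual ap.verify.Dataset code: the class name `DatasetLayout`, its constructor, and the exact mapping of each accessor to a directory are assumptions made for this example.

```python
from pathlib import Path


class DatasetLayout:
    """Hypothetical stand-in for the path logic; NOT the real ap.verify.Dataset."""

    def __init__(self, root):
        self._root = Path(root)

    @property
    def _stub_input_repo(self):
        # Assumed to point at the butler repo skeleton, formerly "data".
        return self._root / "repo"

    @property
    def refcat_location(self):
        # Reference catalog tarballs, formerly under "ref_cats".
        return self._root / "refcats"

    @property
    def template_location(self):
        # Templates, formerly under "templates".
        return self._root / "rerun"
```

Keeping the directory names in one place like this means a later layout change only touches these three definitions and their tests.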

David Reiss added a comment -

Would you be willing to review this ticket? DM-11371 and RFC-372 are still not fully decided, but I don't believe any changes there will affect what was changed here.

Thanks!

Krzysztof Findeisen added a comment -

David Reiss, could you please create the pull requests?

David Reiss added a comment -

Sorry – done!

Krzysztof Findeisen added a comment -

A couple of straightforward changes requested. Feel free to merge once those are done.

David Reiss added a comment -

Thanks. Merged and done.

