Data Management / DM-11575

Update packaging layout for ap_verify

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Science Pipelines
    • Labels:
      None

      Description

      Work on DM-11371 illustrated some minor incompatibilities between the directory structure expected for datasets living in /datasets and our packaging scheme defined in DM-11116. This ticket is to harmonize the ap_verify packaging as much as possible with the common datasets format.

            Activity

            reiss David Reiss added a comment - - edited

            For reference, the datasets format is described here.

            The existing format is found in the ap_verify_dataset_template repo. The only real instantiation of this dataset repo is ap_verify_hits2015.

            The primary (only?) updates that I see as necessary are:

            1. We need an ingested butler repo directory. Our current repo is called data; we will rename data to repo and perform the ingestion in /datasets.
            2. Rename ref_cats to refcats and un-tarball each catalog into its gaia or ps1 subdirectory. However, I like keeping them in a tarball for the ap_verify datasets, since that probably works better with git-lfs.
            3. No templates directory (or can we keep it?).
            4. Currently raw is a flat directory. We will add subdirectories for the survey name(s) (for the HiTS data, either just HiTS or HiTS2014/2015).

            ebellm Eric Bellm added a comment - - edited

            We need to have the templates somewhere; that's what breaks the datasets format. Maybe we could put it under PREPROCESSED/templates?

            reiss David Reiss added a comment -

            I would think that should be fine.

            reiss David Reiss added a comment -

            Apparently rerun is the appropriate subdirectory for templates. I will use that here as well.

            reiss David Reiss added a comment - - edited

            Meredith Rawls, does this plan sound good? We need to coordinate it with ap_pipe. The steps I will take are:

            1. rename data to repo
            2. rename ref_cats to refcats
            3. rename templates to rerun

            I will keep the refcats in tarball format for better handling by git-lfs.
            I will keep the calib directory for master calibs.

            I will add a README for the refcats describing their provenance as suggested in RFC-372.
            I will update the README to remove the statement that the raws are calibrated.

            I will make corresponding changes to both ap_verify_hits2015 and ap_verify_dataset_template.
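The three renames listed above can be sketched as a small script. This is only an illustration: the function name `rename_dataset_dirs` and the `RENAMES` mapping are hypothetical, and it assumes the directories sit at the top level of a dataset checkout. In a git-tracked dataset one would more likely use `git mv` so that history follows the move.

```python
from pathlib import Path

# Old name -> new name, taken from the three steps in the plan above.
RENAMES = {
    "data": "repo",
    "ref_cats": "refcats",
    "templates": "rerun",
}


def rename_dataset_dirs(dataset_root, renames=RENAMES):
    """Rename top-level directories of a dataset checkout in place.

    Returns the (old, new) pairs that were actually renamed. Directories
    that do not exist are skipped; an already existing target raises
    FileExistsError instead of being overwritten.
    """
    root = Path(dataset_root)
    done = []
    for old, new in renames.items():
        src, dst = root / old, root / new
        if not src.is_dir():
            continue
        if dst.exists():
            raise FileExistsError(f"{dst} already exists")
        src.rename(dst)
        done.append((old, new))
    return done
```

Directories not named in the mapping, such as calib and raw, are left untouched, which matches the plan to keep calib for master calibs.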

            mrawls Meredith Rawls added a comment -

            I am fine with (1) and (2), and the associated refcats updates you describe. Please also edit the README to clarify that raw images are not actually photometrically and astrometrically calibrated.

            I find (3) very counterintuitive, but it seems like you're trying to comply with our Common Dataset Organization and Policy for the git-lfs "datasets" as much as possible. So it's fine so long as the README clearly explains that directory is the place where the templates (will) live.

            reiss David Reiss added a comment -

            Oh, (3) was brought on by comments in RFC-372.

            I agree with you on (3) and I am not sure that we need to conform to the /datasets specification entirely for our ap_ stuff.

            reiss David Reiss added a comment -

            Also, we should have a separate directory for community-processed exposures, since (as was pointed out) we should not use the LSST Stack to do ISR with community-generated calibs (i.e., we should generate our own).

            Any thoughts here?

            mrawls Meredith Rawls added a comment -

            The original intent of ap_pipe and friends was never to use community-pipeline (CP) processed exposures, but rather to start with raws and apply some level of ISR as the first step. I agree that applying CP calibs to raw images with LSST tools is not the best final approach, but the goal right now is a verification system, not new science from HiTS. So I don't think we should worry about including CP exposures in our ap_ git-lfs datasets. I think the ultimate solution should be to use the LSST Stack to create master flats and biases, but that's beyond the scope of ap_pipe (and ap_verify) for now.

            reiss David Reiss added a comment -

            OK, I agree.

            krzys Krzysztof Findeisen added a comment - - edited

            David Reiss, the proposed changes need to be reflected in the ap.verify.Dataset class (specifically, the definitions and tests for _stub_input_repo, refcat_location, and template_location).
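As a rough illustration of the kind of change meant here, the sketch below maps the three attribute names mentioned above onto the renamed directories. `DatasetLayout` is a hypothetical stand-in, not the real ap.verify.Dataset class, whose actual signatures and behaviour are not shown in this ticket. It also assumes that all three renames from the plan (repo, refcats, rerun) are adopted as proposed.

```python
from pathlib import Path


class DatasetLayout:
    """Hypothetical path helper; NOT the real ap.verify.Dataset."""

    def __init__(self, dataset_root):
        self._root = Path(dataset_root)

    @property
    def stub_input_repo(self):
        # Formerly "data" (assumed rename from the plan above).
        return self._root / "repo"

    @property
    def refcat_location(self):
        # Formerly "ref_cats".
        return self._root / "refcats"

    @property
    def template_location(self):
        # Formerly "templates"; assumes the "rerun" name is kept.
        return self._root / "rerun"
```

Keeping the directory names in one place like this means that a later change of mind about, say, the templates directory only touches a single property and its test.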

            reiss David Reiss added a comment -

            Would you be willing to review this ticket? DM-11371 and RFC-372 are still not fully decided, but I don't believe any further changes there will affect what was changed here.

            Thanks!

            krzys Krzysztof Findeisen added a comment -

            David Reiss, could you please create the pull requests?

            reiss David Reiss added a comment -

            Sorry – done!

            krzys Krzysztof Findeisen added a comment -

            A couple of straightforward changes requested. Feel free to merge once those are done.

            reiss David Reiss added a comment -

            Thanks. Merged and done.


              People

              • Assignee:
                reiss David Reiss
                Reporter:
                ebellm Eric Bellm
                Reviewers:
                Krzysztof Findeisen
                Watchers:
                David Reiss, Eric Bellm, Krzysztof Findeisen, Meredith Rawls