Data Management / DM-24527

Update validation_data readmes and scripts about "recreating the repository"


    Details

    • Type: Story
    • Status: Invalid
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: validate_drp, Validation
    • Labels: None
    • Story Points: 2
    • Sprint: AP S20-5 (April), AP S20-6 (May)
    • Team: Alert Production
    • Urgent?: No

      Description

      As part of DM-17597, I am re-running processCcd on the validation_data datasets so that I have newly processed output to feed to jointcal. This has necessitated some digging, as the instructions in the README files are not sufficient to completely regenerate the datasets. I'm tweaking the READMEs as I go, and have made some changes to the run scripts to simplify the process.

      I do not plan to upload my newly processed output, but we could do that on another ticket.
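
      For concreteness, the gen2 reprocessing this describes boils down to something like the sketch below. The repo path inside the package, the rerun name, and the data IDs are placeholders (and any config overrides needed by the current stack are omitted), so treat it as the shape of the command rather than an exact recipe.

      ```bash
      # Minimal sketch of re-running single-frame processing on one of the
      # validation_data_* packages (gen2 butler). The repo path under the
      # package, the rerun name, and the data IDs are placeholders.
      setup lsst_distrib
      setup validation_data_cfht

      processCcd.py "$VALIDATION_DATA_CFHT_DIR/data" \
          --rerun jointcal_inputs \
          --id visit=VISIT ccd=CCD \
          -j 4
      ```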


            Activity

Michael Wood-Vasey added a comment -

I'd much rather the instructions for re-generation stay in this repo. I'm fine if you want to write that out as a script instead of keeping it in the README.

            But I don't like creating a documentation dependency on validate_drp, which is not a dependency of this data repository. The fact that validate_drp has some example scripts is just a convenience to users of validate_drp; it's not meant to be authoritative.

John Parejko added a comment -

The problem is that the README and processCcd.sh files were not kept up to date: they still referred to a.net refcats, for example. Since those bash scripts are not used, while the ones in validate_drp are, the latter are much more likely to actually be usable.

Michael Wood-Vasey added a comment -

            The "how-to reproduce" describes what's in the processed data in the repository. I believe the README is correct.

That may be different from what I believe you desire, which is "how do I process the raw data in this directory with the current version of the stack".

John Parejko added a comment -

            Ah, yes, that is a fair point (and is related to a comment I made on a testdata_jointcal PR just now). I think this gets to my old question about whether we should keep these data updated regularly to keep up with stack changes (e.g. a.net removal). As it stands, you cannot reprocess the data with the "how-to-reproduce" instructions.

Michael Wood-Vasey added a comment -

It's certainly the case that when I've updated the processed data in these validation_data repos in the past, I've generally done exactly what you suggest: I've taken the processing that I know to work from lsst_ci/scripts or validate_drp/examples, copied the script instructions to the README for the validation_data_* repo, and run them to generate the new set of curated data.

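The kind of script being copied into a README looks roughly like the following. This is a sketch only: the mapper class, raw file pattern, refcat location, and data IDs are assumptions, not the actual contents of any particular validation_data_* repo.

```bash
# Rough shape of a gen2 "how-to-reproduce" script of the sort copied from
# lsst_ci/scripts or validate_drp/examples into a validation_data_* README.
# Mapper class, file patterns, refcat directory, and data IDs are placeholders.

# Create an empty gen2 butler repo pointing at the camera's mapper.
mkdir -p DATA
echo "lsst.obs.cfht.MegacamMapper" > DATA/_mapper

# Ingest (link, don't copy) the raw frames shipped with the package.
ingestImages.py DATA raw/*.fz --mode link

# Make the shipped reference catalogs visible to the butler.
ln -s "$PWD/ref_cats" DATA/ref_cats

# Single-frame processing into the rerun that becomes the curated output.
processCcd.py DATA --rerun curated --id visit=VISIT ccd=CCD
```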
Michael Wood-Vasey added a comment -

            So I think I disagree with the setup of this ticket, which says it's going to update the README without updating the data. I think you should update both or update neither.

Removing the "how-to-reproduce" instructions for the current version of the processed data that are stored in the validation_data_* repo doesn't make sense to me and doesn't seem to help anything. If you're going to update the version of the processed data (e.g., for v19 or v20), then definitely also update the README that describes how the data were generated. But if you just refer to validate_drp/examples or lsst_ci/scripts (implicitly on master), then the README documentation will be wrong about how the processed data stored in the repo were generated.

John Parejko added a comment -

            I'll make a note here, since I'm not sure where else to put it: after some digging, it is clear that all of visit=176837 and ccd=13 for visit=176846 of validation_data_decam are mangled in a way that consistently results in bad astrometry. I tweaked the astrometry config a few times, but could not get good fits. I think we should probably put something in the readme for that package to point this out to potential users.

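Until such a note lands in the README, anyone reprocessing validation_data_decam can skip the bad data at the command line. A sketch only: the repo path and rerun name are placeholders, and the 1..62 DECam ccd numbering is assumed.

```bash
# Skip the data that consistently give bad astrometric fits: all of visit
# 176837, and ccd 13 of visit 176846. $DECAM_REPO and the rerun name are
# placeholders; any other visits in the repo would go in additional --id
# arguments.
processCcd.py "$DECAM_REPO" --rerun skip_bad_astrometry \
    --id visit=176846 ccd=1..12^14..62
```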
John Parejko added a comment -

Marking this Invalid, as the work I did on DM-32373 to update validation_data_cfht for gen3 has entirely reworked how processing that dataset is handled: there is now a bin/ directory allowing anyone to process the data, and no supplied pre-processed version, and I've updated the README accordingly.

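For reference, the gen3 workflow that replaced the old instructions looks roughly like the sketch below. The repo path, collection names, and pipeline file are assumptions, calibration and refcat setup are omitted, and the authoritative commands are whatever lives in the package's bin/ scripts.

```bash
# Rough shape of gen3 processing for validation_data_cfht; the actual commands
# are in the package's bin/ scripts. Repo path, collection names, and the
# pipeline file are placeholders; calib/refcat setup is omitted.
butler create gen3_repo
butler register-instrument gen3_repo lsst.obs.cfht.MegaPrime
butler ingest-raws gen3_repo raw/*.fz

pipetask run -b gen3_repo \
    -i MegaPrime/raw/all,refcats \
    -o u/$USER/validation_run \
    -p DRP_PIPELINE.yaml
```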

              People

              Assignee: John Parejko
              Reporter: John Parejko
              Watchers: Dan Taranu, Eli Rykoff, Hsin-Fang Chiang, John Parejko, Michael Wood-Vasey, Simon Krughoff


                  Jenkins

                  No builds found.