  1. Data Management
  2. DM-14328

Publish verification jobs produced by the HSC reprocessing to SQuaSH

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: squash
    • Labels:
      None

      Description

      HSC weekly reprocessing is now producing verification jobs, located at

       /datasets/hsc/repo/rerun/RC/w_2018_17/DM-14055/validateDrp/matchedVisitMetrics/*/*/*json
      

      This ticket is to start a discussion of what is needed to send those results to SQuaSH.
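
      As a starting point, the job files at that location could be enumerated from Python with the standard library. This is a minimal sketch: the helper name find_verification_jobs is hypothetical, and the directory levels behind each wildcard are not interpreted because the ticket does not say what they mean.

```python
import glob

# Path pattern quoted from the ticket. The meaning of each wildcard
# directory level is not stated there, so it is not interpreted here.
JOB_PATTERN = (
    "/datasets/hsc/repo/rerun/RC/w_2018_17/DM-14055/"
    "validateDrp/matchedVisitMetrics/*/*/*json"
)


def find_verification_jobs(pattern=JOB_PATTERN):
    """Return the sorted list of files matching the glob pattern."""
    return sorted(glob.glob(pattern))
```

      Sorting keeps the order stable between runs, which makes it easier to pick the same single job for a first test upload.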

      • dispatch_verify.py is used to send a verification job to SQuaSH. Its --env option grabs information from the environment and adds it to the job metadata. SQuaSH uses that metadata to identify the dataset being processed, the ID of the run, URLs linking to additional information about the run, the stack version used, etc.

      For CI we defined a "jenkins" environment; here we need to create another environment option, perhaps named ldf. My initial suggestion for the environment variables is:

      • DATASET: name of the dataset being processed, e.g. HSC RC2
      • DATASET_REPO_URL: do we have a git lfs repo for the dataset?
      • RUN_ID: can we use the associated Jira ticket to identify this run? Following https://confluence.lsstcorp.org/display/DM/Reprocessing+of+the+HSC+RC+dataset it looks like a good idea. Is there a better identifier for the runs?
      • RUN_ID_URL: could be the corresponding Jira ticket URL
      • VERSION_TAG: the LSST stack version used, e.g. w_2017_14
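
      One way the proposed ldf environment could gather these variables is sketched below. This is not the dispatch_verify.py implementation: the function name get_ldf_metadata, the lower-cased key names, and the "env_name" marker are all assumptions made for illustration, and the metadata schema SQuaSH expects may differ.

```python
import os

# Variable names proposed in this ticket for the "ldf" environment.
LDF_ENV_VARS = (
    "DATASET",
    "DATASET_REPO_URL",
    "RUN_ID",
    "RUN_ID_URL",
    "VERSION_TAG",
)


def get_ldf_metadata(environ=None):
    """Collect the proposed variables into a metadata dict.

    Hypothetical helper: the key names (lower-cased variable names plus
    an "env_name" marker) are assumptions, not the schema SQuaSH uses.
    """
    if environ is None:
        environ = os.environ
    meta = {"env_name": "ldf"}
    for name in LDF_ENV_VARS:
        # Missing variables become None so gaps are visible downstream.
        meta[name.lower()] = environ.get(name)
    return meta
```

      Passing a plain dict instead of os.environ makes the helper easy to exercise, e.g. get_ldf_metadata({"DATASET": "HSC RC2", "RUN_ID": "DM-10084"}).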

      Once this new environment is created in dispatch_verify.py, an example command line to publish the results to SQuaSH would be:

      $ export DATASET="HSC RC2"
      $ export DATASET_REPO_URL=""
      $ export RUN_ID="DM-10084"
      $ export RUN_ID_URL="https://jira.lsstcorp.org/browse/DM-10084"
      $ export VERSION_TAG="w_2017_14"
       
      $ dispatch_verify.py --url https://squash-restful-api-demo.lsst.codes --user <squash user> --password <squash passwd> --env ldf --lsstsw lsstsw/ output/verify/job.json 
      
      

      Note that the above URL points to a demo instance of SQuaSH, so we can test as needed without affecting the production instance.

      Here I am assuming we have an lsstsw stack installation and thus access to the manifest file with the versions of the stack packages. We can suppress the --lsstsw option, but it would be useful to carry this information along so that we can compare which stack packages changed from one weekly build to the next.
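
      The weekly-to-weekly comparison could look something like the sketch below. It assumes the package versions have already been read into plain dicts mapping package name to version string; reading the manifest file itself is left out because its format is not described here. The function name changed_packages is hypothetical.

```python
def changed_packages(old, new):
    """Return {package: (old_version, new_version)} for packages that differ.

    `old` and `new` are plain dicts mapping package name to version
    string. Packages present on only one side are reported with None
    on the other side.
    """
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes
```

      Reporting added and removed packages alongside changed ones keeps the comparison complete when the set of stack packages itself changes between weeklies.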

      • Simon Krughoff mentioned that we have one verification job per patch. As a start, we could publish results for one patch only. Later, we could add the patch ID to the job metadata so that the jobs can be distinguished in SQuaSH. We can also use dispatch_verify.py to combine multiple verification jobs into a single JSON if that is required (see https://sqr-019.lsst.io/#Post-processing-verification-jobs), but this wouldn't scale to the many patches we expect to process.
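
      Adding the patch ID to the job metadata could be done before dispatch with a few lines of Python. This is a sketch under stated assumptions: the job file is taken to be a JSON object whose "meta" key holds the job metadata, and both the key name "patch" and the helper name tag_job_with_patch are hypothetical. How the patch ID is obtained for a given file is not specified in this ticket, so it is passed in explicitly.

```python
import json


def tag_job_with_patch(job_path, patch_id, out_path):
    """Write a copy of a verification job JSON with a patch ID in its metadata.

    Assumptions: the job document is a JSON object whose "meta" key
    holds the job metadata, and "patch" is a hypothetical key name
    chosen for illustration.
    """
    with open(job_path) as f:
        job = json.load(f)
    # Create the metadata object if the file does not have one yet.
    meta = job.setdefault("meta", {})
    meta["patch"] = patch_id
    with open(out_path, "w") as f:
        json.dump(job, f, indent=2)
    return job
```

      Writing to a separate output path leaves the original job file from the reprocessing run untouched.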

              People

              • Assignee:
                afausti Angelo Fausti
                Reporter:
                afausti Angelo Fausti
                Watchers:
                Angelo Fausti, Hsin-Fang Chiang, John Parejko, John Swinbank, Jonathan Sick, Krzysztof Findeisen, Simon Krughoff