  Data Management / DM-12043

Run pipe_analysis scripts on PDR1 data

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Story Points: 6
    • Epic Link:
    • Team: Data Release Production

      Description

      In order to explore using the interactive QA plots at an interesting scale, run the pipe_analysis scripts on the full PDR1 dataset at lsst-dev. Most importantly these should create the parquet tables (DM-12030), but they can also make the static plots.
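
      For illustration only, a minimal sketch of how one of the resulting parquet tables might be pulled into an interactive session; the file name and column inspection below are placeholders, not the actual pipe_analysis output schema.

      import pandas as pd

      # Placeholder path; the real tables live in the rerun output tree.
      df = pd.read_parquet('qaTable_tract9813_HSC-Z.parq')

      # Quick look at what the table contains before plotting interactively.
      print(df.columns.tolist())
      print(len(df), 'rows')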

      Attachments

      Issue Links

      Activity

            tmorton Tim Morton added a comment -

            Thanks, John Parejko -- I've tried the subregion approach and got it working, but for some reason it doesn't seem to be any faster.

            Parejkoj John Parejko added a comment -

            See my attachment on DM-11785 for my approach. I use Hsin-Fang Chiang's sqlite3 files to extract the visits for each tract+filter that I'm interested in, and then use that to spawn slurm jobs.

            https://jira.lsstcorp.org/secure/attachment/30082/jointcal-process.py

            The relevant code for you would be something like this (though you do have to know whether your chosen tract is in UDEEP, DEEP, WIDE, or Aegis; there might be a way to merge those):

            import sqlite3
            import os

            # Hsin-Fang Chiang's per-layer sqlite3 files (here the DEEP one).
            sqlitedir = '/scratch/hchiang2/parejko/'
            conn = sqlite3.connect(os.path.join(sqlitedir, 'dbDEEP.sqlite3'))
            cursor = conn.cursor()

            # Every visit that contributed a calexp to this tract+filter.
            cmd = "select distinct visit from calexp where tract=:tract and filter=:filt"
            tract = 9813
            filt = 'HSC-Z'
            cursor.execute(cmd, dict(tract=tract, filt=filt))
            result = cursor.fetchall()
            conn.close()

            # NOTE: have to flatten the list, as it comes out as [(1,), (2,), (3,), ...]
            # The '^'-joined string is the list form expected by --id arguments.
            print('^'.join(str(x[0]) for x in result))
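
            For context, a rough sketch of how that '^'-joined visit string could be turned into per-visit Slurm submissions; the command template, rerun name, and repository path are illustrative assumptions, and the real approach is in the jointcal-process.py attachment linked above.

            import subprocess

            # Output of the query above, e.g. "1228^1230^1232" (illustrative values).
            visits = '1228^1230^1232'

            # Hypothetical per-visit command; substitute the actual pipe_analysis
            # invocation and data repository.
            cmd_template = ('visitAnalysis.py /path/to/repo --rerun myRerun '
                            '--id visit={visit} tract=9813 filter=HSC-Z')

            for visit in visits.split('^'):
                cmd = cmd_template.format(visit=visit)
                # sbatch --wrap runs the command inside a generated batch script.
                subprocess.check_call(['sbatch', '--job-name', 'qa-' + visit, '--wrap', cmd])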
            

            tmorton Tim Morton added a comment -

            Running all the visit-level scripts is currently blocked because the Slurm cluster schedules an entire node per job, so launching a job array to run all 747 jobs would badly clutter up the cluster.
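
            One possible way around the one-node-per-job constraint (a sketch, not necessarily what was done here) is to pack several visit-level commands into a single Slurm allocation with a small Python driver; the command list and concurrency below are placeholders.

            import subprocess
            from concurrent.futures import ThreadPoolExecutor

            # Placeholder commands; in practice this would be the slice of the
            # full visit-level command list assigned to this node.
            commands = [
                'echo visit 1228',
                'echo visit 1230',
            ]

            def run(cmd):
                # Each command is an independent process; report its exit code.
                return subprocess.call(cmd, shell=True)

            # Run several commands at once so a single Slurm job fills the node.
            with ThreadPoolExecutor(max_workers=8) as pool:
                exit_codes = list(pool.map(run, commands))

            print(exit_codes)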

            tmorton Tim Morton added a comment -

            OK, this has now been completed, I believe. I had to start and stop a few times, so there may still be stragglers, but I tried to catch them all. All the tables are available under /scratch/tmorton/hscRerun/DM-12043. There are 30284 tables in total, comprising 6.3 TB. The list of commands used to create them all is at /home/tmorton/tickets/DM-12043/make_qa_plots.sh, which was generated by /home/tmorton/tickets/DM-12043/write_script.py. John Swinbank, I'll put you down as reviewer on the assumption that this doesn't need a detailed check; if you think it needs a more substantial review (e.g., to see whether I missed any), perhaps Lauren MacArthur or Hsin-Fang Chiang could look into it.
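
            As a sanity check of the kind described, a minimal sketch that walks the output directory, counts the parquet tables, and totals their size; the file extensions are an assumption about the on-disk layout.

            import os

            outdir = '/scratch/tmorton/hscRerun/DM-12043'
            count = 0
            total_bytes = 0
            # Tally every file that looks like a parquet table under the rerun.
            for root, dirs, files in os.walk(outdir):
                for name in files:
                    if name.endswith(('.parq', '.parquet')):
                        count += 1
                        total_bytes += os.path.getsize(os.path.join(root, name))

            print('{} tables, {:.2f} TB'.format(count, total_bytes / 1e12))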

            swinbank John Swinbank added a comment -

            Sorry for the delay. I think we can regard this as done.


              People

              • Assignee: tmorton Tim Morton
              • Reporter: tmorton Tim Morton
              • Reviewers: John Swinbank
              • Watchers (7): Hsin-Fang Chiang, Jim Bosch, John Parejko, John Swinbank, Lauren MacArthur, Paul Price, Tim Morton
              • Votes: 0

                Dates

                • Created:
                • Updated:
                • Resolved:
