# Run pipe_analysis scripts on PDR1 data


#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s: None
• Labels: None
• Story Points: 6
• Team: Data Release Production

#### Description

In order to explore using the interactive QA plots at an interesting scale, run the pipe_analysis scripts on the full PDR1 dataset on lsst-dev. Most importantly, these should create the parquet tables (DM-12030), but they can also make the static plots.

#### Activity

Tim Morton added a comment -

Thanks John Parejko -- I've tried the subregion thing and gotten that working, but it doesn't seem to be any faster, for some reason.

John Parejko added a comment -

See my attachment on DM-11785 for my approach. I use Hsin-Fang Chiang's sqlite3 files to extract the visits for each tract+filter that I'm interested in, and then use that to spawn slurm jobs.

https://jira.lsstcorp.org/secure/attachment/30082/jointcal-process.py

The relevant code for you would be something like this (you do have to know which of UDEEP, DEEP, WIDE, or Aegis your chosen tract is in, though there might be a way to merge those):

```python
import sqlite3
import os

sqlitedir = '/scratch/hchiang2/parejko/'
conn = sqlite3.connect(os.path.join(sqlitedir, 'dbDEEP.sqlite3'))
cursor = conn.cursor()
cmd = "select distinct visit from calexp where tract=:tract and filter=:filt"
tract = 9813
filt = 'HSC-Z'
cursor.execute(cmd, dict(tract=tract, filt=filt))
result = cursor.fetchall()
# NOTE: have to flatten the list, as it comes out as [(1,), (2,), (3,), ...]
print('^'.join(str(x[0]) for x in result))
```
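The aside about merging the layers can be made concrete. Below is a hedged sketch: it unions the visit sets returned by the same query against each per-layer database, so the caller need not know which layer holds the chosen tract. The in-memory databases and the fabricated calexp rows are stand-ins for the real dbDEEP/dbUDEEP/dbWIDE sqlite3 files under /scratch/hchiang2/parejko/; the helper name `visits_for` is illustrative.

```python
import sqlite3


def visits_for(conn, tract, filt):
    """Return the set of distinct visits covering tract+filt in one database."""
    cmd = "select distinct visit from calexp where tract=:tract and filter=:filt"
    return {row[0] for row in conn.execute(cmd, dict(tract=tract, filt=filt))}


# Stand-ins for the per-layer databases; real code would instead connect to
# os.path.join(sqlitedir, 'db%s.sqlite3' % layer) for each layer.
layers = {}
for layer, rows in [('DEEP',  [(9813, 'HSC-Z', 101), (9813, 'HSC-Z', 102)]),
                    ('UDEEP', [(9813, 'HSC-Z', 102), (9813, 'HSC-Z', 103)])]:
    conn = sqlite3.connect(':memory:')
    conn.execute('create table calexp (tract int, filter text, visit int)')
    conn.executemany('insert into calexp values (?, ?, ?)', rows)
    layers[layer] = conn

# Union across layers, so you don't need to know which layer the tract is in.
merged = set().union(*(visits_for(c, 9813, 'HSC-Z') for c in layers.values()))
print('^'.join(str(v) for v in sorted(merged)))  # -> 101^102^103
```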

Tim Morton added a comment -

Running all the visit-level scripts is currently blocked by the fact that the slurm cluster schedules an entire node per job, meaning launching a job array to run all 747 jobs would badly clutter up the cluster.
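One possible mitigation (a hedged sketch, not what was actually done on this ticket) is Slurm's job-array throttle: an `--array=0-746%20` specification still covers all 747 jobs but caps concurrency at 20, so a scheduler that allocates a whole node per job is not flooded. The snippet below writes such a batch script; the command file name make_qa_plots.sh comes from this ticket, while the output file name, job name, and the limit of 20 are illustrative assumptions.

```python
# Hedged sketch: generate a throttled Slurm job-array script. Each array
# task runs one line of the pre-generated command file make_qa_plots.sh.
script = """#!/bin/bash
#SBATCH --array=0-746%20
#SBATCH --job-name=qa_visits

# Run the command on line (task id + 1) of the command file.
CMD=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" make_qa_plots.sh)
eval "$CMD"
"""

with open('run_visits.sl', 'w') as f:
    f.write(script)
print('wrote run_visits.sl')
```

The `%20` suffix is standard sbatch syntax for limiting how many array tasks may run simultaneously; the limit can be tuned to whatever the cluster operators consider polite.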

Tim Morton added a comment -

OK, I believe this has now been completed. I had to start and stop a few times, so there may still be stragglers, but I tried to catch them all. All the tables are available under /scratch/tmorton/hscRerun/DM-12043. There are 30284 tables in total, comprising 6.3 TB. The list of commands used to create them all is at /home/tmorton/tickets/DM-12043/make_qa_plots.sh, which was generated by /home/tmorton/tickets/DM-12043/write_script.py. John Swinbank, I'll put you down as reviewer on the assumption that this doesn't need a detailed check; if you think it needs a more substantial review (e.g., to see whether I missed any tables), perhaps Lauren MacArthur or Hsin-Fang Chiang could look into it.

John Swinbank added a comment -

Sorry for the delay. I think we can regard this as done.


#### People

Assignee:
Tim Morton
Reporter:
Tim Morton
Reviewers:
John Swinbank
Watchers:
Hsin-Fang Chiang, Jim Bosch, John Parejko, John Swinbank, Lauren MacArthur, Paul Price, Tim Morton