Assemble a complete database with S13 DRP catalogs

XMLWordPrintable

Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points:
16
• Sprint:
DB_F16_7, DB_F16_8, DB_F16_9
• Team:
Data Access and Database

Description

Create a database populated with complete catalogs resulting from processing of the SDSS Stripe 82 data at both NCSA and IN2P3 sites. The database has to be created at NCSA on the following database server lsst-db.ncsa.illinois.edu

• 2016-08-10: CORRECTION: that was the original plan. Due to the space limitation on that server the database will be installed on Nebula volume lsst-dm-gapon attached to image lsst-gapon-0 (floating IP address 141.142.211.152). See details on this setup in the comments stream below.)

The database name will be: gapon_SDRP_Stripe82

The database will be populated with the contents of the following databases:

• NCSA (lsst-db.ncsa.illinois.edu):

 daues_SDRP_Stripe82_ncsa daues_SDRP_dedupe_byfilter_0 daues_SDRP_dedupe_byfilter_1 daues_SDRP_dedupe_byfilter_2 daues_SDRP_dedupe_byfilter_3 daues_SDRP_dedupe_byfilter_4 

• IN2P3 (ccdb02.in2p3.fr):

 lsst_prod_DC_2013_2 lsst_prod_dedupe_byfilter_g lsst_prod_dedupe_byfilter_i lsst_prod_dedupe_byfilter_r lsst_prod_dedupe_byfilter_u lsst_prod_dedupe_byfilter_z 

• The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated.
• the referential integrity of the resulted database will need to be tested

Activity

Igor Gaponenko created issue -
Field Original Value New Value
Epic Link DM-6230 [ 24746 ]
 Link This issue has to be done after DM-7007 [ DM-7007 ]
 Description Create a database populated with complete catalogs resulting from processing of the *SDSS Stripe 82* data at both NCSA and IN2P3 sites. The database has to be created at NCSA on the following database server: * lsst-db.ncsa.illinois.edu The database name will be: * gapon_SDRP_Stripe82 The database will be populated with the contents of the following databases: *NCSA* (lsst-db.ncsa.illinois.edu): | daues_SDRP_Stripe82_ncsa | | daues_SDRP_dedupe_byfilter_0 | | daues_SDRP_dedupe_byfilter_1 | | daues_SDRP_dedupe_byfilter_2 | | daues_SDRP_dedupe_byfilter_3 | | daues_SDRP_dedupe_byfilter_4 | *IN2P3* (ccdb02.in2p3.fr): | lsst_prod_DC_2013_2 | | lsst_prod_dedupe_byfilter_g | | lsst_prod_dedupe_byfilter_i | | lsst_prod_dedupe_byfilter_r | | lsst_prod_dedupe_byfilter_u | | lsst_prod_dedupe_byfilter_z | Additional requirements: * The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated. * the referential integrity of the resulted database will need to be tested Create a database populated with complete catalogs resulting from processing of the *SDSS Stripe 82* data at both NCSA and IN2P3 sites. The database has to be created at NCSA on the following database server: * *lsst-db.ncsa.illinois.edu* The database name will be: * *gapon_SDRP_Stripe82* The database will be populated with the contents of the following databases: *NCSA* (lsst-db.ncsa.illinois.edu): {code} daues_SDRP_Stripe82_ncsa daues_SDRP_dedupe_byfilter_0 daues_SDRP_dedupe_byfilter_1 daues_SDRP_dedupe_byfilter_2 daues_SDRP_dedupe_byfilter_3 daues_SDRP_dedupe_byfilter_4 {code} *IN2P3* (ccdb02.in2p3.fr): {code} lsst_prod_DC_2013_2 lsst_prod_dedupe_byfilter_g lsst_prod_dedupe_byfilter_i lsst_prod_dedupe_byfilter_r lsst_prod_dedupe_byfilter_u lsst_prod_dedupe_byfilter_z {code} Additional requirements: * The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated. * the referential integrity of the resulted database will need to be tested
 Description Create a database populated with complete catalogs resulting from processing of the *SDSS Stripe 82* data at both NCSA and IN2P3 sites. The database has to be created at NCSA on the following database server: * *lsst-db.ncsa.illinois.edu* The database name will be: * *gapon_SDRP_Stripe82* The database will be populated with the contents of the following databases: *NCSA* (lsst-db.ncsa.illinois.edu): {code} daues_SDRP_Stripe82_ncsa daues_SDRP_dedupe_byfilter_0 daues_SDRP_dedupe_byfilter_1 daues_SDRP_dedupe_byfilter_2 daues_SDRP_dedupe_byfilter_3 daues_SDRP_dedupe_byfilter_4 {code} *IN2P3* (ccdb02.in2p3.fr): {code} lsst_prod_DC_2013_2 lsst_prod_dedupe_byfilter_g lsst_prod_dedupe_byfilter_i lsst_prod_dedupe_byfilter_r lsst_prod_dedupe_byfilter_u lsst_prod_dedupe_byfilter_z {code} Additional requirements: * The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated. * the referential integrity of the resulted database will need to be tested Create a database populated with complete catalogs resulting from processing of the *SDSS Stripe 82* data at both NCSA and IN2P3 sites. The database has to be created at NCSA on the following database server: * *lsst-db.ncsa.illinois.edu* The database name will be: * *gapon_SDRP_Stripe82* The database will be populated with the contents of the following databases: * *NCSA* (lsst-db.ncsa.illinois.edu): {code} daues_SDRP_Stripe82_ncsa daues_SDRP_dedupe_byfilter_0 daues_SDRP_dedupe_byfilter_1 daues_SDRP_dedupe_byfilter_2 daues_SDRP_dedupe_byfilter_3 daues_SDRP_dedupe_byfilter_4 {code} * *IN2P3* (ccdb02.in2p3.fr): {code} lsst_prod_DC_2013_2 lsst_prod_dedupe_byfilter_g lsst_prod_dedupe_byfilter_i lsst_prod_dedupe_byfilter_r lsst_prod_dedupe_byfilter_u lsst_prod_dedupe_byfilter_z {code} Additional requirements: * The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated. * the referential integrity of the resulted database will need to be tested
 Story Points 8 16
 Status To Do [ 10001 ] In Progress [ 3 ]
 Sprint DB_F16_7 [ 232 ] DB_F16_7, DB_F16_8 [ 232, 233 ]
 Rank Ranked higher
 Description Create a database populated with complete catalogs resulting from processing of the *SDSS Stripe 82* data at both NCSA and IN2P3 sites. The database has to be created at NCSA on the following database server: * *lsst-db.ncsa.illinois.edu* The database name will be: * *gapon_SDRP_Stripe82* The database will be populated with the contents of the following databases: * *NCSA* (lsst-db.ncsa.illinois.edu): {code} daues_SDRP_Stripe82_ncsa daues_SDRP_dedupe_byfilter_0 daues_SDRP_dedupe_byfilter_1 daues_SDRP_dedupe_byfilter_2 daues_SDRP_dedupe_byfilter_3 daues_SDRP_dedupe_byfilter_4 {code} * *IN2P3* (ccdb02.in2p3.fr): {code} lsst_prod_DC_2013_2 lsst_prod_dedupe_byfilter_g lsst_prod_dedupe_byfilter_i lsst_prod_dedupe_byfilter_r lsst_prod_dedupe_byfilter_u lsst_prod_dedupe_byfilter_z {code} Additional requirements: * The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated. * the referential integrity of the resulted database will need to be tested Create a database populated with complete catalogs resulting from processing of the *SDSS Stripe 82* data at both NCSA and IN2P3 sites. -The database has to be created at NCSA on the following database server: * *lsst-db.ncsa.illinois.edu* - ** 2016-08-10: *CORRECTION*: that was the original plan. Due to the space limitation on that server the database will be installed on Nebula volume *lsst-dm-gapon* attached to image *lsst-gapon-0* (floating IP address *141.142.211.152*. See details on this setup in the comments stream below.) The database name will be: * *gapon_SDRP_Stripe82* The database will be populated with the contents of the following databases: * *NCSA* (lsst-db.ncsa.illinois.edu): {code} daues_SDRP_Stripe82_ncsa daues_SDRP_dedupe_byfilter_0 daues_SDRP_dedupe_byfilter_1 daues_SDRP_dedupe_byfilter_2 daues_SDRP_dedupe_byfilter_3 daues_SDRP_dedupe_byfilter_4 {code} * *IN2P3* (ccdb02.in2p3.fr): {code} lsst_prod_DC_2013_2 lsst_prod_dedupe_byfilter_g lsst_prod_dedupe_byfilter_i lsst_prod_dedupe_byfilter_r lsst_prod_dedupe_byfilter_u lsst_prod_dedupe_byfilter_z {code} Additional requirements: * The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated. * the referential integrity of the resulted database will need to be tested
 Description Create a database populated with complete catalogs resulting from processing of the *SDSS Stripe 82* data at both NCSA and IN2P3 sites. -The database has to be created at NCSA on the following database server: * *lsst-db.ncsa.illinois.edu* - ** 2016-08-10: *CORRECTION*: that was the original plan. Due to the space limitation on that server the database will be installed on Nebula volume *lsst-dm-gapon* attached to image *lsst-gapon-0* (floating IP address *141.142.211.152*. See details on this setup in the comments stream below.) The database name will be: * *gapon_SDRP_Stripe82* The database will be populated with the contents of the following databases: * *NCSA* (lsst-db.ncsa.illinois.edu): {code} daues_SDRP_Stripe82_ncsa daues_SDRP_dedupe_byfilter_0 daues_SDRP_dedupe_byfilter_1 daues_SDRP_dedupe_byfilter_2 daues_SDRP_dedupe_byfilter_3 daues_SDRP_dedupe_byfilter_4 {code} * *IN2P3* (ccdb02.in2p3.fr): {code} lsst_prod_DC_2013_2 lsst_prod_dedupe_byfilter_g lsst_prod_dedupe_byfilter_i lsst_prod_dedupe_byfilter_r lsst_prod_dedupe_byfilter_u lsst_prod_dedupe_byfilter_z {code} Additional requirements: * The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated. * the referential integrity of the resulted database will need to be tested Create a database populated with complete catalogs resulting from processing of the *SDSS Stripe 82* data at both NCSA and IN2P3 sites. -The database has to be created at NCSA on the following database server *lsst-db.ncsa.illinois.edu*- * 2016-08-10: *CORRECTION*: that was the original plan. Due to the space limitation on that server the database will be installed on Nebula volume *lsst-dm-gapon* attached to image *lsst-gapon-0* (floating IP address *141.142.211.152*. See details on this setup in the comments stream below.) The database name will be: *gapon_SDRP_Stripe82* The database will be populated with the contents of the following databases: * *NCSA* (lsst-db.ncsa.illinois.edu): {code} daues_SDRP_Stripe82_ncsa daues_SDRP_dedupe_byfilter_0 daues_SDRP_dedupe_byfilter_1 daues_SDRP_dedupe_byfilter_2 daues_SDRP_dedupe_byfilter_3 daues_SDRP_dedupe_byfilter_4 {code} * *IN2P3* (ccdb02.in2p3.fr): {code} lsst_prod_DC_2013_2 lsst_prod_dedupe_byfilter_g lsst_prod_dedupe_byfilter_i lsst_prod_dedupe_byfilter_r lsst_prod_dedupe_byfilter_u lsst_prod_dedupe_byfilter_z {code} Additional requirements: * The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated. * the referential integrity of the resulted database will need to be tested
 Description Create a database populated with complete catalogs resulting from processing of the *SDSS Stripe 82* data at both NCSA and IN2P3 sites. -The database has to be created at NCSA on the following database server *lsst-db.ncsa.illinois.edu*- * 2016-08-10: *CORRECTION*: that was the original plan. Due to the space limitation on that server the database will be installed on Nebula volume *lsst-dm-gapon* attached to image *lsst-gapon-0* (floating IP address *141.142.211.152*. See details on this setup in the comments stream below.) The database name will be: *gapon_SDRP_Stripe82* The database will be populated with the contents of the following databases: * *NCSA* (lsst-db.ncsa.illinois.edu): {code} daues_SDRP_Stripe82_ncsa daues_SDRP_dedupe_byfilter_0 daues_SDRP_dedupe_byfilter_1 daues_SDRP_dedupe_byfilter_2 daues_SDRP_dedupe_byfilter_3 daues_SDRP_dedupe_byfilter_4 {code} * *IN2P3* (ccdb02.in2p3.fr): {code} lsst_prod_DC_2013_2 lsst_prod_dedupe_byfilter_g lsst_prod_dedupe_byfilter_i lsst_prod_dedupe_byfilter_r lsst_prod_dedupe_byfilter_u lsst_prod_dedupe_byfilter_z {code} Additional requirements: * The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated. * the referential integrity of the resulted database will need to be tested Create a database populated with complete catalogs resulting from processing of the *SDSS Stripe 82* data at both NCSA and IN2P3 sites. -The database has to be created at NCSA on the following database server *lsst-db.ncsa.illinois.edu*- * 2016-08-10: *CORRECTION*: that was the original plan. Due to the space limitation on that server the database will be installed on Nebula volume *lsst-dm-gapon* attached to image *lsst-gapon-0* (floating IP address *141.142.211.152*). See details on this setup in the comments stream below.) The database name will be: *gapon_SDRP_Stripe82* The database will be populated with the contents of the following databases: * *NCSA* (lsst-db.ncsa.illinois.edu): {code} daues_SDRP_Stripe82_ncsa daues_SDRP_dedupe_byfilter_0 daues_SDRP_dedupe_byfilter_1 daues_SDRP_dedupe_byfilter_2 daues_SDRP_dedupe_byfilter_3 daues_SDRP_dedupe_byfilter_4 {code} * *IN2P3* (ccdb02.in2p3.fr): {code} lsst_prod_DC_2013_2 lsst_prod_dedupe_byfilter_g lsst_prod_dedupe_byfilter_i lsst_prod_dedupe_byfilter_r lsst_prod_dedupe_byfilter_u lsst_prod_dedupe_byfilter_z {code} Additional requirements: * The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated. * the referential integrity of the resulted database will need to be tested
 Comment [ h1. Initializing database gapon_SDRP_Stripe82 (NCSA) The database will be initialized as a clone of an existing database *daues_SDRP_Stripe82_ncsa.sql*. The procedure has the following steps: # making a *mysqldump* of the source database # creating the destination database # loading the dump into the destination database Initially the dump files were placed at (NCSA): {code} /nfs/lsst8/gapon {code} After switching the database server from *lsst-db* to a Nebula image/volume the dumps were made (uploaded) onto the Nebula volume first. A reason of that was that neither LSST NFS mount poibnt was available to the Nebula images nor the Nebula storage volumes could be mounted on the public development machines allocated to LSST at NCSA. h2. Loading options * in theory a more efficient way of cloning a database would be by chaining two operations using UNIX pipe w/o compressing/decompressing the stream: {code:bash} mysqldump ... daues_SDRP_Stripe82_ncsa | mysql ... gapon_SDRP_Stripe82 {code} The main reason why I chose to go via an intermediate file is because it allows a quicker turnaround time for making other clones (should this be needed) based on the dump file. The second reason was related to the security restrictions imposed by MySQL for views. For instance, a straightforward import of the above taken snapshot failed with the following message: {code} ERROR 1227 (42000) at line 140273: Access denied; you need (at least one of) the SUPER privilege(s) for this operation {code} Further investigation has pointed out onto the following line in the dump: {code} /*!50013 DEFINER=daues@% SQL SECURITY DEFINER */ {code} Hence, a solution was to: {code:bash} % cat daues_SDRP_Stripe82_ncsa.sql | sed 's/daues/gapon/g' | mysql gapon_SDRP_Stripe82 {code} At this point the database was properly seeded. h3. Observations The performance of the database loading was rather low: 20 MB/s or less. And there was a substantial CPU usage on the database server: {code} % top   PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND  5236 mysql 20 0 972804 216772 4044 S 83.0 1.3 689:45.22 mysqld  8307 gapon 20 0 238796 6632 2768 S 16.3 0.0 95:42.25 mysql {code} ]
 Comment [ h1. Loading deep forced sources into gapon_SDRP_Stripe82 (NCSA) Loading data of the *u* band (filter index *0*): {code:sql} USE gapon_SDRP_Stripe82; CREATE TABLE RunDeepForcedSource LIKE daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource; INSERT INTO RunDeepForcedSource SELECT * FROM daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource; {code} The query finished in 32 hours: {code} Query OK, 1729599791 rows affected (1 day 8 hours 15 min 12.97 sec) Records: 1729599791 Duplicates: 0 Warnings: 0 {code} *Observations*: the aggregate performance was (700 GB of data from the table + 190 GB of index) / (1 day 8 hours 15 min) = *7.7 MB/s*. *UPDATED: 2016-08-08*: ran out of space on the database server. Giving up with this option. Investigating other options for setting up a database server with the large disk. ]
 Sprint DB_F16_7, DB_F16_8 [ 232, 233 ] DB_F16_7, DB_F16_8, DB_F16_9 [ 232, 233, 249 ]
 Rank Ranked higher
 Labels pdac
 Resolution Done [ 10000 ] Status In Progress [ 3 ] Done [ 10002 ]
Yi Mei [X] (Inactive) made changes -
 Link This issue relates to DM-7762 [ DM-7762 ]
 Link This issue is triggering DM-8241 [ DM-8241 ]

People

Assignee:
Igor Gaponenko
Reporter:
Igor Gaponenko
Watchers:
Fritz Mueller, Gregory Dubois-Felsmann, Igor Gaponenko