Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: Qserv
- Labels:
- Story Points: 16
- Epic Link:
- Sprint: DB_F16_7, DB_F16_8, DB_F16_9
- Team: Data Access and Database
Description
Create a database populated with complete catalogs resulting from processing of the SDSS Stripe 82 data at both NCSA and IN2P3 sites. The database has to be created at NCSA on the database server lsst-db.ncsa.illinois.edu.
- 2016-08-10 CORRECTION: that was the original plan. Due to space limitations on that server, the database will instead be installed on the Nebula volume lsst-dm-gapon attached to image lsst-gapon-0 (floating IP address 141.142.211.152). See details on this setup in the comments stream below.
The database name will be: gapon_SDRP_Stripe82
The database will be populated with the contents of the following databases:
- NCSA (lsst-db.ncsa.illinois.edu):
daues_SDRP_Stripe82_ncsa
daues_SDRP_dedupe_byfilter_0
daues_SDRP_dedupe_byfilter_1
daues_SDRP_dedupe_byfilter_2
daues_SDRP_dedupe_byfilter_3
daues_SDRP_dedupe_byfilter_4
- IN2P3 (ccdb02.in2p3.fr):
lsst_prod_DC_2013_2
lsst_prod_dedupe_byfilter_g
lsst_prod_dedupe_byfilter_i
lsst_prod_dedupe_byfilter_r
lsst_prod_dedupe_byfilter_u
lsst_prod_dedupe_byfilter_z
Additional requirements:
- The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated.
- The referential integrity of the resulting database will need to be tested.
Activity
Transferring MySQL dump files from IN2P3 to NCSA using BBCP
Due to an ongoing modernization of the NCSA network, which may cause long transfers to fail, the transfer is happening in two stages:
- copy files to SLAC: psexport.slac.stanford.edu:/reg/data/ana04/test/gapon/in2p3/
- copy from SLAC to NCSA: /lsst8/gapon/
IN2P3 to SLAC
All bbcp transfers were initiated from hosts ccqservbuild.in2p3.fr or ccage.in2p3.fr. The typical configuration of the data transfer command was:
{code:bash}
% bbcp -v -P 2 -s 16 -r \
    lsst_prod_DC_2013_2/ \
    psexport.slac.stanford.edu:/reg/data/ana04/test/gapon/in2p3/
{code}
Progress report
All six databases were transferred:
- lsst_prod_DC_2013_2
- lsst_prod_dedupe_byfilter_z
- lsst_prod_dedupe_byfilter_u
- lsst_prod_dedupe_byfilter_r
- lsst_prod_dedupe_byfilter_i
- lsst_prod_dedupe_byfilter_g
Observations
The sustained transfer rates:
host | rate | notes |
---|---|---|
ccqservbuild.in2p3.fr | 7.5 MB/s | limited by the network configuration of that VM |
ccage.in2p3.fr | 65 MB/s | limited by WAN |
An option of parallel transfers into separate destinations (such as NCSA) has also been investigated. It led to the same result as above. Apparently there is a bottleneck at the WAN segments connecting IN2P3 with the US networks.
SLAC to NCSA
The files are being moved to NCSA via a dedicated DTN node lsst-xfer.ncsa.illinois.edu. The machine has a range of ports 65001-65535 open for incoming connections. Details found at:
The copying was made using the following command:
{code:bash}
% bbcp -i $HOME/.ssh/lsst/id_ncsa_rsa -v -P 2 -s 16 -Z 65001 -r \
    /reg/data/ana04/test/gapon/in2p3/lsst_prod_DC_2013_2 \
    lsst-xfer.ncsa.illinois.edu:/lsst8/gapon/
{code}
Progress report
All six databases were transferred:
- lsst_prod_DC_2013_2
- lsst_prod_dedupe_byfilter_z
- lsst_prod_dedupe_byfilter_u
- lsst_prod_dedupe_byfilter_r
- lsst_prod_dedupe_byfilter_i
- lsst_prod_dedupe_byfilter_g
Observations
The average transfer rate was about 300 MB/s.
Uploading dumps into a Nebula volume
This operation was run (AS NEEDED FOR FURTHER LOADING!) from node lsst-dev.ncsa.illinois.edu:
{code:bash}
% bbcp -v -P 2 -Z 65001 -z -s 16 -i ~/.ssh/id_ncsa_rsa -r \
    /lsst8/gapon/daues_SDRP_Stripe82_ncsa.sql \
    141.142.211.152:/data/gapon/

% bbcp -v -P 2 -Z 65001 -z -s 16 -i ~/.ssh/id_ncsa_rsa -r \
    /lsst8/gapon/lsst_prod_dedupe_byfilter_z \
    141.142.211.152:/data/gapon/

...
{code}
Observations
The typical performance of the operation was 95 MB/s.
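At that rate, a rough wall-time estimate for uploading one of the large dumps can be sketched as follows (the 700 GB dump size is an illustrative figure, not a measurement from this step):

```python
# Rough wall-time estimate at the observed bbcp rate of 95 MB/s.
# The 700 GB dump size is an illustrative figure, not a measured one.
rate_mb_s = 95
dump_size_gb = 700

seconds = dump_size_gb * 1e9 / (rate_mb_s * 1e6)
hours = seconds / 3600
print(f"~{hours:.1f} hours to upload a {dump_size_gb} GB dump")
```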
The first attempt at initializing database gapon_SDRP_Stripe82 (NCSA)
The database will be initialized as a clone of the existing database daues_SDRP_Stripe82_ncsa. The procedure has the following steps:
- making a mysqldump of the source database
- creating the destination database
- loading the dump into the destination database
Initially the dump files were placed at (NCSA):
{code}
/nfs/lsst8/gapon
{code}
After switching the database server from lsst-db to a Nebula image/volume, the dumps were uploaded onto the Nebula volume first, because the LSST NFS mount point was not available to the Nebula images, nor could the Nebula storage volumes be mounted on the public development machines allocated to LSST at NCSA.
Loading options
- In theory, a more efficient way of cloning a database would be to chain the two operations with a UNIX pipe, without compressing/decompressing the stream:
{code:bash}
mysqldump ... daues_SDRP_Stripe82_ncsa | mysql ... gapon_SDRP_Stripe82
{code}
The main reason I chose to go via an intermediate file is that it allows a quicker turnaround for making other clones (should this be needed) from the dump file. The second reason was related to the security restrictions imposed by MySQL on views. For instance, a straightforward import of the snapshot taken above failed with the following message:
{code}
ERROR 1227 (42000) at line 140273: Access denied; you need (at least one of) the SUPER privilege(s) for this operation
{code}
Further investigation pointed at the following line in the dump:
{code}
/*!50013 DEFINER=`daues`@`%` SQL SECURITY DEFINER */
{code}
Hence, a solution was to rewrite the definer on the fly:
{code:bash}
% cat daues_SDRP_Stripe82_ncsa.sql | sed 's/daues/gapon/g' | mysql gapon_SDRP_Stripe82
{code}
At this point the database was properly seeded.
Observations
The performance of the database loading was rather low: 20 MB/s or less, and there was substantial CPU usage on the database server:
{code}
% top

  PID USER      PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
 5236 mysql     20   0  972804 216772  4044 S  83.0  1.3 689:45.22 mysqld
 8307 gapon     20   0  238796   6632  2768 S  16.3  0.0  95:42.25 mysql
{code}
Loading deep forced sources into gapon_SDRP_Stripe82 (NCSA)
Loading data of the u band (filter index 0):
{code:sql}
USE gapon_SDRP_Stripe82;
CREATE TABLE RunDeepForcedSource LIKE daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource;
INSERT INTO RunDeepForcedSource SELECT * FROM daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource;
{code}
The query finished in 32 hours:
{code}
Query OK, 1729599791 rows affected (1 day 8 hours 15 min 12.97 sec)
Records: 1729599791  Duplicates: 0  Warnings: 0
{code}
Observations: the aggregate performance was (700 GB of data from the table + 190 GB of index) / (1 day 8 hours 15 min) = 7.7 MB/s.
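The quoted aggregate rate can be cross-checked with a quick calculation (sizes and duration as reported above):

```python
# Cross-check of the quoted aggregate loading rate:
# (700 GB of table data + 190 GB of index) loaded in 1 day 8 hours 15 min.
bytes_loaded = (700 + 190) * 1e9
seconds = (24 + 8) * 3600 + 15 * 60   # 1 day 8 hours 15 min
rate_mb_s = bytes_loaded / seconds / 1e6
print(f"{rate_mb_s:.1f} MB/s")        # -> 7.7 MB/s
```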
UPDATED: 2016-08-08: ran out of space on the database server. Giving up with this option. Investigating other options for setting up a database server with the large disk.
Loading MySQL dumps taken in IN2P3 and NCSA into the combined database (LSST Nebula)
This is a very delicate operation because it will also require a thorough evaluation of the input data. This is the proposed scenario:
- initialize the database with gapon_SDRP_Stripe82
- load the first bunch of the deep forced sources for frequency band 0 from database daues_SDRP_dedupe_byfilter_0 into gapon_SDRP_Stripe82
- initialize database gapon_SDRP_dedupe_byfilter_1 and load the second bunch from database daues_SDRP_dedupe_byfilter_1 into it
- cross check the deep forced sources of that band against the first band to see if there are any conflicts (key overlaps, etc.) between them
TO BE COMPLETED...
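The cross-check in the last step amounts to intersecting the primary-key sets of the two per-band RunDeepForcedSource tables. A minimal sketch of that logic in Python (the literal sets below are hypothetical stand-ins for real `SELECT id FROM <database>.RunDeepForcedSource` result sets):

```python
# Sketch of the cross-check step: intersect the primary-key sets of the
# RunDeepForcedSource tables of two bands. The literal sets below stand in
# for real "SELECT id FROM <database>.RunDeepForcedSource" result sets.
ids_band_0 = {1001, 1002, 1003}   # hypothetical ids, daues_SDRP_dedupe_byfilter_0
ids_band_1 = {1003, 2001, 2002}   # hypothetical ids, daues_SDRP_dedupe_byfilter_1

conflicts = ids_band_0 & ids_band_1
print(sorted(conflicts))          # -> [1003]
```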
Evaluating schema of loaded catalogs
Schema
Explained in: https://github.com/lsst/cat/tree/master/sql
Each database has a special table with a reference to a schema initialization procedure:
{code:sql}
SELECT * FROM ZZZ_Db_Description;
{code}
{code}
+------------------------------+------+
| f                            | r    |
+------------------------------+------+
| lsstSchema4mysqlS12_sdss.sql | NULL |
+------------------------------+------+
{code}
Tables with non-empty contents in the main database (using the one from NCSA as an example):
{code:sql}
SELECT
    TABLE_NAME, TABLE_ROWS, DATA_LENGTH
FROM
    information_schema.Tables
WHERE
    TABLE_SCHEMA='gapon_SDRP_Stripe82'
AND
    TABLE_ROWS > 0;
{code}
{code}
+-------------------------------+------------+--------------+
| TABLE_NAME                    | TABLE_ROWS | DATA_LENGTH  |
+-------------------------------+------------+--------------+
| DeepCoadd                     |      13735 |      4862980 |
| DeepCoadd_Metadata            |    2809690 |    122591652 |
| DeepCoadd_To_Htm10            |     326420 |      4243460 |
| Filter                        |          6 |           78 |
| LeapSeconds                   |         39 |         1443 |
| RunDeepForcedSource           | 1729599791 | 683191917445 |
| RunDeepSource                 |   98577782 |  55006402356 |
| Science_Ccd_Exposure          |    1403385 |    644435260 |
| Science_Ccd_Exposure_Metadata |  164196045 |   5085842272 |
| Science_Ccd_Exposure_To_Htm10 |   26861430 |    349198590 |
| ZZZ_Db_Description            |          1 |           36 |
+-------------------------------+------------+--------------+
{code}
NOTE: this particular database has already been preloaded with table RunDeepForcedSource which has deep forced sources of filter 0 (band u) processed at NCSA.
Filter-specific databases (using one of those as an example):
{code:sql}
SELECT
    TABLE_NAME, TABLE_ROWS, DATA_LENGTH
FROM
    information_schema.Tables
WHERE
    TABLE_SCHEMA='daues_SDRP_dedupe_byfilter_1'
AND
    TABLE_ROWS > 0;
{code}
{code}
+---------------------+------------+--------------+
| TABLE_NAME          | TABLE_ROWS | DATA_LENGTH  |
+---------------------+------------+--------------+
| RunDeepForcedSource | 1752805399 | 692358132605 |
+---------------------+------------+--------------+
{code}
Dependencies
table: Filter
key | column | FK.table | FK.column |
---|---|---|---|
PK | filterId | | |
table: Science_Ccd_Exposure
key | column | FK.table | FK.column |
---|---|---|---|
PK | scienceCcdExposureId | | |
FK | filterId | Filter | filterId |
table: Science_Ccd_Exposure_Metadata
table: Science_Ccd_Exposure_To_Htm10
key | column | FK.table | FK.column |
---|---|---|---|
FK | exposure_id | Science_Ccd_Exposure | scienceCcdExposureId |
table: DeepCoadd
key | column | FK.table | FK.column |
---|---|---|---|
PK | deepCoaddId | | |
table: DeepCoadd_Metadata
table: DeepCoadd_To_Htm10
key | column | FK.table | FK.column |
---|---|---|---|
FK | deepCoaddId | DeepCoadd | deepCoaddId |
table: RunDeepSource (L2 Object as per LSST DPD)
key | column | FK.table | FK.column |
---|---|---|---|
PK | id | | |
FK | parent | self | id |
FK | coadd_id | DeepCoadd | deepCoaddId |
FK | coadd_filter_id | Filter | filterId |
table: RunDeepForcedSource (L2 Source as per LSST DPD)
key | column | FK.table | FK.column |
---|---|---|---|
PK | id | | |
FK | object_id | RunDeepSource | id |
FK | exposure_filter_id | Filter | filterId |
FK | exposure_id | Science_Ccd_Exposure | scienceCcdExposureId |
Observations on direct dependencies
Direct FK/PK dependencies as per the schema:
- Filter
  - Science_Ccd_Exposure
  - RunDeepSource
  - RunDeepForcedSource
- Science_Ccd_Exposure
  - Science_Ccd_Exposure_Metadata
  - Science_Ccd_Exposure_To_Htm10
  - RunDeepForcedSource
- DeepCoadd
  - DeepCoadd_Metadata
  - DeepCoadd_To_Htm10
  - RunDeepSource
- RunDeepSource
  - RunDeepForcedSource
Preliminary conclusions:
- Dependencies on Filter are trivial due to the small size and static nature of that table.
- Dependencies on Science_Ccd_Exposure allow for a potentially easy merge (of the two sites) of the two relevant metadata tables (Science_Ccd_Exposure_Metadata and Science_Ccd_Exposure_To_Htm10), based on the run numbers encoded into the primary key of that table.
- DeepCoadd, RunDeepSource and RunDeepForcedSource form a cluster of dependencies which requires further investigation before attempting any merge.
- RunDeepForcedSource has a dual dependency on Science_Ccd_Exposure and RunDeepSource, of which one can be trivially eliminated.
- 2016-09-01: it turns out the key dependency is on RunDeepSource. See details in one of the comments down the stream.
Further actions:
- investigate the structure of the primary key deepCoaddId of table DeepCoadd
- understand the meaning of patch and tract in table DeepCoadd; this may be a clue for establishing a correct merge algorithm
- investigate the dependency chain of DeepCoadd, RunDeepSource and RunDeepForcedSource
Spatial (indirect) dependencies
There are three options here:
- RA and DEC
- patch, tract (for DeepCoadd)
- HTM
The algorithm for generating calibrated CCD exposures IDs
The algorithm can be deduced from the following result sets (one for each processing site):
NCSA:
{code:sql}
USE gapon_SDRP_Stripe82;

SELECT
    scienceCcdExposureId, run, filterId, camcol, field
FROM
    Science_Ccd_Exposure
WHERE
    run=1033
ORDER BY
    scienceCcdExposureId ASC
LIMIT 3;
{code}
{code}
+----------------------+------+----------+--------+-------+
| scienceCcdExposureId | run  | filterId | camcol | field |
+----------------------+------+----------+--------+-------+
|           1033010018 | 1033 |        0 |      1 |    18 |
|           1033010019 | 1033 |        0 |      1 |    19 |
|           1033010020 | 1033 |        0 |      1 |    20 |
+----------------------+------+----------+--------+-------+
{code}
IN2P3:
{code:sql}
USE lsst_prod_DC_2013_2;

SELECT
    scienceCcdExposureId, run, filterId, camcol, field
FROM
    Science_Ccd_Exposure
WHERE
    run=1752
ORDER BY
    scienceCcdExposureId ASC
LIMIT 3;
{code}
{code}
+----------------------+------+----------+--------+-------+
| scienceCcdExposureId | run  | filterId | camcol | field |
+----------------------+------+----------+--------+-------+
|           1752010040 | 1752 |        0 |      1 |    40 |
|           1752010041 | 1752 |        0 |      1 |    41 |
|           1752010042 | 1752 |        0 |      1 |    42 |
+----------------------+------+----------+--------+-------+
{code}
The final formula:
{code}
scienceCcdExposureId = 1000000 * run + 100000 * filterId + 10000 * camcol + field
{code}
NOTE: apparently, there is existing code for calculating these identifiers. The relevant function is shown below:
{code:python}
def _computeCcdExposureId(self, dataId):
    """Compute the 64-bit (long) identifier for a CCD exposure.

    @param dataId (dict) Data identifier with run, rerun, filter, camcol, field
    """
    return ((long(dataId['run']) \
        * 10 + self.filterIdMap[dataId['filter']]) \
        * 10 + dataId['camcol']) \
        * 10000 + dataId['field']
{code}
This code is PERFECTLY CONSISTENT with the previously shown observations on what's actually stored in the databases.
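As a quick consistency check, the same arithmetic can be evaluated against identifiers copied from the two result sets above (a small sketch; `ccd_exposure_id` is a free-function rendition of the method, not part of the LSST code base):

```python
def ccd_exposure_id(run, filter_id, camcol, field):
    # Same arithmetic as _computeCcdExposureId above, i.e.
    # 1000000*run + 100000*filterId + 10000*camcol + field
    return ((run * 10 + filter_id) * 10 + camcol) * 10000 + field

# Samples copied from the NCSA (run 1033) and IN2P3 (run 1752) result sets
assert ccd_exposure_id(1033, 0, 1, 18) == 1033010018
assert ccd_exposure_id(1752, 0, 1, 40) == 1752010040
print("formula matches the stored identifiers")
```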
The algorithm for generating primary keys of table DeepCoadd
The algorithm is explained in:
The simplified formula based on a presence of filters looks like this:
{code:python}
tract    = 3
patchX   = 99
patchY   = 9
filterId = 0

id = 8 * ((tract * 2**13 + patchX) * 2**13 + patchY) + filterId
# id = 1617100872
{code}
It produces results which are consistent with the following result set:
{code:sql}
USE gapon_SDRP_Stripe82;

SELECT
    deepCoaddId, tract, patch, filterId
FROM
    DeepCoadd
WHERE
    patch='99,9'
ORDER BY
    deepCoaddId;
{code}
{code}
+-------------+-------+-------+----------+
| deepCoaddId | tract | patch | filterId |
+-------------+-------+-------+----------+
|  1617100872 |     3 | 99,9  |        0 |
|  1617100873 |     3 | 99,9  |        1 |
|  1617100874 |     3 | 99,9  |        2 |
|  1617100875 |     3 | 99,9  |        3 |
|  1617100876 |     3 | 99,9  |        4 |
+-------------+-------+-------+----------+
{code}
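The agreement can be checked by evaluating the formula for all five filters of that patch (a small sketch; `deep_coadd_id` is a helper name introduced here, not LSST code):

```python
def deep_coadd_id(tract, patch_x, patch_y, filter_id):
    # id = 8 * ((tract * 2**13 + patchX) * 2**13 + patchY) + filterId
    return 8 * ((tract * 2**13 + patch_x) * 2**13 + patch_y) + filter_id

# Patch '99,9' of tract 3, filters 0..4, as in the result set above
ids = [deep_coadd_id(3, 99, 9, f) for f in range(5)]
print(ids)  # -> [1617100872, 1617100873, 1617100874, 1617100875, 1617100876]
```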
Analyzing an overlap in DeepCoadd and RunDeepSource
As per my earlier conversation with KT Lim:
- the overlap can be uniquely identified by the patch and tract attributes in table DeepCoadd
- single exposure images are not aligned with boundaries of patches and tracts
- RA/DEC isn't a reliable mechanism for tracking overlaps in this set of tables
The overlap can be tracked using the following query:
{code:sql}
SELECT
    patch, tract, COUNT(*)
FROM
    gapon_SDRP_Stripe82.DeepCoadd
WHERE
    (patch, tract) IN (SELECT patch, tract FROM lsst_prod_DC_2013_2.DeepCoadd)
GROUP BY
    patch, tract
ORDER BY
    patch, tract;
{code}
+--------+-------+----------+
| patch  | tract | COUNT(*) |
+--------+-------+----------+
| 364,0  |     0 |        5 |
| 364,1  |     0 |        5 |
| 364,10 |     0 |        5 |
| 364,11 |     0 |        5 |
| 364,2  |     0 |        5 |
| 364,3  |     0 |        5 |
| 364,4  |     0 |        5 |
| 364,5  |     0 |        5 |
| 364,6  |     0 |        5 |
| 364,7  |     0 |        5 |
| 364,8  |     0 |        5 |
| 364,9  |     0 |        5 |
+--------+-------+----------+
...
(the same pattern of 12 patches per group, each with COUNT(*) = 5, repeats for every X value from 365 through 385)
...
+--------+-------+----------+
| 386,0  |     0 |        5 |
| 386,1  |     0 |        5 |
| 386,10 |     0 |        5 |
| 386,11 |     0 |        5 |
| 386,2  |     0 |        5 |
| 386,3  |     0 |        5 |
| 386,4  |     0 |        5 |
| 386,5  |     0 |        5 |
| 386,6  |     0 |        5 |
| 386,7  |     0 |        5 |
| 386,8  |     0 |        5 |
| 386,9  |     0 |        5 |
+--------+-------+----------+
The opposite query yields the same result, which means that coadds have been built by both sites in the overlap area:
SELECT
    patch,tract,COUNT(*)
FROM
    lsst_prod_DC_2013_2.DeepCoadd
WHERE
    (patch,tract) IN (SELECT patch,tract FROM gapon_SDRP_Stripe82.DeepCoadd)
GROUP BY
    patch,tract
ORDER BY
    patch,tract;
The overlap has been found in a rectangular region of patches:
- X: 364 - 386
- Y: 0 - 11
All 5 bands are present in each patch.
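As a quick sanity check on the overlap counts reported above, the rectangular region can be enumerated directly. The sketch below is toy arithmetic only, not project code:

```python
# Enumerate the overlap patch cells reported by the query: X 364-386, Y 0-11.
overlap_patches = [f"{x},{y}" for x in range(364, 387) for y in range(12)]

print(len(overlap_patches))      # 23 * 12 = 276 patch cells
print(len(overlap_patches) * 5)  # 1380 DeepCoadd rows (5 bands per patch)
```

This matches the query output: 276 (patch, tract) groups, each with COUNT(*) = 5.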
The structure of the primary key in table RunDeepSource
This table also has a parent FK column referencing the PK id of the same table. This raises a potential problem of key collisions (and, more generally, of referential integrity) when attempting to merge the table contents.
A preliminary investigation shows that:
- the key-generation algorithm is based on patch, tract and filter
- there is an exact match of the keys for the overlap region
The first observation is illustrated by:
SELECT
    COUNT(id),MAX(id)-MIN(id),MIN(id),MAX(id)
FROM
    gapon_SDRP_Stripe82.RunDeepSource
WHERE
    coadd_id IN (SELECT deepCoaddId FROM gapon_SDRP_Stripe82.DeepCoadd
                 WHERE patch='364,0' AND tract=0 AND filterId=0);
+-----------+-----------------+------------------+------------------+
| COUNT(id) | MAX(id)-MIN(id) | MIN(id)          | MAX(id)          |
+-----------+-----------------+------------------+------------------+
|      5428 |            5427 | 3201777860083756 | 3201777860089183 |
+-----------+-----------------+------------------+------------------+

SELECT
    COUNT(id),MAX(id)-MIN(id),MIN(id),MAX(id)
FROM
    gapon_SDRP_Stripe82.RunDeepSource
WHERE
    coadd_id IN (SELECT deepCoaddId FROM gapon_SDRP_Stripe82.DeepCoadd
                 WHERE patch='364,1' AND tract=0 AND filterId=0);
+-----------+-----------------+------------------+------------------+
| COUNT(id) | MAX(id)-MIN(id) | MIN(id)          | MAX(id)          |
+-----------+-----------------+------------------+------------------+
|      5008 |            5007 | 3201778933825600 | 3201778933830607 |
+-----------+-----------------+------------------+------------------+

SELECT 3201778933825600-3201777860083756;
+-----------------------------------+
| 3201778933825600-3201777860083756 |
+-----------------------------------+
|                        1073741844 |
+-----------------------------------+

SELECT
    COUNT(id),MAX(id)-MIN(id),MIN(id),MAX(id)
FROM
    gapon_SDRP_Stripe82.RunDeepSource
WHERE
    coadd_id IN (SELECT deepCoaddId FROM gapon_SDRP_Stripe82.DeepCoadd
                 WHERE patch='364,0' AND tract=0 AND filterId=1);
+-----------+-----------------+------------------+------------------+
| COUNT(id) | MAX(id)-MIN(id) | MIN(id)          | MAX(id)          |
+-----------+-----------------+------------------+------------------+
|      6510 |            6509 | 3201777994301623 | 3201777994308132 |
+-----------+-----------------+------------------+------------------+

SELECT 3201777994301623-3201777860083756;
+-----------------------------------+
| 3201777994301623-3201777860083756 |
+-----------------------------------+
|                         134217867 |
+-----------------------------------+
The second observation is illustrated by the results of the following queries:
NCSA:
SELECT
    COUNT(id),MAX(id)-MIN(id),MIN(id),MAX(id)
FROM
    gapon_SDRP_Stripe82.RunDeepSource
WHERE
    coadd_id IN (SELECT deepCoaddId FROM gapon_SDRP_Stripe82.DeepCoadd
                 WHERE patch='364,0' AND tract=0 AND filterId=0);
+-----------+-----------------+------------------+------------------+
| COUNT(id) | MAX(id)-MIN(id) | MIN(id)          | MAX(id)          |
+-----------+-----------------+------------------+------------------+
|      5428 |            5427 | 3201777860083756 | 3201777860089183 |
+-----------+-----------------+------------------+------------------+

IN2P3:

SELECT
    COUNT(id),MAX(id)-MIN(id),MIN(id),MAX(id)
FROM
    lsst_prod_DC_2013_2.RunDeepSource
WHERE
    coadd_id IN (SELECT deepCoaddId FROM lsst_prod_DC_2013_2.DeepCoadd
                 WHERE patch='364,0' AND tract=0 AND filterId=0);
+-----------+-----------------+------------------+------------------+
| COUNT(id) | MAX(id)-MIN(id) | MIN(id)          | MAX(id)          |
+-----------+-----------------+------------------+------------------+
|      5445 |            5444 | 3201777860083756 | 3201777860089200 |
+-----------+-----------------+------------------+------------------+
ATTENTION: this observation doesn't directly prove that the same behavior holds in the non-overlap region. An additional study (if needed) may have to be conducted. The easiest way to do this would be to scan patches and tracts before and after the overlap region.
The algorithm for generating the primary keys
It has been experimentally discovered that the formula generating contiguous ranges of primary keys of table RunDeepSource has a direct dependency on the corresponding primary key of table DeepCoadd (which itself is calculated from the triplet of: tract, patch, filter). The formula calculates the start key for a range of deep sources as:
id_min = 2**27 * deepCoaddId
This theory is supported by the following observation:
SELECT
    id/coadd_id AS 'divider',COUNT(*)
FROM
    lsst_prod_DC_2013_2.RunDeepSource
GROUP BY
    divider;
+----------------+----------+
| divider        | COUNT(*) |
+----------------+----------+
| 134217728.0000 |  8684893 |
| 134217728.0001 | 24277289 |
| 134217728.0002 | 22549282 |
| 134217728.0003 | 17703275 |
| 134217728.0004 | 11786896 |
| 134217728.0005 |  6560958 |
| 134217728.0006 |  3519636 |
| 134217728.0007 |  1742696 |
| 134217728.0008 |   814825 |
| 134217728.0009 |   320346 |
| 134217728.0010 |    99486 |
| 134217728.0011 |    22900 |
| 134217728.0012 |     4810 |
| 134217728.0013 |      856 |
+----------------+----------+
Apparently, the allocation of the remaining keys within the scope of the same coadd was sequential, because it happened within a single process while searching for deep sources on the coadd. This can be seen from:
SELECT POWER(2,27);
+-------------+
| POWER(2,27) |
+-------------+
|   134217728 |
+-------------+

SELECT
    id,coadd_id*134217728,id-coadd_id*134217728 AS 'diff'
FROM
    lsst_prod_DC_2013_2.RunDeepSource
WHERE
    coadd_id IN (
        SELECT deepCoaddId
        FROM lsst_prod_DC_2013_2.DeepCoadd
        WHERE patch='364,0' AND tract=0 AND filterId=0)
ORDER BY
    id;
+------------------+--------------------+------+
| id               | coadd_id*134217728 | diff |
+------------------+--------------------+------+
| 3201777860083756 |   3201777860083712 |   44 |
| 3201777860083757 |   3201777860083712 |   45 |
| 3201777860083758 |   3201777860083712 |   46 |
| 3201777860083759 |   3201777860083712 |   47 |
| 3201777860083760 |   3201777860083712 |   48 |
...
| 3201777860089196 |   3201777860083712 | 5484 |
| 3201777860089197 |   3201777860083712 | 5485 |
| 3201777860089198 |   3201777860083712 | 5486 |
| 3201777860089199 |   3201777860083712 | 5487 |
| 3201777860089200 |   3201777860083712 | 5488 |
+------------------+--------------------+------+
5445 rows in set (0.01 sec)
This leads to the following optimization for eliminating overlapped keys from table RunDeepSource:
- for each value of deepCoaddId (FK: coadd_id) in the overlap area, eliminate all entries whose primary keys fall in the range of:
2**27 * deepCoaddId <= id < 2**27 * (deepCoaddId + 1)
A clear advantage of this algorithm over a nested query directly matching the values of RunDeepSource.coadd_id and DeepCoadd.deepCoaddId is that the optimized removal will use the existing primary key index on RunDeepSource.id.
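The key layout established above can be modeled with plain integer arithmetic. A minimal sketch (not project code), using the id range width of 2**27 confirmed by the experiments:

```python
# Model of the observed RunDeepSource key layout: each DeepCoadd row owns a
# contiguous primary-key block of width 2**27.

RANGE_WIDTH = 2 ** 27  # 134217728, as shown by SELECT POWER(2,27)

def id_range(deep_coadd_id: int) -> range:
    """Primary-key range reserved for sources measured on one coadd."""
    start = RANGE_WIDTH * deep_coadd_id
    return range(start, start + RANGE_WIDTH)

def owning_coadd(source_id: int) -> int:
    """Invert the mapping: which coadd a RunDeepSource id belongs to."""
    return source_id // RANGE_WIDTH

# The ids reported for patch '364,0' (min 3201777860083756, max 3201777860089200)
# must fall inside the block of the same coadd.
coadd = owning_coadd(3201777860083756)
assert 3201777860089200 in id_range(coadd)
```

This is the same range test the proposed DELETE would express in SQL; deleting by id range lets the server walk the primary key index instead of probing a foreign key.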
A minor inconsistency between sites in tables RunDeepSource 
The total number of deep sources found by each processing site on the same coadd seems to disagree. This is seen in the following report for:
patch = '364,0'
tract = 0
filterId = 0
NCSA:
SELECT
    id,coadd_id*134217728,id-coadd_id*134217728 AS 'diff'
FROM
    gapon_SDRP_Stripe82.RunDeepSource
WHERE
    coadd_id IN (
        SELECT deepCoaddId
        FROM gapon_SDRP_Stripe82.DeepCoadd
        WHERE patch='364,0' AND tract=0 AND filterId=0)
ORDER BY
    id;
+------------------+--------------------+------+
| id               | coadd_id*134217728 | diff |
+------------------+--------------------+------+
| 3201777860083756 |   3201777860083712 |   44 |
| 3201777860083757 |   3201777860083712 |   45 |
...
| 3201777860089176 |   3201777860083712 | 5464 |
| 3201777860089175 |   3201777860083712 | 5463 |
| 3201777860089174 |   3201777860083712 | 5462 |
+------------------+--------------------+------+

IN2P3:

SELECT
    id,coadd_id*134217728,id-coadd_id*134217728 AS 'diff'
FROM
    lsst_prod_DC_2013_2.RunDeepSource
WHERE
    coadd_id IN (
        SELECT deepCoaddId
        FROM lsst_prod_DC_2013_2.DeepCoadd
        WHERE patch='364,0' AND tract=0 AND filterId=0)
ORDER BY
    id;
+------------------+--------------------+------+
| id               | coadd_id*134217728 | diff |
+------------------+--------------------+------+
| 3201777860083756 |   3201777860083712 |   44 |
| 3201777860083757 |   3201777860083712 |   45 |
...
| 3201777860089193 |   3201777860083712 | 5481 |
| 3201777860089192 |   3201777860083712 | 5480 |
| 3201777860089191 |   3201777860083712 | 5479 |
+------------------+--------------------+------+
The origin of this disagreement could lie in subtle differences in the processing conducted at the two sites.
Planning the merge
A trivial merge of this cluster of tables, based on excluding the overlapped region from either site (NCSA or IN2P3), is possible. The merge protocol is presented below:
- TBD
Analyzing the overlap in table Science_Ccd_Exposure and its dependents
The previously studied structure of the table's primary key suggests that the overlap can be eliminated based on a quadruplet of (run,filterId,camcol,field). Here is the relevant query:
SELECT
    run,COUNT(*)
FROM
    gapon_SDRP_Stripe82.Science_Ccd_Exposure
WHERE
    (run,filterId,camcol,field) IN
        (SELECT run,filterId,camcol,field FROM lsst_prod_DC_2013_2.Science_Ccd_Exposure)
GROUP BY
    run;
It yields 150 runs in the overlap:
+------+----------+
| run  | COUNT(*) |
+------+----------+
| 1755 |     1080 |
| 2649 |      810 |
| 2650 |     1020 |
| 2662 |     1110 |
| 2677 |      210 |
| 2708 |     1109 |
| 2728 |     1080 |
| 2768 |     1110 |
| 3325 |     1080 |
| 3360 |     1079 |
| 3384 |     1080 |
| 3388 |     1110 |
| 3434 |     1080 |
| 3437 |     1079 |
| 3465 |     1079 |
| 4128 |     1080 |
| 4145 |     1079 |
| 4153 |     1110 |
| 4188 |      358 |
| 4192 |     1080 |
| 4198 |     1080 |
| 4203 |     1080 |
| 4207 |     1110 |
| 4247 |     1109 |
| 4253 |     1080 |
| 4263 |     1110 |
| 4849 |     1110 |
| 4858 |     1080 |
| 4868 |     1080 |
| 4874 |     1080 |
| 4895 |     1080 |
| 4899 |      197 |
| 4917 |     1075 |
| 4927 |     1080 |
| 4933 |     1080 |
| 5052 |     1110 |
| 5566 |     1109 |
| 5582 |     1109 |
| 5590 |      740 |
| 5603 |     1080 |
| 5619 |     1091 |
| 5622 |     1079 |
| 5633 |     1108 |
| 5637 |      740 |
| 5642 |     1110 |
| 5646 |     1096 |
| 5709 |     1080 |
| 5713 |     1075 |
| 5744 |     1079 |
| 5754 |     1107 |
| 5759 |     1059 |
| 5765 |     1102 |
| 5770 |     1077 |
| 5776 |     1080 |
| 5781 |     1110 |
| 5786 |     1079 |
| 5792 |     1079 |
| 5800 |     1110 |
| 5807 |     1106 |
| 5813 |     1080 |
| 5820 |     1109 |
| 5823 |     1110 |
| 5836 |     1110 |
| 5853 |     1063 |
| 5878 |     1076 |
| 5882 |     1066 |
| 5895 |     1110 |
| 5898 |     1099 |
| 5905 |     1081 |
| 5918 |     1079 |
| 5924 |     1079 |
| 6283 |     1080 |
| 6287 |     1110 |
| 6293 |      570 |
| 6314 |     1105 |
| 6330 |     1075 |
| 6355 |     1110 |
| 6362 |      210 |
| 6367 |     1110 |
| 6373 |      720 |
| 6377 |     1080 |
| 6383 |     1103 |
| 6402 |     1110 |
| 6409 |     1076 |
| 6414 |     1079 |
| 6421 |     1109 |
| 6425 |     1079 |
| 6430 |     1110 |
| 6433 |     1106 |
| 6441 |      361 |
| 6447 |     1074 |
| 6458 |     1077 |
| 6461 |     1109 |
| 6471 |      149 |
| 6474 |     1076 |
| 6479 |     1110 |
| 6484 |     1080 |
| 6501 |     1104 |
| 6504 |     1080 |
| 6508 |     1102 |
| 6513 |     1108 |
| 6518 |     1104 |
| 6524 |      808 |
| 6530 |     1023 |
| 6533 |     1108 |
| 6537 |     1054 |
| 6555 |     1080 |
| 6559 |     1072 |
| 6564 |     1109 |
| 6568 |     1077 |
| 6577 |     1079 |
| 6580 |     1075 |
| 6584 |     1078 |
| 6596 |     1076 |
| 6600 |     1109 |
| 6618 |     1061 |
| 6934 |     1080 |
| 6955 |     1110 |
| 6963 |     1107 |
| 6976 |     1080 |
| 6982 |     1080 |
| 7024 |     1031 |
| 7034 |     1107 |
| 7038 |     1109 |
| 7047 |      987 |
| 7051 |     1079 |
| 7054 |     1077 |
| 7057 |     1108 |
| 7080 |     1110 |
| 7084 |     1079 |
| 7095 |      527 |
| 7101 |     1054 |
| 7106 |     1067 |
| 7112 |     1108 |
| 7117 |     1099 |
| 7121 |     1079 |
| 7124 |     1065 |
| 7136 |     1110 |
| 7140 |     1080 |
| 7145 |     1077 |
| 7155 |     1090 |
| 7158 |     1077 |
| 7161 |     1079 |
| 7170 |     1078 |
| 7173 |     1088 |
| 7177 |     1091 |
| 7183 |      419 |
| 7195 |      919 |
| 7199 |      202 |
| 7202 |     1097 |
+------+----------+
The opposite query yields the same result. This is also illustrated with this report:
SELECT COUNT(*)
FROM gapon_SDRP_Stripe82.Science_Ccd_Exposure
WHERE (run,filterId,camcol,field) IN
    (SELECT run,filterId,camcol,field FROM lsst_prod_DC_2013_2.Science_Ccd_Exposure);
+----------+
| COUNT(*) |
+----------+
|   153666 |
+----------+

SELECT COUNT(*)
FROM lsst_prod_DC_2013_2.Science_Ccd_Exposure
WHERE (run,filterId,camcol,field) IN
    (SELECT run,filterId,camcol,field FROM gapon_SDRP_Stripe82.Science_Ccd_Exposure);
+----------+
| COUNT(*) |
+----------+
|   153666 |
+----------+
For further analysis of the overlap and for optimization of the merge the overlap identifiers (along with their quadruplets) have been captured in the following table:
SELECT * FROM merge_info.ccd_exp_overlap LIMIT 1;
+----------------------+------+--------+----------+-------+
| scienceCcdExposureId | run  | camcol | filterId | field |
+----------------------+------+--------+----------+-------+
|           1755010409 | 1755 |      1 |        0 |   409 |
+----------------------+------+--------+----------+-------+

SELECT COUNT(*) FROM merge_info.ccd_exp_overlap;
+----------+
| COUNT(*) |
+----------+
|   153666 |
+----------+
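The sample row above suggests a decimal layout for the exposure key. The decoding below is an assumption inferred from that single example (1755010409 -> run 1755, camcol 1, filterId 0, field 409); it has not been confirmed against the pipeline code:

```python
# Presumed layout: scienceCcdExposureId = run*10**6 + camcol*10**4
#                                         + filterId*10**3 + field.
# This is inferred from one sample row, not from the DC 2013 code.

def decode_ccd_exposure_id(exposure_id: int) -> dict:
    """Split a Science_Ccd_Exposure primary key into its presumed components."""
    run, rest = divmod(exposure_id, 10**6)
    camcol, rest = divmod(rest, 10**4)
    filter_id, field = divmod(rest, 10**3)
    return {"run": run, "camcol": camcol, "filterId": filter_id, "field": field}

print(decode_ccd_exposure_id(1755010409))
```

If the layout holds, it explains why the quadruplet (run, filterId, camcol, field) and the primary key are interchangeable as disambiguation markers.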
Observations and conclusions
Based on the nature of the data in these tables, the dependencies between the tables and the previously investigated structure of the primary key of the CCD exposures table, it would seem reasonable to rely upon the previously mentioned quadruplet (or the derived primary key of the table) as a sole disambiguation marker when merging IN2P3 tables into the final catalog.
The structure of the primary key in the dependent table RunDeepForcedSource
It's important to make sure that no primary key conflict will arise when attempting to merge all 10 instances of the table into a single table. Fortunately, it has been experimentally discovered (though no relevant Data Challenge 13 code has been found so far) that the primary keys of the table are computed directly from the CCD exposures' identifiers using the following simplified formula:
id = 2**26 * ccd_exposure_id + delta
This is illustrated with the following example:
SELECT POW(2,26);
+-----------+
| POW(2,26) |
+-----------+
|  67108864 |
+-----------+

SELECT id,exposure_id,id/exposure_id
FROM gapon_SDRP_Stripe82.RunDeepForcedSource
LIMIT 1;
+-------------------+-------------+----------------+
| id                | exposure_id | id/exposure_id |
+-------------------+-------------+----------------+
| 69327485392650241 |  1033060035 |  67108864.0000 |
+-------------------+-------------+----------------+

SELECT MIN(id),MAX(id),MAX(id)-MIN(id)+1,COUNT(id)
FROM gapon_SDRP_Stripe82.RunDeepForcedSource
WHERE exposure_id=1033060035;
+-------------------+-------------------+-------------------+-----------+
| MIN(id)           | MAX(id)           | MAX(id)-MIN(id)+1 | COUNT(id) |
+-------------------+-------------------+-------------------+-----------+
| 69327485392650241 | 69327485392655340 |              5100 |      5100 |
+-------------------+-------------------+-------------------+-----------+
Other observations and conclusions
- the primary keys of the table were allocated in a contiguous interval whose base directly depends (FK) on the corresponding (PK) identifiers from the exposures table.
- the maximum size of each interval allows accommodating up to 2**26 (about 67 million) sources per CCD exposure, which rules out any possibility, even remotely, of running into a key conflict when merging the contents of the 10 tables (minus overlap) into one.
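The conclusions above can be modeled with the same arithmetic used earlier for RunDeepSource, now with a block width of 2**26. A minimal sketch (assumption-labeled, not project code):

```python
# Presumed RunDeepForcedSource key layout, per the experiments above:
# id = 2**26 * exposure_id + delta, with delta < 2**26, so each CCD exposure
# owns its own contiguous block of keys.

BLOCK = 2 ** 26  # 67108864, as shown by SELECT POW(2,26)

def exposure_of(forced_source_id: int) -> int:
    """Recover the exposure_id that owns a given RunDeepForcedSource id."""
    return forced_source_id // BLOCK

def blocks_overlap(exp_a: int, exp_b: int) -> bool:
    """Key blocks of two distinct exposures can never intersect."""
    return exp_a == exp_b  # half-open blocks [e*BLOCK, (e+1)*BLOCK) are disjoint

# The sample row above: id 69327485392650241 belongs to exposure 1033060035.
assert exposure_of(69327485392650241) == 1033060035
```

Since the blocks are disjoint by construction, merging the 10 per-band tables cannot collide on primary keys as long as the overlapped exposures are removed first.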
The coverage of entries stored in RunDeepForcedSource in the overlap area across sites
Now that we know each site processed the same set of exposures in the overlap region, the next important question to investigate is how complete the subsequent search for forced deep sources on those exposures was.
- this sets aside the possibility that a site might have partially processed an image based on certain RA-DEC limits.
In this study we are going to count the total number of forced deep sources found for each such exposure to make sure each image was processed.
- note that we are not requiring the total number of findings (per image) to be an exact match, as there might be subtle site-specific differences in the processing algorithms, configurations, hardware (different CPU versions, etc.), and order of processing. The general expectation is to see broad agreement between the sites.
In the study we're going to use the u (filterId=0) band.
Counting the total number of the forced deep sources found at both sites.
NCSA:
SELECT COUNT(*)
FROM gapon_SDRP_Stripe82.RunDeepForcedSource
WHERE exposure_id IN
    (SELECT scienceCcdExposureId FROM merge_info.ccd_exp_overlap);
+-----------+
| COUNT(*)  |
+-----------+
| 179599164 |
+-----------+

IN2P3:

SELECT COUNT(*)
FROM lsst_prod_dedupe_byfilter_u.RunDeepForcedSource
WHERE exposure_id IN
    (SELECT scienceCcdExposureId FROM merge_info.ccd_exp_overlap);
+-----------+
| COUNT(*)  |
+-----------+
| 186050567 |
+-----------+
Per-image comparison of the number of found sources in the overlap region
Create two tables to store intermediate results of the analysis:

CREATE TABLE merge_info.ncsa_num_deep_forced_src_overlap_u (
    `scienceCcdExposureId` bigint(20) NOT NULL,
    `num` int(11) DEFAULT 0
);

CREATE TABLE merge_info.in2p3_num_deep_forced_src_overlap_u (
    `scienceCcdExposureId` bigint(20) NOT NULL,
    `num` int(11) DEFAULT 0
);
Collect the stats for each site.
NCSA:
INSERT INTO merge_info.ncsa_num_deep_forced_src_overlap_u
    SELECT exposure_id,COUNT(*)
    FROM gapon_SDRP_Stripe82.RunDeepForcedSource WHERE exposure_id IN
        (SELECT scienceCcdExposureId FROM merge_info.ccd_exp_overlap)
    GROUP BY exposure_id ORDER BY exposure_id;
Query OK, 29090 rows affected (1 min 0.73 sec)
Records: 29090  Duplicates: 0  Warnings: 0
IN2P3:
INSERT INTO merge_info.in2p3_num_deep_forced_src_overlap_u
    SELECT exposure_id,COUNT(*)
    FROM lsst_prod_dedupe_byfilter_u.RunDeepForcedSource WHERE exposure_id IN
        (SELECT scienceCcdExposureId FROM merge_info.ccd_exp_overlap)
    GROUP BY exposure_id ORDER BY exposure_id;
Counting the number of exposures with a non-trivial number of found sources:

| site  | total sources found | total exposures | unique exposures (not in the other site) |
|-------|---------------------|-----------------|------------------------------------------|
| NCSA  | 179599164           | 29090           | 223                                      |
| IN2P3 | 186050567           | 29905           | 1038                                     |
This table shows the number of runs (per site) in which such differences were found:

| site  | num runs |
|-------|----------|
| NCSA  | 39       |
| IN2P3 | 144      |
These were the queries used to calculate the numbers:
SELECT run, COUNT(*)
FROM merge_info.ccd_exp_overlap
WHERE scienceCcdExposureId IN
    (SELECT scienceCcdExposureId
     FROM ncsa_num_deep_forced_src_overlap_u
     WHERE scienceCcdExposureId NOT IN
        (SELECT scienceCcdExposureId
         FROM in2p3_num_deep_forced_src_overlap_u))
GROUP BY run
ORDER BY run;

SELECT run, COUNT(*)
FROM merge_info.ccd_exp_overlap
WHERE scienceCcdExposureId IN
    (SELECT scienceCcdExposureId
     FROM in2p3_num_deep_forced_src_overlap_u
     WHERE scienceCcdExposureId NOT IN
        (SELECT scienceCcdExposureId
         FROM ncsa_num_deep_forced_src_overlap_u))
GROUP BY run
ORDER BY run;
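The two NOT IN queries are plain set differences over the per-site exposure id sets. A toy sketch (ids below are made up, not catalog values) of the same logic, plus a consistency check on the reported totals:

```python
# Stand-ins for the two merge_info stats tables (toy exposure ids).
ncsa_exposures = {101, 102, 103, 104}
in2p3_exposures = {102, 103, 104, 105, 106}

unique_to_ncsa = ncsa_exposures - in2p3_exposures    # exposures only NCSA produced
unique_to_in2p3 = in2p3_exposures - ncsa_exposures   # exposures only IN2P3 produced
common = ncsa_exposures & in2p3_exposures

# Consistency check on the reported totals: both sites imply the same size
# of the common set, 29090 - 223 == 29905 - 1038 == 28867.
assert 29090 - 223 == 29905 - 1038 == 28867

print(sorted(unique_to_ncsa), sorted(unique_to_in2p3), len(common))
```

That the two reported totals agree on the common-set size (28867 exposures) is at least internally consistent, even though the per-site unique counts differ.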
HOW TO INTERPRET THIS?
Further investigation of those unique exposures is needed to see if there is any correlation with RA-DEC.
Planning the merge
Two independent clusters of tables have been identified. Each group can be merged separately from the other one. Intermediate results of the merge will be stored as needed in a special database called merge_info.
MySQL optimization techniques for MyISAM:
https://dev.mysql.com/doc/refman/5.5/en/optimizing-myisam-bulk-data-loading.html
The most relevant suggestion from this document is:
ALTER TABLE tbl_name DISABLE KEYS;
<load>
ALTER TABLE tbl_name ENABLE KEYS;
Q: MySQL server tuning - anything can be done here?
CCD exposures and its dependent metadata tables
As discovered and explained earlier in this ticket, the merge can be safely made based on the exact equality of the primary keys of table Science_Ccd_Exposure in the overlap area. The keys are computed from a quadruplet of columns: (run,filterId,camcol,field). Hence, in the proposed merge algorithm, the overlapped rows will first be removed from the IN2P3 CCD exposure table and its dependent metadata tables (2 of those). The next step will be to ingest the remaining contents of those tables into the corresponding tables of the primary NCSA database.
Note that the biggest (both in the total number of rows and in size) band-specific tables RunDeepForcedSource will be merged based on RunDeepSource, not on the exposures.
coadds, objects and sources
There are 10 instances of this table spread across 10 databases. Each database represents a filter band (ugriz) and a site (NCSA, IN2P3). Rows of the table directly depend on:
- the corresponding rows of the CCD exposures table
- the corresponding rows of the objects in the RunDeepSource table
According to the previously made study, the first dependency (on exposures) can be safely eliminated because both sites provide the same collection of images in the overlap area, and the corresponding rows have the same identifiers.
The second dependency is more fundamental from the science point of view because it establishes a relationship between Astrophysical Objects (measured on the deep coadds) and the corresponding collections of Sources (measured on individual images). Hence, the elimination of the overlapped rows of table RunDeepForcedSource should be made along the chain of:
- RunDeepForcedSource.object_id -> RunDeepSource.deepCoaddId -> DeepCoadd
Other ideas for an optimal merge algorithm:
- all 5 of NCSA's RunDeepForcedSource tables do not require any modifications. They can be merged as is.
- IN2P3's tables DeepCoadd and RunDeepSource:
- first of all, need to be cleaned of the overlapped data (presumably based on patch and tract)
- the overlapped object identifiers RunDeepSource.id should be recorded in a temporary table, as they will be needed at the next stage
- IN2P3's tables RunDeepForcedSource:
- should be cleaned of all rows referring to the previously recorded objects in the overlap area
- then the contents of the tables can be either dumped into CSV/TSV or merged with the main table RunDeepForcedSource at NCSA.
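The elimination order above can be sketched with toy data structures standing in for the real tables (all ids below are made up). The point is the direction of the chain: coadds first, then objects, then forced sources:

```python
# Step 1: coadds in the overlap area (keyed by deepCoaddId) are identified.
overlap_coadds = {11, 12}

# Step 2: objects (RunDeepSource) measured on those coadds are recorded,
# as proposed, in a temporary table; here a simple set.
run_deep_source = {1001: 11, 1002: 12, 1003: 20}   # id -> coadd_id
overlap_objects = {oid for oid, cid in run_deep_source.items()
                   if cid in overlap_coadds}

# Step 3: forced sources referring to those objects are dropped before the merge.
run_deep_forced_source = [(1, 1001), (2, 1003), (3, 1002)]  # (id, object_id)
kept = [row for row in run_deep_forced_source
        if row[1] not in overlap_objects]

print(kept)  # only forced sources whose object lies outside the overlap survive
```

Recording the overlapped object ids once (step 2) avoids re-running the coadd lookup for each of the 5 per-band RunDeepForcedSource tables.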
Merging CCD exposures and its dependent metadata tables
status | step | queries, procedures, etc. | ||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
![]() |
initialize and load main database gapon_SDRP_Stripe82 with exposures and relevant metadata processed at NCSA |
MySQL dump/load | ||||||||||||||||||||||||||||||||||
![]() |
initialize and load the second database lsst_prod_DC_2013_2 with exposures and relevant metadata processed in IN2P3 |
MySQL dump/load | ||||||||||||||||||||||||||||||||||
![]() |
create table merge_info.ccd_exp_overlap for primary keys and quadruplets in the overlap area |
|
||||||||||||||||||||||||||||||||||
![]() |
records the primary keys and quadruplets into merge_info.ccd_exp_overlap |
|
||||||||||||||||||||||||||||||||||
![]() |
delete the primary keys and quadruplets of the overlap from the IN2P3 table lsst_prod_DC_2013_2.Science_Ccd_Exposure |
|
||||||||||||||||||||||||||||||||||
![]() |
eliminate dependent rows from the IN2P3's CCD metadata table lsst_prod_DC_2013_2.Science_Ccd_Exposure_Metadata |
|
||||||||||||||||||||||||||||||||||
![]() |
eliminate dependent rows from the IN2P3's CCD metadata table lsst_prod_DC_2013_2.Science_Ccd_Exposure_To_Htm10 |
|
||||||||||||||||||||||||||||||||||
![]() |
ingest the remaining contents of lsst_prod_DC_2013_2.Science_Ccd_Exposure into gapon_SDRP_Stripe82.Science_Ccd_Exposure |
|
||||||||||||||||||||||||||||||||||
![]() |
ingest the remaining contents of lsst_prod_DC_2013_2.Science_Ccd_Exposure_Metadata into gapon_SDRP_Stripe82.Science_Ccd_Exposure_Metadata |
|
||||||||||||||||||||||||||||||||||
![]() |
ingest the remaining contents of lsst_prod_DC_2013_2.Science_Ccd_Exposure_To_Htm10 into gapon_SDRP_Stripe82.Science_Ccd_Exposure_To_Htm10 |
Query OK, 0 rows affected (0.19 sec)
Query OK, 26168990 rows affected (7.02 sec)
Query OK, 0 rows affected (2 min 0.93 sec)
|
Merging RunDeepForcedSource (FAILED ATTEMPT)
These are important notes on the merge:
- the NCSA's u (filterId=0) frequency band is already preloaded into database gapon_SDRP_Stripe82
- the remaining 4 NCSA's bands will be loaded into the database w/o any filtering
- all 5 IN2P3's bands will be loaded with a filter which excludes sources related to the overlapped exposures (as discussed earlier in the corresponding analysis section)
- a list of those overlapped exposures is preloaded into table merge_info.ccd_exp_overlap.
The log of the operation is shown below:
| status | step | queries, procedures, etc. |
|--------|------|---------------------------|
| done | loading NCSA's u (filterId=0) band | loaded earlier when initializing the database |
| cancelled | loading IN2P3's u (filterId=0) band | |

The operation was CANCELLED after 2+ days due to poor performance of the distributed file system for small-block random I/O. The postmortem analysis is presented below.
The duration of the operation before it was interrupted (195811 seconds)
The duration in seconds (see column Time) was about 54 hours as shown below:
SHOW PROCESSLIST;
+----+------+-----------+---------------------+---------+--------+--------------+------------------------------------------------------------------------------------------------------+
| Id | User | Host      | db                  | Command | Time   | State        | Info                                                                                                 |
+----+------+-----------+---------------------+---------+--------+--------------+------------------------------------------------------------------------------------------------------+
| 55 | root | localhost | gapon_SDRP_Stripe82 | Query   | 195811 | Sending data | INSERT INTO gapon_SDRP_Stripe82.RunDeepForcedSource SELECT * FROM lsst_prod_dedupe_byfilter_u.RunDee |
+----+------+-----------+---------------------+---------+--------+--------------+------------------------------------------------------------------------------------------------------+
The change in the table & index sizes
The table stats (number of rows, data & index sizes) at the beginning of the operation:
SHOW TABLE STATUS;
+---------------------+--------+---------+------------+------------+----------------+--------------+--------------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+
| Name                | Engine | Version | Row_format | Rows       | Avg_row_length | Data_length  | Max_data_length    | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time          |
+---------------------+--------+---------+------------+------------+----------------+--------------+--------------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+
| RunDeepForcedSource | MyISAM | 10      | Fixed      | 1729599791 | 395            | 683191917445 | 111182615800709119 | 190732060672 | 0         | NULL           | 2016-08-10 17:39:27 | 2016-08-31 00:16:31 | 2016-08-12 13:17:23 |
+---------------------+--------+---------+------------+------------+----------------+--------------+--------------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+
And after it was interrupted:
SHOW TABLE STATUS;
+---------------------+--------+---------+------------+------------+----------------+--------------+--------------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+
| Name                | Engine | Version | Row_format | Rows       | Avg_row_length | Data_length  | Max_data_length    | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time          |
+---------------------+--------+---------+------------+------------+----------------+--------------+--------------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+
| RunDeepForcedSource | MyISAM | 10      | Fixed      | 2197365176 | 395            | 867959244520 | 111182615800709119 | 262239522816 | 0         | NULL           | 2016-08-10 17:39:27 | 2016-08-31 00:27:23 | 2016-08-12 13:17:23 |
+---------------------+--------+---------+------------+------------+----------------+--------------+--------------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+
The change was:
 | rows | data size (bytes) | index size (bytes) |
---|---|---|---|
before | 1729599791 | 683191917445 | 190732060672 |
after | 2197365176 | 867959244520 | 262239522816 |
The estimated performance:
- rows: (2197365176 - 1729599791) / 195811 = 2389 rows/s
- table data: (867959244520 - 683191917445) / 195811 = 943600 bytes/s ≈ 0.9 MB/s
- index data: (262239522816 - 190732060672) / 195811 = 365186 bytes/s ≈ 0.36 MB/s
NOTE: most likely the performance was NOT LINEAR. A significant slowdown of the file growth was observed as the merge progressed, and the operation was likely dominated by the index rebuild.
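The rate estimates follow directly from the two SHOW TABLE STATUS snapshots; as a quick sanity check of the arithmetic (numbers taken from the tables above):

```python
# Sanity check of the merge-rate estimates, using the row/byte counts
# from the two SHOW TABLE STATUS snapshots above.
duration_s = 195811                   # run time before the query was killed

rows_before, rows_after = 1729599791, 2197365176
data_before, data_after = 683191917445, 867959244520
index_before, index_after = 190732060672, 262239522816

row_rate = (rows_after - rows_before) / duration_s      # rows per second
data_rate = (data_after - data_before) / duration_s     # bytes per second
index_rate = (index_after - index_before) / duration_s  # bytes per second

print(round(row_rate))    # ~2389 rows/s
print(round(data_rate))   # ~943600 B/s, i.e. ~0.9 MB/s
print(round(index_rate))  # ~365186 B/s, i.e. ~0.36 MB/s
```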
File system performance analysis
It turns out the operation was blocked by the high latency and low direct-I/O performance of the OpenStack volume attached to the VM, as seen in the iostat output:
% iostat -m 10
Linux 3.10.0-327.13.1.el7.x86_64 (lsst-gapon-0.os.ncsa.edu)   08/31/2016   _x86_64_   (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.52    0.00    0.50    2.75    0.00   93.23

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
vda               0.12         0.00         0.00       6137        539
vdb             359.15        14.64         6.04   26630294   10981068
vdc               4.90         0.01         2.06      13961    3744667

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.89    0.00    0.53   10.10    0.01   87.47

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
vda               0.30         0.00         0.00          0          0
vdb             661.10        18.83         0.01        188          0
vdc               0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.01    0.00    0.46   17.77    0.01   80.74

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
vda               0.00         0.00         0.00          0          0
vdb            7900.00        12.52        43.49        125        434
vdc               0.00         0.00         0.00          0          0
And this was the result reported by iozone under ideal conditions, when no other I/O activity was happening on the machine:

% iozone -s 100000 -r 16 -I
        File size set to 100000 kB
        Record Size 16 kB
        O_DIRECT feature enabled
        Command line used: iozone -s 100000 -r 16 -I
        Output is in kBytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 kBytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
                                                      random  random    bkwd  record  stride
        kB  reclen   write  rewrite    read   reread    read   write    read  rewrite    read  fwrite  frewrite    fread  freread
    100000      16   34260    74921   62643    65426   37339   73873   70515    78902   69543 1409936   2021713  4858634  5284841
The direct (O_DIRECT) I/O rates were on the order of 34 to 62 MB/s for sequential write/read, with random read at only about 36 MB/s.
Conclusions (on the poor performance)
Most likely the main factor limiting the performance was the table index rebuild, which required rebalancing huge B-Trees far exceeding the machine's memory, while the underlying file system had inadequate (for a problem of this scale) performance for random direct I/O on MySQL's 16 KB pages.
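As a rough illustration (my arithmetic, not from the ticket): translating the observed index growth rate into pages written per second, using the 16 KB page size quoted above, shows how few index pages the storage could actually commit:

```python
# Rough illustration: convert the observed index growth rate into
# 16 KB index pages written per second.
index_growth_rate = 365186    # bytes/s, from the estimate above
page_size = 16 * 1024         # 16 KB pages, as quoted in the conclusion

pages_per_sec = index_growth_rate / page_size
print(round(pages_per_sec, 1))  # ~22.3 pages/s -- consistent with a
                                # random-I/O-bound B-Tree rebuild
```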
Further options will be investigated in the next comment.
Planning more efficient merge of the forced deep sources
Three options have been discussed:
- make another attempt to merge the two tables with keys disabled, on the same hardware setup (OpenStack at NCSA)
  - measure the performance of the operation and observe the file system behavior
  - then try re-enabling the keys and study the performance of the index rebuild
- develop an application which will produce a collection of CSV/TSV files compatible with the Qserv database loader/partitioner
  - the application should chunk its output into multiple files (one file per run)
  - to speed up the operation, 10 OpenStack VMs and 10 separate storage partitions could be set up
- investigate the performance of the index rebuild on the new Qserv hardware at IN2P3 (the NVMe/SSD-based large-memory machine ccqserv150)
Merging RunDeepForcedSource w/ DISABLE KEYS
This is the first option discussed in the previous comment. Note the following:
- the DISABLE KEYS request won't affect the PRIMARY index of the table
- the performance of INSERT ... SELECT ... operations can be significantly sped up by increasing the value of the following global variable: http://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_bulk_insert_buffer_size
The log of the operation is shown below:
step | queries, procedures, etc.
---|---
saving the previous state of table gapon_SDRP_Stripe82.RunDeepForcedSource |
loading NCSA's u band into the newly created table from a MySQL dump file, with a command which prevents keys from being enabled at the end of the load |
increased the size of MyISAM's special tree-like cache from 8 MB to 8 GB (half of the machine's memory) to optimize bulk inserts into non-empty tables, as per: http://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_bulk_insert_buffer_size |
deleted all indexes on table gapon_SDRP_Stripe82.RunDeepForcedSource |
merge the de-duplicated rows of table RunDeepForcedSource of IN2P3's u band into table gapon_SDRP_Stripe82.RunDeepForcedSource | See the recipe from the previous (FAILED) merging effort
Merging and de-duplicating DeepCoadd and RunDeepSource
These tables need to be merged and de-duplicated before attempting to de-duplicate the five IN2P3 RunDeepForcedSource tables. The effort is based on the following dependency between tables (explained earlier in this ticket):
- RunDeepForcedSource.objectId -> RunDeepSource.id
- RunDeepSource.coadd_id -> DeepCoadd.deepCoaddId
We shall also increase the MySQL key buffer size, because some of the operations used later on will greatly benefit from it. The parameter is explained at http://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_key_buffer_size. Set it to 4 GB as per Serge's suggestion at https://github.com/lsst-dm/db_tests_summer15/blob/master/sql/dump_ForcedSource.sql, then restart the server:
% cat /etc/my.cnf.d/server.cnf
..
key_buffer_size = 4294967296
..
% systemctl restart mariadb.service
Saving the overlap info for DeepCoadd
Here we're going to save the overlap identifiers in intermediate tables.
NCSA:
CREATE TABLE merge_info.ncsa_DeepCoadd_overlap (
  `deepCoaddId` bigint(20) NOT NULL,
  `tract` int(11) NOT NULL,
  `patch` char(16) NOT NULL,
  `filterId` tinyint(4) NOT NULL,
  PRIMARY KEY (`deepCoaddId`)
) ENGINE=MyISAM;

INSERT INTO merge_info.ncsa_DeepCoadd_overlap
  SELECT deepCoaddId,tract,patch,filterId
  FROM gapon_SDRP_Stripe82.DeepCoadd
  WHERE (tract,patch,filterId) IN
    (SELECT tract,patch,filterId FROM lsst_prod_DC_2013_2.DeepCoadd);

Query OK, 1380 rows affected (0.10 sec)
Records: 1380  Duplicates: 0  Warnings: 0
IN2P3:
CREATE TABLE merge_info.in2p3_DeepCoadd_overlap (
  `deepCoaddId` bigint(20) NOT NULL,
  `tract` int(11) NOT NULL,
  `patch` char(16) NOT NULL,
  `filterId` tinyint(4) NOT NULL,
  PRIMARY KEY (`deepCoaddId`)
) ENGINE=MyISAM;

INSERT INTO merge_info.in2p3_DeepCoadd_overlap
  SELECT deepCoaddId,tract,patch,filterId
  FROM lsst_prod_DC_2013_2.DeepCoadd
  WHERE (tract,patch,filterId) IN
    (SELECT tract,patch,filterId FROM gapon_SDRP_Stripe82.DeepCoadd);

Query OK, 1380 rows affected (41.85 sec)
Records: 1380  Duplicates: 0  Warnings: 0
Observations:
- both tables have identical entries
- the number of rows in each table is rather small: 1380
- the table data size is 85560 bytes
- the index size is 23552 bytes
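In effect the two INSERT ... WHERE (tract,patch,filterId) IN (...) statements above compute a set intersection on the coadd key. A toy Python model of the same logic (the table contents below are invented for illustration):

```python
# Toy model of the DeepCoadd overlap selection: rows are
# (deepCoaddId, tract, patch, filterId); the overlap is the set of rows
# whose (tract, patch, filterId) key also occurs at the other site.
# All data below is made up.
ncsa_deepcoadd = [
    (1, 0, "22,3", 0),
    (2, 0, "23,3", 0),
    (3, 0, "24,3", 1),
]
in2p3_deepcoadd = [
    (101, 0, "23,3", 0),
    (102, 0, "24,3", 1),
    (103, 0, "25,3", 2),
]

# Key set of the IN2P3 side, analogous to the IN subquery.
in2p3_keys = {(t, p, f) for _, t, p, f in in2p3_deepcoadd}

# NCSA rows falling into the overlap, analogous to the INSERT ... SELECT.
ncsa_overlap = [row for row in ncsa_deepcoadd if row[1:] in in2p3_keys]

print(ncsa_overlap)  # the NCSA coadds also present at IN2P3
```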
Saving the overlap info for RunDeepSource
Estimating the total number of Objects in the overlap area
NCSA:
SELECT COUNT(id)
FROM
  gapon_SDRP_Stripe82.RunDeepSource `rds`,
  merge_info.ncsa_DeepCoadd_overlap `overlap`
WHERE
  rds.coadd_id = overlap.deepCoaddId;

+-----------+
| COUNT(id) |
+-----------+
|   9991529 |
+-----------+
1 row in set (1 min 43.00 sec)
IN2P3:
SELECT COUNT(id)
FROM
  lsst_prod_DC_2013_2.RunDeepSource `rds`,
  merge_info.in2p3_DeepCoadd_overlap `overlap`
WHERE
  rds.coadd_id = overlap.deepCoaddId;

+-----------+
| COUNT(id) |
+-----------+
|   9993373 |
+-----------+
1 row in set (2 min 46.84 sec)
Recording the identifiers in the bookkeeping tables for both sites
NCSA:
CREATE TABLE merge_info.ncsa_RunDeepSource_overlap (
  `id` bigint(20) NOT NULL,
  `coadd_id` bigint(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM;

INSERT INTO merge_info.ncsa_RunDeepSource_overlap
  SELECT id, coadd_id
  FROM
    gapon_SDRP_Stripe82.RunDeepSource `rds`,
    merge_info.ncsa_DeepCoadd_overlap `overlap`
  WHERE
    rds.coadd_id = overlap.deepCoaddId;

Query OK, 9991529 rows affected (25.07 sec)
Records: 9991529  Duplicates: 0  Warnings: 0
IN2P3:
CREATE TABLE merge_info.in2p3_RunDeepSource_overlap (
  `id` bigint(20) NOT NULL,
  `coadd_id` bigint(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM;

INSERT INTO merge_info.in2p3_RunDeepSource_overlap
  SELECT id, coadd_id
  FROM
    lsst_prod_DC_2013_2.RunDeepSource `rds`,
    merge_info.in2p3_DeepCoadd_overlap `overlap`
  WHERE
    rds.coadd_id = overlap.deepCoaddId;

Query OK, 9993373 rows affected (32.20 sec)
Records: 9993373  Duplicates: 0  Warnings: 0
Observations
- the numbers of entries in the two tables are almost identical
- the number of rows in each table is about 9993373 (~10^7)
- the table data size is 169887341 bytes (170 MB)
- the index size is 144075776 bytes (144 MB)
This means that with a proper key buffer size (as mentioned at the start of this comment) the whole table index will fit into memory, and it can be used for eliminating overlaps from the largest tables, RunDeepForcedSource.
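A quick check that the bookkeeping index indeed fits into the enlarged key buffer, using the sizes reported above:

```python
# The in2p3_RunDeepSource_overlap index (144 MB) vs. the 4 GB
# key_buffer_size configured earlier in this comment.
index_size = 144075776        # bytes, from the observations above
key_buffer_size = 4294967296  # 4 GB, set in /etc/my.cnf.d/server.cnf

assert index_size < key_buffer_size
print(round(key_buffer_size / index_size))  # ~30x headroom
```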
Deduplicating DeepCoadd (and its metadata tables) of IN2P3
This is an intermediate step before producing the fully merged tables. The goal is to remove the overlap in tables:
- lsst_prod_DC_2013_2.DeepCoadd
- lsst_prod_DC_2013_2.DeepCoadd_Metadata
- lsst_prod_DC_2013_2.DeepCoadd_To_Htm10
The resulting tables will be residing within the same database:
CREATE TABLE lsst_prod_DC_2013_2.DeepCoadd_dedup
  LIKE lsst_prod_DC_2013_2.DeepCoadd;
ALTER TABLE lsst_prod_DC_2013_2.DeepCoadd_dedup DISABLE KEYS;

CREATE TABLE lsst_prod_DC_2013_2.DeepCoadd_Metadata_dedup
  LIKE lsst_prod_DC_2013_2.DeepCoadd_Metadata;
ALTER TABLE lsst_prod_DC_2013_2.DeepCoadd_Metadata_dedup DISABLE KEYS;

CREATE TABLE lsst_prod_DC_2013_2.DeepCoadd_To_Htm10_dedup
  LIKE lsst_prod_DC_2013_2.DeepCoadd_To_Htm10;
Dumping catalogs from a MySQL server at IN2P3
The location of the dumps
The dump files will be placed at the following file system:
ccosvms0070.in2p3.fr
Filesystem Size Used Avail Use% Mounted on
Each database will be dumped into the corresponding sub-folder:
lsst_prod_DC_2013_2
lsst_prod_dedupe_byfilter_g
lsst_prod_dedupe_byfilter_i
lsst_prod_dedupe_byfilter_r
lsst_prod_dedupe_byfilter_u
lsst_prod_dedupe_byfilter_z
Installed PIGZ (parallel GZIP) to speed up the compression phase of the dumps
Compiled and installed the tool at:
/afs/in2p3.fr/home/g/gapon//soft/pigz-2.3.3/
Added this folder to PATH
Planning for parallel data extraction
The procedure will run in parallel by launching an independent instance of the mysqldump command for each table. The following non-trivial tables have been identified:
SELECT
    TABLE_SCHEMA,
    TABLE_NAME,
    TABLE_ROWS,
    DATA_LENGTH/1024/1024 AS DATA_LENGTH_MB
FROM
    information_schema.tables
WHERE
    TABLE_ROWS > 0;
+-----------------------------+-------------------------------+------------+----------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ROWS | DATA_LENGTH_MB |
+-----------------------------+-------------------------------+------------+----------------+
| lsst_prod_DC_2013_2 | DeepCoadd | 13680 | 5 |
| lsst_prod_DC_2013_2 | DeepCoadd_Metadata | 3022797 | 133 |
| lsst_prod_DC_2013_2 | DeepCoadd_To_Htm10 | 324020 | 4 |
| lsst_prod_DC_2013_2 | Filter | 6 | 0 |
| lsst_prod_DC_2013_2 | LeapSeconds | 39 | 0 |
| lsst_prod_DC_2013_2 | RunDeepSource | 98088148 | 54733 |
| lsst_prod_DC_2013_2 | Science_Ccd_Exposure | 1543108 | 704 |
| lsst_prod_DC_2013_2 | Science_Ccd_Exposure_Metadata | 180543635 | 5592 |
| lsst_prod_DC_2013_2 | Science_Ccd_Exposure_To_Htm10 | 29107193 | 378 |
| lsst_prod_DC_2013_2 | ZZZ_Db_Description | 1 | 0 |
| lsst_prod_dedupe_byfilter_g | RunDeepForcedSource | 1947408074 | 769226 |
| lsst_prod_dedupe_byfilter_g | tmp_dedupe_g | 166970338 | 10686 |
| lsst_prod_dedupe_byfilter_i | RunDeepForcedSource | 1950338024 | 770384 |
| lsst_prod_dedupe_byfilter_i | tmp_dedupe_i | 167273430 | 10705 |
| lsst_prod_dedupe_byfilter_r | RunDeepForcedSource | 1950357798 | 770391 |
| lsst_prod_dedupe_byfilter_r | tmp_dedupe_r | 128661060 | 8234 |
| lsst_prod_dedupe_byfilter_u | RunDeepForcedSource | 1897928494 | 749682 |
| lsst_prod_dedupe_byfilter_u | tmp_dedupe_u | 160239581 | 10255 |
| lsst_prod_dedupe_byfilter_z | RunDeepForcedSource | 1946839240 | 769001 |
| lsst_prod_dedupe_byfilter_z | tmp_dedupe_z | 166920433 | 10683 |
+-----------------------------+-------------------------------+------------+----------------+
The list of operations to be run in parallel:
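The original command list did not survive in this export. A sketch of how the per-table dump pipelines could be generated (the user name and the exact flags are assumptions for illustration; the host and output directory match the ones mentioned in this ticket):

```python
# Sketch: build one "mysqldump | pigz" background pipeline per
# non-trivial table.  The "-u reader -p" credentials are an assumption.
tables = [
    ("lsst_prod_DC_2013_2", "RunDeepSource"),
    ("lsst_prod_dedupe_byfilter_g", "RunDeepForcedSource"),
    ("lsst_prod_dedupe_byfilter_u", "RunDeepForcedSource"),
]

def dump_command(db, table, host="ccdb02.in2p3.fr",
                 outdir="/sps/lsst/data/gapon"):
    """Return a shell pipeline dumping one table into a gzipped file."""
    return (f"mysqldump -h {host} -u reader -p {db} {table} "
            f"| pigz > {outdir}/{db}/{table}.sql.gz &")

for db, table in tables:
    print(dump_command(db, table))
```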
Monitoring the performance of the operation
An effective data compression (deflation) factor of 3.2 was established by two independent techniques. The first was monitoring the inbound and outbound network traffic to/from the node where the dumps were running, using a custom script. The script periodically (every 5 seconds) samples the counters of the network interface and reports the overall progress. Its output is illustrated below:
Hostname: ccosvms0070 IP address: 134.158.239.70
Interface: eth0
Unit: MB/s
+--------+--------+------------+--------------------------+
| RX | TX | TIMESTAMP | LOCAL TIME |
+--------+--------+------------+--------------------------+
| 11.9 | 3.7 | 1469828109 | Fri Jul 29 23:35:09 2016 |
| 11.9 | 3.7 | 1469828114 | Fri Jul 29 23:35:14 2016 |
| 11.9 | 3.4 | 1469828119 | Fri Jul 29 23:35:19 2016 |
| 11.9 | 3.7 | 1469828124 | Fri Jul 29 23:35:24 2016 |
..
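The script itself is not attached to the ticket; its core is just differencing the interface byte counters between samples. A minimal sketch (the sysfs paths follow the standard Linux layout):

```python
# Minimal sketch of the network-rate sampler: read the interface's
# rx/tx byte counters twice, T seconds apart, and report MB/s.

def read_counter(iface, which):
    # Standard Linux sysfs counters,
    # e.g. /sys/class/net/eth0/statistics/rx_bytes
    with open(f"/sys/class/net/{iface}/statistics/{which}_bytes") as f:
        return int(f.read())

def rate_mb_s(before, after, interval_s):
    """Average transfer rate in MB/s between two counter samples."""
    return (after - before) / interval_s / 1e6

# Example with synthetic counter values (5 s interval, ~11.9 MB/s in):
print(round(rate_mb_s(0, 59_500_000, 5.0), 1))
```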
The second technique was based on dumping one table, storing it in the compressed format and comparing the resulting file size with the amount of data in the corresponding table as reported earlier. Table lsst_prod_DC_2013_2.Science_Ccd_Exposure_Metadata was used as a reference:
The 'DATA_LENGTH' counter reported earlier for this table was 5592 MB, while the compressed dump file Science_Ccd_Exposure_Metadata.sql.gz (see the directory listing below) is 1789068690 bytes, i.e. about 1706 MB. Hence we get the same deflation factor of about 3.2 as reported by the network monitoring tool.
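The factor can be reproduced from numbers already in this ticket: the table's DATA_LENGTH and the size of its gzipped dump file from the directory listing below.

```python
# Deflation factor for Science_Ccd_Exposure_Metadata: raw table data
# (5592 MB per information_schema) vs. the gzipped dump file size.
raw_bytes = 5592 * 1024 * 1024  # DATA_LENGTH_MB reported above
dump_bytes = 1789068690         # Science_Ccd_Exposure_Metadata.sql.gz

deflation = raw_bytes / dump_bytes
print(round(deflation, 1))  # ~3.3, consistent with the ~3.2 ratio
                            # seen in the network traffic monitoring
```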
Based on the total amount of data as reported by:
SELECT
    SUM(TABLE_ROWS),
    SUM(DATA_LENGTH)/1024/1024 AS TOTAL_DATA_LENGTH_MB
FROM
    information_schema.tables;
+-----------------+----------------------+
| SUM(TABLE_ROWS) | TOTAL_DATA_LENGTH_MB |
+-----------------+----------------------+
| 10795579099 | 3940798 |
+-----------------+----------------------+
Estimates:
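The estimate itself did not survive in this export. Given the totals above, a ballpark figure can be reconstructed (my arithmetic, assuming the observed ~11.9 MB/s sustained read rate held for the whole run):

```python
# Ballpark dump-time estimate: ~3.94 TB of table data read at the
# sustained ~11.9 MB/s rate observed on the dump host.
total_mb = 3940798     # TOTAL_DATA_LENGTH_MB from information_schema
read_rate_mb_s = 11.9  # sustained RX rate from the network monitor

est_hours = total_mb / read_rate_mb_s / 3600
print(round(est_hours))  # ~92 hours, i.e. roughly 4 days end to end
```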
dump completed: lsst_prod_DC_2013_2
/sps/lsst/data/gapon(0)>ls -al lsst_prod_DC_2013_2/
total 44301120
drwxr-sr-x 2 gapon lsst 32768 Jul 29 21:04 .
drwxr-sr-x 8 gapon lsst 32768 Jul 29 21:58 ..
-rw-r--r-- 1 gapon lsst 30687846 Jul 29 21:04 DeepCoadd_Metadata.sql.gz
-rw-r--r-- 1 gapon lsst 2346437 Jul 29 21:04 DeepCoadd.sql.gz
-rw-r--r-- 1 gapon lsst 908926 Jul 29 21:04 DeepCoadd_To_Htm10.sql.gz
-rw-r--r-- 1 gapon lsst 787 Jul 29 21:04 Filter.sql.gz
-rw-r--r-- 1 gapon lsst 1637 Jul 29 21:04 LeapSeconds.sql.gz
-rw-r--r-- 1 gapon lsst 42934514253 Jul 30 13:27 RunDeepSource.sql.gz
-rw-r--r-- 1 gapon lsst 1789068690 Jul 29 22:01 Science_Ccd_Exposure_Metadata.sql.gz
-rw-r--r-- 1 gapon lsst 527090153 Jul 29 21:06 Science_Ccd_Exposure.sql.gz
-rw-r--r-- 1 gapon lsst 79454956 Jul 29 21:05 Science_Ccd_Exposure_To_Htm10.sql.gz
-rw-r--r-- 1 gapon lsst 745 Jul 29 21:04 ZZZ_Db_Description.sql.gz
Failed dumps for the forced deep source database
The dumps of the large tables from all 5 deep forced source catalogs failed approximately 2 days after they began; all 5 outstanding dumps were killed at the same time. Here is what was left in the folders:
>ls -alh lsst_prod_dedupe_byfilter_*/
lsst_prod_dedupe_byfilter_g/:
-rw-r--r-- 1 gapon lsst 190G Aug 1 21:03 RunDeepForcedSource.sql.gz
-rw-r--r-- 1 gapon lsst 2.4G Jul 30 00:10 tmp_dedupe_g.sql.gz
lsst_prod_dedupe_byfilter_i/:
-rw-r--r-- 1 gapon lsst 192G Aug 1 21:03 RunDeepForcedSource.sql.gz
-rw-r--r-- 1 gapon lsst 2.4G Jul 30 00:14 tmp_dedupe_i.sql.gz
lsst_prod_dedupe_byfilter_r/:
-rw-r--r-- 1 gapon lsst 191G Aug 1 21:03 RunDeepForcedSource.sql.gz
-rw-r--r-- 1 gapon lsst 1.8G Jul 29 23:37 tmp_dedupe_r.sql.gz
lsst_prod_dedupe_byfilter_u/:
-rw-r--r-- 1 gapon lsst 186G Aug 1 21:03 RunDeepForcedSource.sql.gz
-rw-r--r-- 1 gapon lsst 2.3G Jul 30 00:08 tmp_dedupe_u.sql.gz
lsst_prod_dedupe_byfilter_z/:
-rw-r--r-- 1 gapon lsst 188G Aug 1 21:03 RunDeepForcedSource.sql.gz
-rw-r--r-- 1 gapon lsst 2.4G Jul 30 00:13 tmp_dedupe_z.sql.gz
UPDATE (2016-08-01): Resubmitted the operation for database lsst_prod_dedupe_byfilter_z. The main rationales for taking a snapshot of a single database are:
The operation successfully completed in 30 hours:
Access: 2016-08-02 07:47:05.902134000 +0200
Modify: 2016-08-03 12:39:28.288670022 +0200
Change: 2016-08-03 12:39:28.288670022 +0200
OBSERVATIONS: the aggregate network I/O during this single-dump operation was similar to the case of 5 simultaneous dumps. This means running multiple dumps in parallel makes no sense: the bottleneck is evidently shared rather than per-dump.
Status of the operation