Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: None
- Labels:
- Story Points: 6
- Epic Link:
- Sprint: DB_S17_2, DB_S17_4, DB_S17_5
- Team: Data Access and Database
Description
Currently, all catalog data for the NCSA/IN2P3 overlap region of the Stripe 82 processing in the PDAC has been taken from the NCSA side of the processing. However, the NCSA side of the processing does not include data for all patches on its "outer" boundary of the overlap region (and the same is true for the IN2P3 processing with respect to the opposite overlap boundary).
The catalog merge will need to be redone, with the patch division moved away from either of the overlap region boundaries in order to avoid "holes".
Based on the layout of patches (see the PDF document attached to this ticket), the new boundary should be placed between patches 375 and 376.
Attachments
- patch_layout_draft.pdf (157 kB)
Issue Links
- is triggered by: DM-7053 Assemble a complete database with S13 DRP catalogs (Done)
- is triggering: DM-10596 PDAC: merge calexps and coadds for imgserv for database sdss_stripe82_01 (Done)
- relates to: DM-8239 PDAC: objectId 3219370448785419 present in test database, but missing in PDAC Qserv (Done)
Activity
Defining a suitable boundary for the merge
According to the S13 Testing Plan, the S13 DRP dataset was split into two regions with an overlap used for cross-site verification:
- NCSA: -40 < R.A. < +10
- IN2P3: +5 < R.A. < +55
The overlap area covers the RA range of +5 to +10 and includes 276 unique patches in tract 0, as demonstrated by:
SELECT DISTINCT tract,patch
FROM gapon_SDRP_Stripe82.DeepCoadd
WHERE (tract,patch) IN
    (SELECT tract,patch FROM lsst_prod_DC_2013_2.DeepCoadd)
ORDER BY patch;
+-------+--------+
| tract | patch  |
+-------+--------+
|     0 | 364,0  |
|     0 | 364,1  |
|     0 | 364,10 |
|     0 | 364,11 |
|     0 | 364,2  |
|  ...  | ...    |
|     0 | 386,8  |
|     0 | 386,9  |
+-------+--------+
276 rows in set (0.09 sec)
(The listing is abridged here; the full output enumerates all 276 combinations of patch index 364 through 386 with sub-patch index 0 through 11, sorted lexicographically.)
This allows placing the boundary between patches 375 and 376:
- NCSA: 376 - 386 (132 unique patches)
- IN2P3: 364 - 375 (144 unique patches)
The boundary corresponds to an RA of 7.3913 degrees.
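The merge queries below implement this split with a lexicographic string comparison, patch < "376,0". A minimal pure-Python sketch (patch ranges taken from the listing above) showing that the string cut agrees with the numeric split on the patch x-index, even though sub-patch indices sort lexicographically (e.g. "364,10" before "364,2"):

```python
# Sketch: the string comparison patch < "376,0" agrees with a numeric
# split on the patch x-index only because every x-index here has the
# same width (3 digits, 364..386).
def side(patch: str) -> str:
    """Assign a patch ID like '375,11' to a side of the merge boundary."""
    return "IN2P3" if patch < "376,0" else "NCSA"

# All patches in the overlap listing above: x = 364..386, y = 0..11.
patches = [f"{x},{y}" for x in range(364, 387) for y in range(12)]
in2p3 = [p for p in patches if side(p) == "IN2P3"]   # x = 364..375
ncsa = [p for p in patches if side(p) == "NCSA"]     # x = 376..386

print(len(in2p3), len(ncsa))   # 144 132
```

The counts match the 144/132 split quoted above.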
Merging DeepCoadd
The merge will be done in a separate database: lsst_sdss_stripe82_01
Database location: /data3 (available space 3.2 TB)
Space required: < 1 GB
Preparing the output database
Creating the database in the default location (/data) and moving it to a different file system (/data3):
CREATE DATABASE lsst_sdss_stripe82_01;

CREATE TABLE lsst_sdss_stripe82_01.DeepCoadd_merge
    LIKE lsst_sdss_stripe82.DeepCoadd_merge;

CREATE TABLE lsst_sdss_stripe82_01.DeepCoadd_Metadata_merge
    LIKE lsst_sdss_stripe82.DeepCoadd_Metadata_merge;

CREATE TABLE lsst_sdss_stripe82_01.DeepCoadd_To_Htm10_merge
    LIKE lsst_sdss_stripe82.DeepCoadd_To_Htm10_merge;

USE lsst_sdss_stripe82_01;

SHOW TABLES;
+---------------------------------+
| Tables_in_lsst_sdss_stripe82_01 |
+---------------------------------+
| DeepCoadd_Metadata_merge        |
| DeepCoadd_To_Htm10_merge        |
| DeepCoadd_merge                 |
+---------------------------------+
Moving the database folder:
/bin/sudo -u mysql mv /data/mysql/db/lsst_sdss_stripe82_01 /data3/mysql/db/
/bin/sudo ln -s /data3/mysql/db/lsst_sdss_stripe82_01 /data/mysql/db/
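The move-and-symlink relocation can be sketched as follows (a toy Python demonstration using throwaway temporary directories in place of /data and /data3; the MySQL server itself is not involved, and all paths here are illustrative):

```python
# Hedged sketch of the move-then-symlink relocation above, on throwaway
# directories instead of /data and /data3.
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
data = os.path.join(root, "data")     # stands in for /data/mysql/db
data3 = os.path.join(root, "data3")   # stands in for /data3/mysql/db
os.makedirs(os.path.join(data, "lsst_sdss_stripe82_01"))
os.makedirs(data3)
with open(os.path.join(data, "lsst_sdss_stripe82_01", "t.frm"), "w") as f:
    f.write("x")

# Move the database directory to the larger file system ...
shutil.move(os.path.join(data, "lsst_sdss_stripe82_01"), data3)
# ... and leave a symlink behind so the server still finds it at the old path.
os.symlink(os.path.join(data3, "lsst_sdss_stripe82_01"),
           os.path.join(data, "lsst_sdss_stripe82_01"))

# Reads through the old path now resolve to the new location.
print(open(os.path.join(data, "lsst_sdss_stripe82_01", "t.frm")).read())   # x
```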
Testing results by connecting to the MySQL server:
USE lsst_sdss_stripe82_01;

SHOW TABLES;
+---------------------------------+
| Tables_in_lsst_sdss_stripe82_01 |
+---------------------------------+
| DeepCoadd_Metadata_merge        |
| DeepCoadd_To_Htm10_merge        |
| DeepCoadd_merge                 |
+---------------------------------+
3 rows in set (0.00 sec)
Copying data from the NCSA region
This operation copies all NCSA-specific rows, excluding those corresponding to the 144 unique patches identified earlier on the IN2P3 side of the merge boundary. For efficiency, the complete list of deepCoaddId values is collected into a separate table and reused through the rest of the procedure (including the subsequent stages of merging entries from the RunDeepSource tables).
Collecting NCSA-specific deepCoaddId:
SELECT COUNT(deepCoaddId) FROM gapon_SDRP_Stripe82.DeepCoadd;
+--------------------+
| COUNT(deepCoaddId) |
+--------------------+
|              13735 |
+--------------------+
CREATE TABLE lsst_sdss_stripe82_01.ncsa_DeepCoadd_deepCoaddId
    AS SELECT deepCoaddId FROM gapon_SDRP_Stripe82.DeepCoadd
    WHERE (tract,patch) NOT IN
        (SELECT DISTINCT tract,patch FROM gapon_SDRP_Stripe82.DeepCoadd
         WHERE (tract,patch) IN
             (SELECT tract,patch FROM lsst_prod_DC_2013_2.DeepCoadd)
           AND patch < "376,0");

SELECT COUNT(*) FROM lsst_sdss_stripe82_01.ncsa_DeepCoadd_deepCoaddId;
+----------+
| COUNT(*) |
+----------+
|    13015 |
+----------+
CREATE INDEX IDX_deepCoaddId
    ON lsst_sdss_stripe82_01.ncsa_DeepCoadd_deepCoaddId (deepCoaddId);
Note the difference in the number of rows: 720. This corresponds to the 144 unique patches in the IN2P3 part of the overlap area multiplied by 5 filters (each deepCoaddId is based on a (tract,patch,filterId) triplet).
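The row accounting can be checked with simple arithmetic (all numbers taken from the query outputs above; the 5 filters are the SDSS u, g, r, i, z bands):

```python
# Row accounting for the NCSA-side selection (numbers from the
# query outputs above).
total_ncsa = 13735   # rows in gapon_SDRP_Stripe82.DeepCoadd
kept_ncsa = 13015    # rows surviving the patch cut
excluded = total_ncsa - kept_ncsa
print(excluded, excluded // 5)   # 720 excluded rows = 144 patches x 5 filters
```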
Copy relevant rows from the input table DeepCoadd:
INSERT INTO lsst_sdss_stripe82_01.DeepCoadd_merge
    SELECT * FROM gapon_SDRP_Stripe82.DeepCoadd
    WHERE deepCoaddId IN
        (SELECT deepCoaddId FROM lsst_sdss_stripe82_01.ncsa_DeepCoadd_deepCoaddId);
Query OK, 13015 rows affected (0.10 sec)
Records: 13015  Duplicates: 0  Warnings: 0

SELECT COUNT(*) FROM lsst_sdss_stripe82_01.DeepCoadd_merge;
+----------+
| COUNT(*) |
+----------+
|    13015 |
+----------+
Copy relevant rows from the input table DeepCoadd_Metadata:
INSERT INTO lsst_sdss_stripe82_01.DeepCoadd_Metadata_merge
    SELECT * FROM gapon_SDRP_Stripe82.DeepCoadd_Metadata
    WHERE deepCoaddId IN
        (SELECT deepCoaddId FROM lsst_sdss_stripe82_01.ncsa_DeepCoadd_deepCoaddId);
Query OK, 2658664 rows affected (3 min 7.83 sec)
Records: 2658664  Duplicates: 0  Warnings: 0

SELECT COUNT(*) FROM lsst_sdss_stripe82_01.DeepCoadd_Metadata_merge;
+----------+
| COUNT(*) |
+----------+
|  2658664 |
+----------+
Copy relevant rows from the input table DeepCoadd_To_Htm10:
INSERT INTO lsst_sdss_stripe82_01.DeepCoadd_To_Htm10_merge
    SELECT * FROM gapon_SDRP_Stripe82.DeepCoadd_To_Htm10
    WHERE deepCoaddId IN
        (SELECT deepCoaddId FROM lsst_sdss_stripe82_01.ncsa_DeepCoadd_deepCoaddId);
Query OK, 309090 rows affected (0.63 sec)
Records: 309090  Duplicates: 0  Warnings: 0

SELECT COUNT(*) FROM lsst_sdss_stripe82_01.DeepCoadd_To_Htm10_merge;
+----------+
| COUNT(*) |
+----------+
|   309090 |
+----------+
Copying data from the IN2P3 region
This operation copies all IN2P3-specific rows, excluding those corresponding to the 132 unique patches identified earlier on the NCSA side of the merge boundary. For efficiency, the complete list of deepCoaddId values is collected into a separate table and reused through the rest of the procedure (including the subsequent stages of merging entries from the RunDeepSource tables).
Collecting IN2P3-specific deepCoaddId:
SELECT COUNT(deepCoaddId) FROM lsst_prod_DC_2013_2.DeepCoadd;
+--------------------+
| COUNT(deepCoaddId) |
+--------------------+
|              13680 |
+--------------------+
1 row in set (0.00 sec)
CREATE TABLE lsst_sdss_stripe82_01.in2p3_DeepCoadd_deepCoaddId
    AS SELECT deepCoaddId FROM lsst_prod_DC_2013_2.DeepCoadd
    WHERE (tract,patch) NOT IN
        (SELECT DISTINCT tract,patch FROM gapon_SDRP_Stripe82.DeepCoadd
         WHERE (tract,patch) IN
             (SELECT tract,patch FROM lsst_prod_DC_2013_2.DeepCoadd)
           AND patch >= "376,0");

SELECT COUNT(*) FROM lsst_sdss_stripe82_01.in2p3_DeepCoadd_deepCoaddId;
+----------+
| COUNT(*) |
+----------+
|    13020 |
+----------+
1 row in set (0.00 sec)
CREATE INDEX IDX_deepCoaddId
    ON lsst_sdss_stripe82_01.in2p3_DeepCoadd_deepCoaddId (deepCoaddId);
Note the difference in the number of rows: 660. This corresponds to the 132 unique patches in the NCSA part of the overlap area multiplied by 5 filters (each deepCoaddId is based on a (tract,patch,filterId) triplet).
Copy relevant rows from the input table DeepCoadd:
INSERT INTO lsst_sdss_stripe82_01.DeepCoadd_merge
    SELECT * FROM lsst_prod_DC_2013_2.DeepCoadd
    WHERE deepCoaddId IN
        (SELECT deepCoaddId FROM lsst_sdss_stripe82_01.in2p3_DeepCoadd_deepCoaddId);
Query OK, 13020 rows affected (0.26 sec)
Records: 13020  Duplicates: 0  Warnings: 0

SELECT COUNT(*) FROM lsst_sdss_stripe82_01.DeepCoadd_merge;
+----------+
| COUNT(*) |
+----------+
|    26035 |
+----------+
Copy relevant rows from the input table DeepCoadd_Metadata:
INSERT INTO lsst_sdss_stripe82_01.DeepCoadd_Metadata_merge
    SELECT * FROM lsst_prod_DC_2013_2.DeepCoadd_Metadata
    WHERE deepCoaddId IN
        (SELECT deepCoaddId FROM lsst_sdss_stripe82_01.in2p3_DeepCoadd_deepCoaddId);
Query OK, 2884680 rows affected (1 min 44.95 sec)
Records: 2884680  Duplicates: 0  Warnings: 0

SELECT COUNT(*) FROM lsst_sdss_stripe82_01.DeepCoadd_Metadata_merge;
+----------+
| COUNT(*) |
+----------+
|  5543344 |
+----------+
Copy relevant rows from the input table DeepCoadd_To_Htm10:
INSERT INTO lsst_sdss_stripe82_01.DeepCoadd_To_Htm10_merge
    SELECT * FROM lsst_prod_DC_2013_2.DeepCoadd_To_Htm10
    WHERE deepCoaddId IN
        (SELECT deepCoaddId FROM lsst_sdss_stripe82_01.in2p3_DeepCoadd_deepCoaddId);
Query OK, 308610 rows affected (1.55 sec)
Records: 308610  Duplicates: 0  Warnings: 0

SELECT COUNT(*) FROM lsst_sdss_stripe82_01.DeepCoadd_To_Htm10_merge;
+----------+
| COUNT(*) |
+----------+
|   617700 |
+----------+
Comparing totals with the previous merge
Resulting tables from the previous merge are found in database: lsst_sdss_stripe82
Table DeepCoadd:
SELECT COUNT(*) FROM lsst_sdss_stripe82_01.DeepCoadd_merge;
+----------+
| COUNT(*) |
+----------+
|    26035 |
+----------+

SELECT COUNT(*) FROM lsst_sdss_stripe82.DeepCoadd_merge;
+----------+
| COUNT(*) |
+----------+
|    26035 |
+----------+
Table DeepCoadd_Metadata:
SELECT COUNT(*) FROM lsst_sdss_stripe82_01.DeepCoadd_Metadata_merge;
+----------+
| COUNT(*) |
+----------+
|  5543344 |
+----------+

SELECT COUNT(*) FROM lsst_sdss_stripe82.DeepCoadd_Metadata_merge;
+----------+
| COUNT(*) |
+----------+
|  5543347 |
+----------+
ATTENTION: note a difference of 3 extra rows present in the previously merged table. This may need further investigation.
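One way to track down such a discrepancy would be an anti-join on the metadata table's key columns. A minimal sketch on SQLite with a toy schema (the key columns deepCoaddId and metadataKey are assumptions for illustration; the real table's key may differ):

```python
# Anti-join pattern for finding rows present in the old merge but not
# in the new one. Toy schema and data; only the query shape matters.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE old_merge (deepCoaddId INTEGER, metadataKey TEXT);
CREATE TABLE new_merge (deepCoaddId INTEGER, metadataKey TEXT);
INSERT INTO old_merge VALUES (1,'a'),(1,'b'),(2,'a'),(3,'a');
INSERT INTO new_merge VALUES (1,'a'),(1,'b'),(2,'a');
""")
extra = db.execute("""
SELECT o.deepCoaddId, o.metadataKey FROM old_merge o
LEFT JOIN new_merge n
  ON n.deepCoaddId = o.deepCoaddId AND n.metadataKey = o.metadataKey
WHERE n.deepCoaddId IS NULL
""").fetchall()
print(extra)   # rows present only in the previous merge: [(3, 'a')]
```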
Table DeepCoadd_To_Htm10:
SELECT COUNT(*) FROM lsst_sdss_stripe82_01.DeepCoadd_To_Htm10_merge;
+----------+
| COUNT(*) |
+----------+
|   617700 |
+----------+

SELECT COUNT(*) FROM lsst_sdss_stripe82.DeepCoadd_To_Htm10_merge;
+----------+
| COUNT(*) |
+----------+
|   617700 |
+----------+
Merging RunDeepSource (similar to LSST Object)
The merge will be done within the same database where table DeepCoadd was merged earlier: lsst_sdss_stripe82_01
Database location: /data3 (available space 3.2 TB)
Space required: on the order of 100 GB (see the next subsection for details)
Estimating space requirements
NCSA:
SELECT TABLE_ROWS, ROUND(DATA_LENGTH/1024/1024/1024,1) AS 'DATA_LENGTH [GB]'
FROM information_schema.tables
WHERE TABLE_SCHEMA="gapon_SDRP_Stripe82"
  AND TABLE_NAME="RunDeepSource";
+------------+------------------+
| TABLE_ROWS | DATA_LENGTH [GB] |
+------------+------------------+
|   98577782 |             51.2 |
+------------+------------------+
IN2P3:
SELECT TABLE_ROWS, ROUND(DATA_LENGTH/1024/1024/1024,1) AS 'DATA_LENGTH [GB]'
FROM information_schema.tables
WHERE TABLE_SCHEMA="lsst_prod_DC_2013_2"
  AND TABLE_NAME="RunDeepSource";
+------------+------------------+
| TABLE_ROWS | DATA_LENGTH [GB] |
+------------+------------------+
|   98088148 |             51.0 |
+------------+------------------+
Preparing the output table
ATTENTION: all keys (except the PRIMARY key) are disabled on the output table to avoid slowing the insertion down with index rebuilds.
CREATE TABLE lsst_sdss_stripe82_01.RunDeepSource_merge
    LIKE gapon_SDRP_Stripe82.RunDeepSource;

ALTER TABLE lsst_sdss_stripe82_01.RunDeepSource_merge DISABLE KEYS;
SHOW INDEX FROM lsst_sdss_stripe82_01.RunDeepSource_merge;
+---------------------+------------+-------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+----------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------------+------------+-------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+----------+---------------+
| RunDeepSource_merge | 0 | PRIMARY | 1 | id | A | 0 | NULL | NULL | | BTREE | | |
| RunDeepSource_merge | 1 | IDX_coord_htmId20 | 1 | coord_htmId20 | A | NULL | NULL | NULL | YES | BTREE | disabled | |
| RunDeepSource_merge | 1 | IDX_coord_decl | 1 | coord_decl | A | NULL | NULL | NULL | YES | BTREE | disabled | |
| RunDeepSource_merge | 1 | IDX_parent | 1 | parent | A | NULL | NULL | NULL | YES | BTREE | disabled | |
| RunDeepSource_merge | 1 | IDX_coadd_id | 1 | coadd_id | A | NULL | NULL | NULL | | BTREE | disabled | |
+---------------------+------------+-------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+----------+---------------+
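DISABLE KEYS is a MySQL/MyISAM feature; the general bulk-load pattern it implements (skip per-row secondary-index maintenance, rebuild the index once at the end) can be sketched on SQLite, where the closest equivalent is to drop and recreate the index:

```python
# Bulk-load pattern: drop the secondary index, load, rebuild once.
# Sketched on SQLite; table and column names are illustrative.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE merge_t (id INTEGER PRIMARY KEY, coadd_id INTEGER)")
db.execute("CREATE INDEX idx_coadd ON merge_t (coadd_id)")

db.execute("DROP INDEX idx_coadd")                          # "disable" keys
db.executemany("INSERT INTO merge_t VALUES (?, ?)",
               [(i, i % 7) for i in range(1000)])           # bulk load
db.execute("CREATE INDEX idx_coadd ON merge_t (coadd_id)")  # rebuild once

print(db.execute("SELECT COUNT(*) FROM merge_t").fetchone()[0])   # 1000
```

The rebuild does the same total work as per-row maintenance would, but sequentially and in one pass, which is what makes the bulk insert faster.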
Copying rows from the NCSA region
ATTENTION: this operation is performed in batch mode as it may be quite lengthy.
This is the query to be executed:
SELECT NOW();
INSERT INTO lsst_sdss_stripe82_01.RunDeepSource_merge
    SELECT * FROM gapon_SDRP_Stripe82.RunDeepSource
    WHERE coadd_id IN
        (SELECT deepCoaddId FROM lsst_sdss_stripe82_01.ncsa_DeepCoadd_deepCoaddId);
SELECT NOW();
The query is placed into a file at:
/home/gapon/ncsa_RunDeepSource_merge.sql

Launching the query:
% cd /home/gapon/
% nohup cat ncsa_RunDeepSource_merge.sql | mysql -u root >& ncsa_RunDeepSource_merge.log&
Checking the log file:
2017-05-10 01:50:30
2017-05-10 02:20:04
and the results within the database:
SELECT COUNT(*) FROM lsst_sdss_stripe82_01.RunDeepSource_merge;
+----------+
| COUNT(*) |
+----------+
| 93285875 |
+----------+
OBSERVATION: this number is smaller than the input number of rows. This is perfectly expected because we've excluded objects from the opposite part of the catalog in the overlap area.
Copying rows from the IN2P3 region
This is the query to be executed:
SELECT NOW();
INSERT INTO lsst_sdss_stripe82_01.RunDeepSource_merge
    SELECT * FROM lsst_prod_DC_2013_2.RunDeepSource
    WHERE coadd_id IN
        (SELECT deepCoaddId FROM lsst_sdss_stripe82_01.in2p3_DeepCoadd_deepCoaddId);
SELECT NOW();
The query is placed into a file at:
/home/gapon/in2p3_RunDeepSource_merge.sql

Launching the query:
% cd /home/gapon/
% nohup cat in2p3_RunDeepSource_merge.sql | mysql -u root >& in2p3_RunDeepSource_merge.log&
Checking the log file:
2017-05-10 02:27:09
2017-05-10 02:52:47
and the results within the database:
SELECT COUNT(*) FROM lsst_sdss_stripe82_01.RunDeepSource_merge;
+-----------+
| COUNT(*)  |
+-----------+
| 186671762 |
+-----------+

SELECT 186671762 - 93285875;
+----------------------+
| 186671762 - 93285875 |
+----------------------+
|             93385887 |
+----------------------+
OBSERVATION: this number is smaller than the sum of the input row counts, and the second number is smaller than the number of rows in the input table. This is expected because we've excluded objects from the opposite parts of the catalog in the overlap area.
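The arithmetic behind this observation, with all row counts taken from the outputs above (the input totals are the information_schema estimates quoted in the space-requirements subsection):

```python
# Row accounting for the merged RunDeepSource table.
ncsa_kept = 93285875
in2p3_kept = 186671762 - 93285875
print(in2p3_kept)              # 93385887 rows copied from IN2P3
assert ncsa_kept < 98577782    # fewer than the NCSA input table
assert in2p3_kept < 98088148   # fewer than the IN2P3 input table
```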
Extract NCSA-specific object identifiers
We need this list in order to optimize a merge of rows from the RunDeepForcedSource tables from NCSA:
CREATE TABLE lsst_sdss_stripe82_01.ncsa_RunDeepSource_id
    SELECT id FROM gapon_SDRP_Stripe82.RunDeepSource
    WHERE coadd_id IN
        (SELECT deepCoaddId FROM lsst_sdss_stripe82_01.ncsa_DeepCoadd_deepCoaddId);
SELECT COUNT(*) FROM lsst_sdss_stripe82_01.ncsa_RunDeepSource_id;
+----------+
| COUNT(*) |
+----------+
| 93285875 |
+----------+
OBSERVATION: This number matches the number of entries copied earlier from the same region.
Also create a unique index on the table, which will allow optimized query processing when merging forced sources:
CREATE UNIQUE INDEX IDX_id ON lsst_sdss_stripe82_01.ncsa_RunDeepSource_id (id);
Query OK, 0 rows affected (5 min 38.34 sec)
Records: 0  Duplicates: 0  Warnings: 0

SHOW INDEX FROM lsst_sdss_stripe82_01.ncsa_RunDeepSource_id;
+-----------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ncsa_RunDeepSource_id | 0 | IDX_id | 1 | id | A | 93286489 | NULL | NULL | | BTREE | | |
+-----------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Also create a list of NCSA-specific objects in the overlap area:
CREATE TABLE lsst_sdss_stripe82_01.overlap_ncsa_RunDeepSource_id
    SELECT id FROM gapon_SDRP_Stripe82.RunDeepSource
    WHERE coadd_id IN
        (SELECT deepCoaddId FROM gapon_SDRP_Stripe82.DeepCoadd
         WHERE (tract,patch) IN
             (SELECT DISTINCT tract,patch FROM gapon_SDRP_Stripe82.DeepCoadd
              WHERE (tract,patch) IN
                  (SELECT tract,patch FROM lsst_prod_DC_2013_2.DeepCoadd)
                AND patch >= "376,0"));

SELECT COUNT(*) FROM lsst_sdss_stripe82_01.overlap_ncsa_RunDeepSource_id;
+----------+
| COUNT(*) |
+----------+
|  4699622 |
+----------+
1 row in set (1.95 sec)
CREATE UNIQUE INDEX IDX_id ON lsst_sdss_stripe82_01.overlap_ncsa_RunDeepSource_id (id);
Query OK, 0 rows affected (17.34 sec)
Records: 0  Duplicates: 0  Warnings: 0
Extract IN2P3-specific object identifiers
We need this list in order to optimize a merge of rows from the RunDeepForcedSource tables from IN2P3:
CREATE TABLE lsst_sdss_stripe82_01.in2p3_RunDeepSource_id
    SELECT id FROM lsst_prod_DC_2013_2.RunDeepSource
    WHERE coadd_id IN
        (SELECT deepCoaddId FROM lsst_sdss_stripe82_01.in2p3_DeepCoadd_deepCoaddId);
SELECT COUNT(*) FROM lsst_sdss_stripe82_01.in2p3_RunDeepSource_id;
+----------+
| COUNT(*) |
+----------+
| 93385887 |
+----------+
OBSERVATION: this number matches the number of entries copied earlier from the same region.
CREATE UNIQUE INDEX IDX_id ON lsst_sdss_stripe82_01.in2p3_RunDeepSource_id (id);
Query OK, 0 rows affected (6 min 12.75 sec)
Records: 0  Duplicates: 0  Warnings: 0

SHOW INDEX FROM lsst_sdss_stripe82_01.in2p3_RunDeepSource_id;
+------------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| in2p3_RunDeepSource_id | 0 | IDX_id | 1 | id | A | 93386365 | NULL | NULL | | BTREE | | |
+------------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Also create a list of IN2P3-specific objects in the overlap area:
CREATE TABLE lsst_sdss_stripe82_01.overlap_in2p3_RunDeepSource_id
    SELECT id FROM lsst_prod_DC_2013_2.RunDeepSource
    WHERE coadd_id IN
        (SELECT deepCoaddId FROM lsst_prod_DC_2013_2.DeepCoadd
         WHERE (tract,patch) IN
             (SELECT DISTINCT tract,patch FROM lsst_prod_DC_2013_2.DeepCoadd
              WHERE (tract,patch) IN
                  (SELECT tract,patch FROM gapon_SDRP_Stripe82.DeepCoadd)
                AND patch < "376,0"));

PROBLEM: the previous query got stuck for many hours.
SELECT COUNT(*) FROM lsst_sdss_stripe82_01.overlap_in2p3_RunDeepSource_id;
+----------+
| COUNT(*) |
+----------+
|  5291112 |
+----------+
1 row in set (1.43 sec)
CREATE UNIQUE INDEX IDX_id ON lsst_sdss_stripe82_01.overlap_in2p3_RunDeepSource_id (id);
Query OK, 0 rows affected (19.27 sec)
Records: 0  Duplicates: 0  Warnings: 0
De-duplicating RunDeepForcedSource (similar to LSST ForcedSource)
Evaluation steps:
- study the procedure for filtering forced sources which was devised last time
- prepare a step-by-step plan for processing these tables
- visit the status of the input tables (RunDeepForcedSource) to see which are available and whether they're healthy
- eliminate tables left from the previous effort to free up space
- evaluate space requirements and space availability (note that each instance of the table will have two output tables: _dedup and _radecl)
Clearing unneeded or redundant tables (NCSA)
This step frees up extra space for merging the tables.
USE daues_SDRP_dedupe_byfilter_4;
SHOW TABLES;
+----------------------------------------+
| Tables_in_daues_SDRP_dedupe_byfilter_4 |
+----------------------------------------+
| RunDeepForcedSource                    |
| RunDeepForcedSource_radecl             |
+----------------------------------------+

SELECT COUNT(*) FROM RunDeepForcedSource_radecl;
+------------+
| COUNT(*)   |
+------------+
| 1751879670 |
+------------+

SELECT COUNT(*) FROM RunDeepForcedSource;
+------------+
| COUNT(*)   |
+------------+
| 1751879670 |
+------------+
OBSERVATION: the extended _radecl table has two extra columns needed by the partitioning tool; otherwise it is the same original (unmodified) table, because the previous merge attempt gave preference to the forced sources from NCSA within the overlap area.
So we can eliminate the original, un-extended tables from all 5 databases:
DROP TABLE daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource;
DROP TABLE daues_SDRP_dedupe_byfilter_1.RunDeepForcedSource;
DROP TABLE daues_SDRP_dedupe_byfilter_2.RunDeepForcedSource;
DROP TABLE daues_SDRP_dedupe_byfilter_3.RunDeepForcedSource;
DROP TABLE daues_SDRP_dedupe_byfilter_4.RunDeepForcedSource;
Clearing unneeded or redundant tables (IN2P3)
This step affects the following databases:
lsst_prod_dedupe_byfilter_u
lsst_prod_dedupe_byfilter_g
lsst_prod_dedupe_byfilter_r
lsst_prod_dedupe_byfilter_i
lsst_prod_dedupe_byfilter_z
Inspecting the tables in one of those databases to determine what should be left:
USE lsst_prod_dedupe_byfilter_u;
SHOW TABLES;
+---------------------------------------+
| Tables_in_lsst_prod_dedupe_byfilter_u |
+---------------------------------------+
| RunDeepForcedSource                   |
| RunDeepForcedSource_dedup             |
| RunDeepForcedSource_dedup_radecl      |
+---------------------------------------+
SELECT COUNT(*) FROM RunDeepForcedSource;
+------------+
| COUNT(*)   |
+------------+
| 1897928494 |
+------------+
SELECT COUNT(*) FROM RunDeepForcedSource_dedup;
+------------+
| COUNT(*)   |
+------------+
| 1720938745 |
+------------+
SELECT COUNT(*) FROM RunDeepForcedSource_dedup_radecl;
+------------+
| COUNT(*)   |
+------------+
| 1720938745 |
+------------+
OBSERVATION: the original table is the one to use for the merging effort, so we can eliminate the duplicated and extended tables from all 5 databases:
DROP TABLE lsst_prod_dedupe_byfilter_u.RunDeepForcedSource_dedup;
DROP TABLE lsst_prod_dedupe_byfilter_u.RunDeepForcedSource_dedup_radecl;
DROP TABLE lsst_prod_dedupe_byfilter_g.RunDeepForcedSource_dedup;
DROP TABLE lsst_prod_dedupe_byfilter_g.RunDeepForcedSource_dedup_radecl;
DROP TABLE lsst_prod_dedupe_byfilter_r.RunDeepForcedSource_dedup;
DROP TABLE lsst_prod_dedupe_byfilter_r.RunDeepForcedSource_dedup_radecl;
DROP TABLE lsst_prod_dedupe_byfilter_i.RunDeepForcedSource_dedup;
DROP TABLE lsst_prod_dedupe_byfilter_i.RunDeepForcedSource_dedup_radecl;
DROP TABLE lsst_prod_dedupe_byfilter_z.RunDeepForcedSource_dedup;
DROP TABLE lsst_prod_dedupe_byfilter_z.RunDeepForcedSource_dedup_radecl;
Also eliminate 4 temporary tables:
DROP TABLE lsst_prod_dedupe_byfilter_g.tmp_dedupe_g;
DROP TABLE lsst_prod_dedupe_byfilter_r.tmp_dedupe_r;
DROP TABLE lsst_prod_dedupe_byfilter_i.tmp_dedupe_i;
DROP TABLE lsst_prod_dedupe_byfilter_z.tmp_dedupe_z;
De-duplicating NCSA tables
GENERAL ALGORITHM: for each of the 5 input tables, copy into the corresponding output table (same schema as the input) all rows which are NOT linked to the previously identified objects from the IN2P3 part of the overlap. The identifiers of those objects are found in table:
lsst_sdss_stripe82_01.overlap_in2p3_RunDeepSource_id
This operation was launched for all 5 tables in the 5 databases located at:
% ls -al /data/mysql/db/
drwx------ 2 mysql mysql 4096 May 11 18:20 daues_SDRP_dedupe_byfilter_0
drwx------ 2 mysql mysql 4096 May 11 18:26 daues_SDRP_dedupe_byfilter_1
drwx------ 2 mysql mysql 4096 May 11 18:26 daues_SDRP_dedupe_byfilter_2
drwx------ 2 mysql mysql 4096 May 11 18:26 daues_SDRP_dedupe_byfilter_3
drwx------ 2 mysql mysql 4096 May 11 18:26 daues_SDRP_dedupe_byfilter_4
Before the processing began, the file system had more than enough free space to accommodate the new tables (3.5 TB is required):
% df -h /data/mysql/db/
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/vdb    12T   5.0T  6.1T   45%   /data
Creating the output tables:
CREATE TABLE daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource_radecl_dedup
    LIKE daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource_radecl;

CREATE TABLE daues_SDRP_dedupe_byfilter_1.RunDeepForcedSource_radecl_dedup
    LIKE daues_SDRP_dedupe_byfilter_1.RunDeepForcedSource_radecl;

CREATE TABLE daues_SDRP_dedupe_byfilter_2.RunDeepForcedSource_radecl_dedup
    LIKE daues_SDRP_dedupe_byfilter_2.RunDeepForcedSource_radecl;

CREATE TABLE daues_SDRP_dedupe_byfilter_3.RunDeepForcedSource_radecl_dedup
    LIKE daues_SDRP_dedupe_byfilter_3.RunDeepForcedSource_radecl;

CREATE TABLE daues_SDRP_dedupe_byfilter_4.RunDeepForcedSource_radecl_dedup
    LIKE daues_SDRP_dedupe_byfilter_4.RunDeepForcedSource_radecl;
Launching the following 5 commands IN PARALLEL in the batch mode:
INSERT INTO daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource_radecl_dedup |
SELECT * FROM daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource_radecl |
WHERE objectId NOT IN |
(SELECT id FROM lsst_sdss_stripe82_01.overlap_in2p3_RunDeepSource_id); |
INSERT INTO daues_SDRP_dedupe_byfilter_1.RunDeepForcedSource_radecl_dedup |
SELECT * FROM daues_SDRP_dedupe_byfilter_1.RunDeepForcedSource_radecl |
WHERE objectId NOT IN |
(SELECT id FROM lsst_sdss_stripe82_01.overlap_in2p3_RunDeepSource_id); |
INSERT INTO daues_SDRP_dedupe_byfilter_2.RunDeepForcedSource_radecl_dedup |
SELECT * FROM daues_SDRP_dedupe_byfilter_2.RunDeepForcedSource_radecl |
WHERE objectId NOT IN |
(SELECT id FROM lsst_sdss_stripe82_01.overlap_in2p3_RunDeepSource_id); |
INSERT INTO daues_SDRP_dedupe_byfilter_3.RunDeepForcedSource_radecl_dedup |
SELECT * FROM daues_SDRP_dedupe_byfilter_3.RunDeepForcedSource_radecl |
WHERE objectId NOT IN |
(SELECT id FROM lsst_sdss_stripe82_01.overlap_in2p3_RunDeepSource_id); |
INSERT INTO daues_SDRP_dedupe_byfilter_4.RunDeepForcedSource_radecl_dedup |
SELECT * FROM daues_SDRP_dedupe_byfilter_4.RunDeepForcedSource_radecl |
WHERE objectId NOT IN |
(SELECT id FROM lsst_sdss_stripe82_01.overlap_in2p3_RunDeepSource_id); |
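The launch mechanism itself is not recorded above; a minimal sketch of one way to stage the five statements for a parallel batch run is shown below (the staging directory and file names are illustrative, not taken from the original session):

```shell
# Stage one SQL file per filter database; each would then be run in the
# background with its own mysql client and log file.
dir=$(mktemp -d)
for n in 0 1 2 3 4; do
  cat > "$dir/dedup_${n}.sql" <<EOF
INSERT INTO daues_SDRP_dedupe_byfilter_${n}.RunDeepForcedSource_radecl_dedup
SELECT * FROM daues_SDRP_dedupe_byfilter_${n}.RunDeepForcedSource_radecl
WHERE objectId NOT IN
  (SELECT id FROM lsst_sdss_stripe82_01.overlap_in2p3_RunDeepSource_id);
EOF
  # each file would then be launched as:
  #   nohup mysql -u root < "$dir/dedup_${n}.sql" >& dedup_${n}.log &
done
count=$(ls "$dir" | wc -l | tr -d ' ')
echo "staged $count SQL files"   # staged 5 SQL files
```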
The log files for the operations are found at:
% ls -al /home/gapon/logs/ |
-rw-rw-r-- 1 gapon gapon 0 May 11 18:28 daues_SDRP_dedupe_byfilter_0.log
|
-rw-rw-r-- 1 gapon gapon 0 May 11 18:33 daues_SDRP_dedupe_byfilter_1.log
|
-rw-rw-r-- 1 gapon gapon 0 May 11 18:34 daues_SDRP_dedupe_byfilter_2.log
|
-rw-rw-r-- 1 gapon gapon 0 May 11 18:34 daues_SDRP_dedupe_byfilter_3.log
|
-rw-rw-r-- 1 gapon gapon 0 May 11 18:34 daues_SDRP_dedupe_byfilter_4.log
|
When the operation finished:
% df -h /data/mysql/db/ |
Filesystem Size Used Avail Use% Mounted on
|
/dev/vdb 12T 8.1T 3.1T 73% /data |
Running a quick test for each table:
SELECT TABLE_SCHEMA,TABLE_NAME,TABLE_ROWS,DATA_LENGTH |
FROM information_schema.tables |
WHERE TABLE_SCHEMA LIKE "daues_SDRP_dedupe_byfilter\__"; |
+------------------------------+----------------------------------+------------+--------------+ |
| TABLE_SCHEMA | TABLE_NAME | TABLE_ROWS | DATA_LENGTH |
|
+------------------------------+----------------------------------+------------+--------------+ |
| daues_SDRP_dedupe_byfilter_0 | RunDeepForcedSource_radecl | 1729599791 | 710865514101 |
|
| daues_SDRP_dedupe_byfilter_0 | RunDeepForcedSource_radecl_dedup | 1634106268 | 671617676148 |
|
| daues_SDRP_dedupe_byfilter_1 | RunDeepForcedSource_radecl | 1752805399 | 720403018989 |
|
| daues_SDRP_dedupe_byfilter_1 | RunDeepForcedSource_radecl_dedup | 1654729991 | 680094026301 |
|
| daues_SDRP_dedupe_byfilter_2 | RunDeepForcedSource_radecl | 1754290686 | 721013471946 |
|
| daues_SDRP_dedupe_byfilter_2 | RunDeepForcedSource_radecl_dedup | 1656203271 | 680699544381 |
|
| daues_SDRP_dedupe_byfilter_3 | RunDeepForcedSource_radecl | 1754272841 | 721006137651 |
|
| daues_SDRP_dedupe_byfilter_3 | RunDeepForcedSource_radecl_dedup | 1656209922 | 680702277942 |
|
| daues_SDRP_dedupe_byfilter_4 | RunDeepForcedSource_radecl | 1751879670 | 720022544370 |
|
| daues_SDRP_dedupe_byfilter_4 | RunDeepForcedSource_radecl_dedup | 1654034432 | 679808151552 |
|
+------------------------------+----------------------------------+------------+--------------+ |
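As a quick arithmetic cross-check of the de-duplication, the row counts reported above for filter 0 imply that roughly 5.5% of the forced-source rows were removed:

```shell
# Reduction for daues_SDRP_dedupe_byfilter_0, from the information_schema
# row counts quoted above.
before=1729599791   # RunDeepForcedSource_radecl
after=1634106268    # RunDeepForcedSource_radecl_dedup
reduction=$(awk -v b="$before" -v a="$after" 'BEGIN { printf "%.1f", 100 * (b - a) / b }')
echo "filter 0: ${reduction}% of rows removed"   # filter 0: 5.5% of rows removed
```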
Extending IN2P3 tables
IMPORTANT: The goal of this step is to add two extra columns carrying the spherical coordinates of the corresponding objects. These are needed to correctly partition the forced sources (for a variety of technical reasons, the current implementation of the partitioning algorithm does NOT use the explicit association between forced sources and the corresponding objects; it relies on an implicit spatial association).
This operation was launched for 5 tables of 5 databases located at:
% ls -al /data/mysql/db/ |
lrwxrwxrwx 1 root root 43 Sep 9 2016 lsst_prod_dedupe_byfilter_g -> /data2/mysql/db/lsst_prod_dedupe_byfilter_g |
lrwxrwxrwx 1 root root 43 Sep 9 2016 lsst_prod_dedupe_byfilter_i -> /data2/mysql/db/lsst_prod_dedupe_byfilter_i |
lrwxrwxrwx 1 root root 43 Sep 9 2016 lsst_prod_dedupe_byfilter_r -> /data2/mysql/db/lsst_prod_dedupe_byfilter_r |
lrwxrwxrwx 1 root root 43 May 12 06:08 lsst_prod_dedupe_byfilter_u -> /data2/mysql/db/lsst_prod_dedupe_byfilter_u |
lrwxrwxrwx 1 root root 43 May 12 20:54 lsst_prod_dedupe_byfilter_z -> /data2/mysql/db/lsst_prod_dedupe_byfilter_z |
The file system has more than enough free space to accommodate the new tables (3.5 TB is required):
% df -h /data2/mysql/db/ |
Filesystem Size Used Avail Use% Mounted on
|
/dev/vdd 7.8T 3.7T 3.8T 49% /data2 |
Creating the output tables:
CREATE TABLE lsst_prod_dedupe_byfilter_u.RunDeepForcedSource_radecl |
LIKE daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource_radecl; |
|
CREATE TABLE lsst_prod_dedupe_byfilter_g.RunDeepForcedSource_radecl |
LIKE daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource_radecl; |
|
CREATE TABLE lsst_prod_dedupe_byfilter_r.RunDeepForcedSource_radecl |
LIKE daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource_radecl; |
|
CREATE TABLE lsst_prod_dedupe_byfilter_i.RunDeepForcedSource_radecl |
LIKE daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource_radecl; |
|
CREATE TABLE lsst_prod_dedupe_byfilter_z.RunDeepForcedSource_radecl |
LIKE daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource_radecl; |
The GENERAL ALGORITHM relies upon a small table produced earlier in the context of the original effort. This table is a subset of the full RunDeepSource table, containing only the triplet (id,ra,decl) for each object.
SELECT COUNT(*) FROM lsst_prod_DC_2013_2.RunDeepSource_radecl; |
+----------+ |
| COUNT(*) | |
+----------+ |
| 98088148 |
|
+----------+ |
The following operation is in progress for all 5 filter bands:
INSERT INTO lsst_prod_dedupe_byfilter_u.RunDeepForcedSource_radecl |
SELECT rdfs.*,rdss.coord_ra,rdss.coord_decl |
FROM |
lsst_prod_dedupe_byfilter_u.RunDeepForcedSource `rdfs` |
STRAIGHT_JOIN |
lsst_prod_DC_2013_2.RunDeepSource_radecl `rdss` |
ON |
rdfs.objectId = rdss.id; |
The log files for the operations are found at:
% ls -al /home/gapon/logs/ |
-rw-rw-r-- 1 gapon gapon 0 May 12 21:44 lsst_prod_dedupe_byfilter_g_radecl.log
|
-rw-rw-r-- 1 gapon gapon 0 May 12 21:44 lsst_prod_dedupe_byfilter_i_radecl.log
|
-rw-rw-r-- 1 gapon gapon 0 May 12 21:44 lsst_prod_dedupe_byfilter_r_radecl.log
|
-rw-rw-r-- 1 gapon gapon 0 May 12 21:43 lsst_prod_dedupe_byfilter_u_radecl.log
|
-rw-rw-r-- 1 gapon gapon 0 May 12 21:44 lsst_prod_dedupe_byfilter_z_radecl.log
|
Tested the results with:
SELECT TABLE_SCHEMA,TABLE_NAME,TABLE_ROWS,DATA_LENGTH |
FROM information_schema.tables |
WHERE TABLE_SCHEMA LIKE "lsst_prod_dedupe_byfilter\__" |
AND TABLE_NAME LIKE "RunDeepForcedSource%"; |
+-----------------------------+----------------------------+------------+--------------+ |
| TABLE_SCHEMA | TABLE_NAME | TABLE_ROWS | DATA_LENGTH |
|
+-----------------------------+----------------------------+------------+--------------+ |
| lsst_prod_dedupe_byfilter_g | RunDeepForcedSource | 1947408074 | 769226189230 |
|
| lsst_prod_dedupe_byfilter_g | RunDeepForcedSource_radecl | 1947408074 | 800384718414 |
|
| lsst_prod_dedupe_byfilter_i | RunDeepForcedSource | 1950338024 | 770383519480 |
|
| lsst_prod_dedupe_byfilter_i | RunDeepForcedSource_radecl | 1950338024 | 801588927864 |
|
| lsst_prod_dedupe_byfilter_r | RunDeepForcedSource | 1950357798 | 770391330210 |
|
| lsst_prod_dedupe_byfilter_r | RunDeepForcedSource_radecl | 1950357798 | 801597054978 |
|
| lsst_prod_dedupe_byfilter_u | RunDeepForcedSource | 1897928494 | 749681755130 |
|
| lsst_prod_dedupe_byfilter_u | RunDeepForcedSource_radecl | 1897928494 | 780048611034 |
|
| lsst_prod_dedupe_byfilter_z | RunDeepForcedSource | 1946839240 | 769001499800 |
|
| lsst_prod_dedupe_byfilter_z | RunDeepForcedSource_radecl | 1946839240 | 800150927640 |
|
+-----------------------------+----------------------------+------------+--------------+ |
Eliminating the original (non-extended) tables to free space for the next set of (de-duplicated) tables:
DROP TABLE lsst_prod_dedupe_byfilter_u.RunDeepForcedSource; |
DROP TABLE lsst_prod_dedupe_byfilter_g.RunDeepForcedSource; |
DROP TABLE lsst_prod_dedupe_byfilter_r.RunDeepForcedSource; |
DROP TABLE lsst_prod_dedupe_byfilter_i.RunDeepForcedSource; |
DROP TABLE lsst_prod_dedupe_byfilter_z.RunDeepForcedSource; |
Checking the free space:
% df -h /data2 |
Filesystem Size Used Avail Use% Mounted on
|
/dev/vdd 7.8T 3.7T 3.8T 50% /data2 |
This should be enough, considering the ~5% reduction in the number of rows due to the de-duplication. |
De-duplicating IN2P3 tables
GENERAL ALGORITHM: for each of the 5 input tables, copy into the corresponding output table (same schema as the input) all rows which are NOT linked to the previously identified objects from the NCSA part of the overlap. The identifiers of the objects to be excluded are found in table:
lsst_sdss_stripe82_01.overlap_ncsa_RunDeepSource_id
|
Creating output tables:
CREATE TABLE lsst_prod_dedupe_byfilter_u.RunDeepForcedSource_radecl_dedup |
LIKE lsst_prod_dedupe_byfilter_u.RunDeepForcedSource_radecl; |
|
CREATE TABLE lsst_prod_dedupe_byfilter_g.RunDeepForcedSource_radecl_dedup |
LIKE lsst_prod_dedupe_byfilter_g.RunDeepForcedSource_radecl; |
|
CREATE TABLE lsst_prod_dedupe_byfilter_r.RunDeepForcedSource_radecl_dedup |
LIKE lsst_prod_dedupe_byfilter_r.RunDeepForcedSource_radecl; |
|
CREATE TABLE lsst_prod_dedupe_byfilter_i.RunDeepForcedSource_radecl_dedup |
LIKE lsst_prod_dedupe_byfilter_i.RunDeepForcedSource_radecl; |
|
CREATE TABLE lsst_prod_dedupe_byfilter_z.RunDeepForcedSource_radecl_dedup |
LIKE lsst_prod_dedupe_byfilter_z.RunDeepForcedSource_radecl; |
Launching the following 5 commands IN PARALLEL in batch mode:
INSERT INTO lsst_prod_dedupe_byfilter_u.RunDeepForcedSource_radecl_dedup |
SELECT * FROM lsst_prod_dedupe_byfilter_u.RunDeepForcedSource_radecl |
WHERE objectId NOT IN |
(SELECT id FROM lsst_sdss_stripe82_01.overlap_ncsa_RunDeepSource_id); |
INSERT INTO lsst_prod_dedupe_byfilter_g.RunDeepForcedSource_radecl_dedup |
SELECT * FROM lsst_prod_dedupe_byfilter_g.RunDeepForcedSource_radecl |
WHERE objectId NOT IN |
(SELECT id FROM lsst_sdss_stripe82_01.overlap_ncsa_RunDeepSource_id); |
INSERT INTO lsst_prod_dedupe_byfilter_r.RunDeepForcedSource_radecl_dedup |
SELECT * FROM lsst_prod_dedupe_byfilter_r.RunDeepForcedSource_radecl |
WHERE objectId NOT IN |
(SELECT id FROM lsst_sdss_stripe82_01.overlap_ncsa_RunDeepSource_id); |
INSERT INTO lsst_prod_dedupe_byfilter_i.RunDeepForcedSource_radecl_dedup |
SELECT * FROM lsst_prod_dedupe_byfilter_i.RunDeepForcedSource_radecl |
WHERE objectId NOT IN |
(SELECT id FROM lsst_sdss_stripe82_01.overlap_ncsa_RunDeepSource_id); |
INSERT INTO lsst_prod_dedupe_byfilter_z.RunDeepForcedSource_radecl_dedup |
SELECT * FROM lsst_prod_dedupe_byfilter_z.RunDeepForcedSource_radecl |
WHERE objectId NOT IN |
(SELECT id FROM lsst_sdss_stripe82_01.overlap_ncsa_RunDeepSource_id); |
Tested the results with:
SELECT TABLE_SCHEMA,TABLE_NAME,TABLE_ROWS,DATA_LENGTH |
FROM information_schema.tables |
WHERE TABLE_SCHEMA LIKE "lsst_prod_dedupe_byfilter\__" |
AND TABLE_NAME LIKE "RunDeepForcedSource%"; |
+-----------------------------+----------------------------------+------------+--------------+ |
| TABLE_SCHEMA | TABLE_NAME | TABLE_ROWS | DATA_LENGTH |
|
+-----------------------------+----------------------------------+------------+--------------+ |
| lsst_prod_dedupe_byfilter_g | RunDeepForcedSource_radecl | 1947408074 | 800384718414 |
|
| lsst_prod_dedupe_byfilter_g | RunDeepForcedSource_radecl_dedup | 1865204563 | 766599075393 |
|
| lsst_prod_dedupe_byfilter_i | RunDeepForcedSource_radecl | 1950338024 | 801588927864 |
|
| lsst_prod_dedupe_byfilter_i | RunDeepForcedSource_radecl_dedup | 1867918914 | 767714673654 |
|
| lsst_prod_dedupe_byfilter_r | RunDeepForcedSource_radecl | 1950357798 | 801597054978 |
|
| lsst_prod_dedupe_byfilter_r | RunDeepForcedSource_radecl_dedup | 1867935694 | 767721570234 |
|
| lsst_prod_dedupe_byfilter_u | RunDeepForcedSource_radecl | 1897928494 | 780048611034 |
|
| lsst_prod_dedupe_byfilter_u | RunDeepForcedSource_radecl_dedup | 1817409697 | 746955385467 |
|
| lsst_prod_dedupe_byfilter_z | RunDeepForcedSource_radecl | 1946839240 | 800150927640 |
|
| lsst_prod_dedupe_byfilter_z | RunDeepForcedSource_radecl_dedup | 1864588704 | 766345957344 |
|
+-----------------------------+----------------------------------+------------+--------------+ |
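The same arithmetic applied to the IN2P3 g-band counts above gives a slightly smaller reduction, consistent with the ~5% estimate used earlier when sizing the file system:

```shell
# Fraction of g-band rows removed by the IN2P3 de-duplication, from the
# information_schema row counts quoted above.
awk 'BEGIN { printf "g: %.1f%% of rows removed\n", 100 * (1947408074 - 1865204563) / 1947408074 }'
# g: 4.2% of rows removed
```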
Dumping databases into the TSV format
DeepCoadd (and 2 dependent tables), RunDeepSource
Prepare the destination folder:
% sudo mkdir -p /data1/tsv/lsst_sdss_stripe82_01 |
% sudo chown mysql:mysql /data1/tsv/lsst_sdss_stripe82_01 |
% sudo chmod 0777 /data1/tsv/lsst_sdss_stripe82_01 |
Started the dump with:
% nohup mysqldump -u root lsst_sdss_stripe82_01 \ |
DeepCoadd_merge \ |
DeepCoadd_Metadata_merge \ |
DeepCoadd_To_Htm10_merge \ |
RunDeepSource_merge \ |
-T/data1/tsv/lsst_sdss_stripe82_01/ &>logs/lsst_sdss_stripe82_01.tsv.log & |
The operation successfully finished. NO problems in the log file. The file sizes are shown below:
% ls -al /data1/tsv/lsst_sdss_stripe82_01/ |
total 211984396
|
drwxrwxrwx 2 mysql mysql 4096 May 16 21:07 .
|
drwxrwxrwx 9 mysql mysql 4096 May 16 02:24 ..
|
-rw-rw-r-- 1 gapon gapon 2549 May 16 02:29 DeepCoadd_merge.sql
|
-rw-rw-rw- 1 mysql mysql 12319362 May 16 02:29 DeepCoadd_merge.txt
|
-rw-rw-r-- 1 gapon gapon 1669 May 16 02:29 DeepCoadd_Metadata_merge.sql
|
-rw-rw-rw- 1 mysql mysql 279986169 May 16 02:29 DeepCoadd_Metadata_merge.txt
|
-rw-rw-r-- 1 gapon gapon 1495 May 16 02:29 DeepCoadd_To_Htm10_merge.sql
|
-rw-rw-rw- 1 mysql mysql 11462430 May 16 02:29 DeepCoadd_To_Htm10_merge.txt
|
-rw-rw-r-- 1 gapon gapon 8189 May 16 02:29 RunDeepSource_merge.sql
|
-rw-rw-rw- 1 mysql mysql 216768195814 May 16 07:05 RunDeepSource_merge.txt
|
RunDeepForcedSource of NCSA
Based on the prior effort, each TSV file is expected to be approximately 1.2 TB in size; hence the storage requirement for all 5 tables is about 6 TB. The following file system provides enough capacity to accommodate these files:
% df -h /data1 |
Filesystem Size Used Avail Use% Mounted on
|
/dev/vdc 7.8T 93M 7.4T 1% /data1 |
|
% for band in "0" "1" "2" "3" "4"; do \ |
sudo mkdir -p /data1/tsv/daues_SDRP_dedupe_byfilter_${band}; \ |
done |
% sudo chown -R mysql:mysql /data1/tsv |
% sudo chmod -R 0777 /data1/tsv |
Starting all dumps in the batch mode:
% for band in "0" "1" "2" "3" "4"; do \ |
nohup mysqldump -u root \ |
daues_SDRP_dedupe_byfilter_${band} RunDeepForcedSource_radecl_dedup \
|
-T/data1/tsv/daues_SDRP_dedupe_byfilter_${band}/ \ |
>& /home/gapon/logs/daues_SDRP_dedupe_byfilter_${band}.tsv.log & \ |
done |
The operation finished with the following file sizes:
ls -al /data1/tsv/daues_SDRP_dedupe_byfilter_*/*.txt |
-rw-rw-rw- 1 mysql mysql 1161290588478 May 14 16:19 /data1/tsv/daues_SDRP_dedupe_byfilter_0/RunDeepForcedSource_radecl_dedup.txt |
-rw-rw-rw- 1 mysql mysql 1207653978772 May 14 17:37 /data1/tsv/daues_SDRP_dedupe_byfilter_1/RunDeepForcedSource_radecl_dedup.txt |
-rw-rw-rw- 1 mysql mysql 1220211612878 May 14 17:51 /data1/tsv/daues_SDRP_dedupe_byfilter_2/RunDeepForcedSource_radecl_dedup.txt |
-rw-rw-rw- 1 mysql mysql 1227132706017 May 14 17:59 /data1/tsv/daues_SDRP_dedupe_byfilter_3/RunDeepForcedSource_radecl_dedup.txt |
-rw-rw-rw- 1 mysql mysql 1198155639140 May 14 18:02 /data1/tsv/daues_SDRP_dedupe_byfilter_4/RunDeepForcedSource_radecl_dedup.txt |
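Summing the five NCSA file sizes reported above confirms the 6 TB estimate used when sizing /data1:

```shell
# Total size of the five de-duplicated NCSA TSV dumps (bytes), in decimal TB.
awk 'BEGIN { t = 1161290588478 + 1207653978772 + 1220211612878 + 1227132706017 + 1198155639140; printf "total: %.1f TB\n", t / 1e12 }'
# total: 6.0 TB
```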
RunDeepForcedSource of IN2P3
Based on the prior effort, each TSV file is expected to be approximately 1.2 TB in size; hence the storage requirement for all 5 tables is about 6 TB. The following file system provides enough capacity to accommodate these files:
% df -h /data3/tsv |
Filesystem Size Used Avail Use% Mounted on
|
/dev/vde 7.8T 201G 7.2T 3% /data3 |
|
% for band in "u" "g" "r" "i" "z"; do \ |
sudo mkdir -p /data3/tsv/lsst_prod_dedupe_byfilter_${band}; \ |
done |
% sudo chown -R mysql:mysql /data3/tsv |
% sudo chmod -R 0777 /data3/tsv |
Starting all dumps in the batch mode:
% for band in "u" "g" "r" "i" "z"; do \ |
nohup mysqldump -u root \ |
lsst_prod_dedupe_byfilter_${band} RunDeepForcedSource_radecl_dedup \
|
-T/data3/tsv/lsst_prod_dedupe_byfilter_${band}/ \ |
>& /home/gapon/logs/lsst_prod_dedupe_byfilter_${band}.tsv.log & \ |
done |
The operation finished with the following file sizes:
% ls -al /data3/tsv/lsst_prod_dedupe_byfilter_*/*.txt |
-rw-rw-rw- 1 mysql mysql 1356553484909 May 18 05:49 /data3/tsv/lsst_prod_dedupe_byfilter_g/RunDeepForcedSource_radecl_dedup.txt |
-rw-rw-rw- 1 mysql mysql 1375324194874 May 18 06:07 /data3/tsv/lsst_prod_dedupe_byfilter_i/RunDeepForcedSource_radecl_dedup.txt |
-rw-rw-rw- 1 mysql mysql 1369286130161 May 18 06:00 /data3/tsv/lsst_prod_dedupe_byfilter_r/RunDeepForcedSource_radecl_dedup.txt |
-rw-rw-rw- 1 mysql mysql 1287603053848 May 18 04:36 /data3/tsv/lsst_prod_dedupe_byfilter_u/RunDeepForcedSource_radecl_dedup.txt |
-rw-rw-rw- 1 mysql mysql 1341697675866 May 18 05:35 /data3/tsv/lsst_prod_dedupe_byfilter_z/RunDeepForcedSource_radecl_dedup.txt |
Migrating TSV files from OpenStack to lsst-dev
Set up SSH configuration to simplify file transfers:
% cat ~/.ssh/config |
Host 141.142.211.152
|
IdentityFile ~/.ssh/id_ncsa_passwordless_rsa |
The destination folder for the files will be:
/datasets/gapon/catalogs_01/
|
All transfers will be done from lsst-xfer using rsync.
RunDeepForcedSource of NCSA
The destination folders:
% ls -alh /datasets/gapon/catalogs_01/
|
total 0
|
drwxr-xr-x 7 gapon grp_202 4.0K May 14 19:05 .
|
drwxr-xr-x 15 gapon root 4.0K May 14 18:40 ..
|
drwxr-xr-x 2 gapon grp_202 4.0K May 14 19:03 daues_SDRP_dedupe_byfilter_0
|
drwxr-xr-x 2 gapon grp_202 4.0K May 14 19:04 daues_SDRP_dedupe_byfilter_1
|
drwxr-xr-x 2 gapon grp_202 4.0K May 14 19:04 daues_SDRP_dedupe_byfilter_2
|
drwxr-xr-x 2 gapon grp_202 4.0K May 14 19:04 daues_SDRP_dedupe_byfilter_3
|
drwxr-xr-x 2 gapon grp_202 4.0K May 14 19:05 daues_SDRP_dedupe_byfilter_4
|
Launched 5 rsync on node lsst-xfer:
% cd /datasets/gapon/catalogs_01/ |
% for band in "0" "1" "2" "3" "4"; do \ |
nohup rsync -hrv \ |
141.142.211.152:/data1/tsv/daues_SDRP_dedupe_byfilter_${band} . \ |
>& daues_SDRP_dedupe_byfilter_${band}.log & \ |
done |
NOTE: the log files were put into the destination folders
The operation has finished. Checking file sizes:
% ls -al /datasets/gapon/catalogs_01/daues_SDRP_dedupe_byfilter_* | grep txt |
-rw-r--r-- 1 gapon grp_202 1161290588478 May 15 21:18 RunDeepForcedSource_radecl_dedup.txt
|
-rw-r--r-- 1 gapon grp_202 1207653978772 May 15 20:32 RunDeepForcedSource_radecl_dedup.txt
|
-rw-r--r-- 1 gapon grp_202 1220211612878 May 15 20:36 RunDeepForcedSource_radecl_dedup.txt
|
-rw-r--r-- 1 gapon grp_202 1227132706017 May 15 21:04 RunDeepForcedSource_radecl_dedup.txt
|
-rw-r--r-- 1 gapon grp_202 1198155639140 May 15 20:41 RunDeepForcedSource_radecl_dedup.txt
|
RunDeepForcedSource of IN2P3
The destination folder for the files is:
% ls -alh /datasets/gapon/catalogs_01/
|
Launched 5 rsync on node lsst-xfer:
% cd /datasets/gapon/catalogs_01/ |
% for band in "u" "g" "r" "i" "z"; do \ |
nohup rsync -hrv \ |
141.142.211.152:/data3/tsv/lsst_prod_dedupe_byfilter_${band} . \ |
>& lsst_prod_dedupe_byfilter_${band}.log & \ |
done |
NOTE: the log files were put into the destination folders
This operation is IN-PROGRESS...
The operation has finished. Checking file sizes:
% ls -al /datasets/gapon/catalogs_01/lsst_prod_dedupe_byfilter_* | grep txt |
DeepCoadd (and 2 metadata tables) and RunDeepSource
Transferred 4 tables to the destination folder:
% cd /datasets/gapon/catalogs_01 |
% nohup rsync -hrv \ |
141.142.211.152:/data1/tsv/lsst_sdss_stripe82_01 . \ |
>& lsst_sdss_stripe82_01.log |
% ls -al /datasets/gapon/catalogs_01/lsst_sdss_stripe82_01/ |
total 211984768
|
drwxr-xr-x 2 gapon grp_202 4096 May 16 16:43 .
|
drwxr-xr-x 8 gapon grp_202 4096 May 16 16:08 ..
|
-rw-r--r-- 1 gapon grp_202 2549 May 16 16:08 DeepCoadd_merge.sql
|
-rw-r--r-- 1 gapon grp_202 12319362 May 16 16:08 DeepCoadd_merge.txt
|
-rw-r--r-- 1 gapon grp_202 1669 May 16 16:08 DeepCoadd_Metadata_merge.sql
|
-rw-r--r-- 1 gapon grp_202 279986169 May 16 16:08 DeepCoadd_Metadata_merge.txt
|
-rw-r--r-- 1 gapon grp_202 1495 May 16 16:08 DeepCoadd_To_Htm10_merge.sql
|
-rw-r--r-- 1 gapon grp_202 11462430 May 16 16:08 DeepCoadd_To_Htm10_merge.txt
|
-rw-r--r-- 1 gapon grp_202 8189 May 16 16:08 RunDeepSource_merge.sql
|
-rw-r--r-- 1 gapon grp_202 216768195814 May 16 16:43 RunDeepSource_merge.txt
|
Partitioning RunDeepSource
Created a work area where the resulting chunks will be located. The same folder will also host the configuration and log files:
/datasets/gapon/catalogs_01_partitioned
|
The table-specific partitions will be placed in the subfolder:
RunDeepSource/
|
The most important configuration parameters are shown below:
% cat RunDeepSource.cfg |
..
|
# Partitioning parameters.
|
part = {
|
# Common partitioning parameters. |
num-stripes = 340
|
num-sub-stripes = 12
|
chunk = chunkId
|
sub-chunk = subChunkId
|
|
# The partitioning position is the object's point-source model position. |
pos = 'coord_ra, coord_decl' |
# Overlap radius in degrees. |
overlap = 0.01667
|
}
|
..
|
IMPORTANT: these are the same parameters as for the presently deployed (in PDAC) database sdss_stripe82_00.
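For reference, the overlap radius in the config is given in degrees; 0.01667 deg is about one arcminute:

```shell
# Convert the partitioning overlap radius from degrees to arcminutes.
awk 'BEGIN { printf "%.2f arcmin\n", 0.01667 * 60 }'   # 1.00 arcmin
```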
Launched the partitioning tool on lsst-dev:
% cd /datasets/gapon/catalogs_01_partitioned |
% source /datasets/gapon/stack/loadLSST.bash |
% setup -t qserv-dev qserv_distrib
|
% which sph-partition |
/datasets/gapon/stack/Linux64/partition/12.0+9/bin/sph-partition |
% nohup sph-partition \ |
--verbose -c RunDeepSource.cfg \
|
--out.dir=RunDeepSource/ \ |
--in=/datasets/gapon/catalogs_01/lsst_sdss_stripe82_01/RunDeepSource_merge.txt \ |
>& logs/RunDeepSource.log& |
When the operation finished the output folder had:
% du -hs RunDeepSource/ |
629G RunDeepSource/
|
% ls -al RunDeepSource/*.txt | wc -l |
2160
|
% ls -al RunDeepSource/*_overlap.txt | wc -l |
1080
|
CONCLUSION: this is perfectly consistent with the presently loaded initial version of the database.
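The counts above (2160 .txt files, 1080 of them overlap files) mean every chunk file should be paired with an overlap file. A hypothetical pairing check, demonstrated on a toy directory standing in for RunDeepSource/ (file names are illustrative):

```shell
# Verify that chunk files and overlap files come in matched pairs.
dir=$(mktemp -d)
touch "$dir/chunk_113589.txt" "$dir/chunk_113589_overlap.txt" \
      "$dir/chunk_113619.txt" "$dir/chunk_113619_overlap.txt"
chunks=$(ls "$dir" | grep -c '^chunk_[0-9]*\.txt$')
overlaps=$(ls "$dir" | grep -c '_overlap\.txt$')
echo "chunks=$chunks overlaps=$overlaps"
[ "$chunks" -eq "$overlaps" ] && echo "paired"
rm -rf "$dir"
```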
Partitioning RunDeepForcedSource
Using the same work area for the resulting chunks as in case of table RunDeepSource:
/datasets/gapon/catalogs_01_partitioned
|
The table-specific partitions will be placed in subfolders:
RunDeepForcedSource/<database>
|
where <database> stands for one of the 10 original databases:
daues_SDRP_dedupe_byfilter_0
|
daues_SDRP_dedupe_byfilter_1
|
daues_SDRP_dedupe_byfilter_2
|
daues_SDRP_dedupe_byfilter_3
|
daues_SDRP_dedupe_byfilter_4
|
lsst_prod_dedupe_byfilter_u
|
lsst_prod_dedupe_byfilter_g
|
lsst_prod_dedupe_byfilter_r
|
lsst_prod_dedupe_byfilter_i
|
lsst_prod_dedupe_byfilter_z
|
The most important configuration parameters are shown below:
% cat RunDeepForcedSource.cfg |
..
|
# Partitioning parameters.
|
part = {
|
# Common partitioning parameters. |
num-stripes = 340
|
num-sub-stripes = 12
|
chunk = chunkId
|
sub-chunk = subChunkId
|
|
# The partitioning position is the object's point-source model position. |
pos = 'object_coord_ra, object_coord_decl' |
# Overlap radius in degrees. |
overlap = 0
|
}
|
..
|
IMPORTANT: these are the same parameters as for the presently deployed (in PDAC) database sdss_stripe82_00.
Launching the same partitioning tool on lsst-dev:
% cd /datasets/gapon/catalogs_01_partitioned |
% source /datasets/gapon/stack/loadLSST.bash |
% setup -t qserv-dev qserv_distrib
|
% which sph-partition |
/datasets/gapon/stack/Linux64/partition/12.0+9/bin/sph-partition |
NCSA
Note that each database will have its own collection of (partial) chunks:
for band in "0" "1" "2" "3" "4"; do \ |
mkdir -p RunDeepForcedSource/daues_SDRP_dedupe_byfilter_${band}; \ |
nohup srun sph-partition --verbose -c RunDeepForcedSource.cfg \ |
--out.dir=RunDeepForcedSource/daues_SDRP_dedupe_byfilter_${band}/ \ |
--in=/datasets/gapon/catalogs_01/daues_SDRP_dedupe_byfilter_${band}/RunDeepForcedSource_radecl_dedup.txt \ |
>& logs/daues_SDRP_dedupe_byfilter_${band}.log& \ |
done |
When the operation finishes, check the output folders:
% du -hs RunDeepForcedSource/* |
1.1T RunDeepForcedSource/daues_SDRP_dedupe_byfilter_0 |
1.2T RunDeepForcedSource/daues_SDRP_dedupe_byfilter_1 |
1.2T RunDeepForcedSource/daues_SDRP_dedupe_byfilter_2 |
1.2T RunDeepForcedSource/daues_SDRP_dedupe_byfilter_3 |
1.2T RunDeepForcedSource/daues_SDRP_dedupe_byfilter_4 |
% ls -1 RunDeepForcedSource/daues_SDRP_dedupe_byfilter_*/ | grep .txt | sort -u | wc -l |
548
|
IN2P3
Note that each database will have its own collection of (partial) chunks:
for band in "u" "g" "r" "i" "z"; do \ |
mkdir -p RunDeepForcedSource/lsst_prod_dedupe_byfilter_${band}; \ |
nohup srun sph-partition --verbose -c RunDeepForcedSource.cfg \ |
--out.dir=RunDeepForcedSource/lsst_prod_dedupe_byfilter_${band}/ \ |
--in=/datasets/gapon/catalogs_01/lsst_prod_dedupe_byfilter_${band}/RunDeepForcedSource_radecl_dedup.txt \ |
>& logs/lsst_prod_dedupe_byfilter_${band}.log& \ |
done |
This operation is IN-PROGRESS...
When the operation finishes, check the output folders:
% du -hs RunDeepForcedSource/* |
1.1T RunDeepForcedSource/daues_SDRP_dedupe_byfilter_0 |
1.2T RunDeepForcedSource/daues_SDRP_dedupe_byfilter_1 |
1.2T RunDeepForcedSource/daues_SDRP_dedupe_byfilter_2 |
1.2T RunDeepForcedSource/daues_SDRP_dedupe_byfilter_3 |
1.2T RunDeepForcedSource/daues_SDRP_dedupe_byfilter_4 |
1.3T RunDeepForcedSource/lsst_prod_dedupe_byfilter_g |
1.3T RunDeepForcedSource/lsst_prod_dedupe_byfilter_i |
1.3T RunDeepForcedSource/lsst_prod_dedupe_byfilter_r |
1.2T RunDeepForcedSource/lsst_prod_dedupe_byfilter_u |
1.3T RunDeepForcedSource/lsst_prod_dedupe_byfilter_z |
% ls -1 RunDeepForcedSource/*/ | grep .txt | sort -u | wc -l |
1080
|
% ls -1 RunDeepForcedSource/daues_SDRP_dedupe_byfilter_*/ | grep .txt | sort -u | wc -l |
548
|
% ls -1 RunDeepForcedSource/lsst_prod_dedupe_byfilter_*/ | grep .txt | sort -u | wc -l |
555
|
These numbers can also be compared with the previously (incorrectly) merged version of the catalog, which had the same total number of unique chunks (1080):
% ls -1 ../catalogs_partitioned/out_daues_SDRP_dedupe_byfilter_*/ | grep .txt | sort -u | wc -l |
570
|
% ls -1 ../catalogs_partitioned/out_lsst_prod_dedupe_byfilter_*/ | grep .txt | sort -u | wc -l |
517
|
CONCLUSION: this observation is perfectly consistent with expectations: the new catalog should show a more even distribution of partitions between the two datasets, with a slightly higher number of chunks in the IN2P3 area (see the Description section of this ticket for more details).
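Under stated assumptions (548 unique NCSA chunks, 555 unique IN2P3 chunks, 1080 in the union), simple set arithmetic gives the number of chunks that receive data from both sides of the new boundary:

```shell
# |NCSA ∩ IN2P3| = |NCSA| + |IN2P3| - |NCSA ∪ IN2P3|
ncsa=548; in2p3=555; union=1080
shared=$(( ncsa + in2p3 - union ))
echo "chunks fed from both sides: $shared"   # 23
```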
Preparing the input data folders for the loading
This is going to be an area formatted according to the expectations of the catalog loading tools found in the GitHub package:
The area is prepared in a way which allows parallel loading of chunks by scripts run simultaneously on all worker nodes. Note that the very same package also hosts the tools used by the folder preparation steps explained hereafter.
The folder location is:
/datasets/gapon/production/stripe82_01_catalog_load/production_load
|
General set up
Create the catalog and its sub-folders:
% mkdir /datasets/gapon/production/stripe82_01_catalog_load/ |
% cd /datasets/gapon/production/stripe82_01_catalog_load/ |
% mkdir production_load |
Set up the initial map of chunks to workers stored in the previously mentioned GitHub package. This map will be used by the map generator tools:
% ln -s /datasets/gapon/development/db_pdac_stripe82/worker2chunks . |
% wc -l worker2chunks/* |
36 worker2chunks/lsst-qserv-db01 |
36 worker2chunks/lsst-qserv-db02 |
36 worker2chunks/lsst-qserv-db03 |
36 worker2chunks/lsst-qserv-db04 |
36 worker2chunks/lsst-qserv-db05 |
36 worker2chunks/lsst-qserv-db06 |
36 worker2chunks/lsst-qserv-db07 |
36 worker2chunks/lsst-qserv-db08 |
36 worker2chunks/lsst-qserv-db09 |
36 worker2chunks/lsst-qserv-db10 |
36 worker2chunks/lsst-qserv-db11 |
36 worker2chunks/lsst-qserv-db12 |
36 worker2chunks/lsst-qserv-db13 |
36 worker2chunks/lsst-qserv-db14 |
36 worker2chunks/lsst-qserv-db15 |
36 worker2chunks/lsst-qserv-db16 |
36 worker2chunks/lsst-qserv-db17 |
36 worker2chunks/lsst-qserv-db18 |
36 worker2chunks/lsst-qserv-db19 |
36 worker2chunks/lsst-qserv-db20 |
36 worker2chunks/lsst-qserv-db21 |
36 worker2chunks/lsst-qserv-db22 |
36 worker2chunks/lsst-qserv-db23 |
36 worker2chunks/lsst-qserv-db24 |
36 worker2chunks/lsst-qserv-db25 |
36 worker2chunks/lsst-qserv-db26 |
36 worker2chunks/lsst-qserv-db27 |
36 worker2chunks/lsst-qserv-db28 |
36 worker2chunks/lsst-qserv-db29 |
36 worker2chunks/lsst-qserv-db30 |
1080 total
|
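The map is consistent by construction: 30 workers at 36 chunks each account for all 1080 chunks:

```shell
# Consistency check for the worker2chunks map shown above.
workers=30; per_worker=36
echo "total chunks mapped: $(( workers * per_worker ))"   # total chunks mapped: 1080
```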
RunDeepSource
Create the table-specific folder:
% cd /datasets/gapon/production/stripe82_01_catalog_load/ |
% mkdir production_load/RunDeepSource |
Use the generator tools to populate the folder with the previously partitioned chunks:
% python /datasets/gapon/development/db_pdac_stripe82/tools/RunDeepSource_chunk2worker.py mkdir \ |
>& RunDeepSource_chunk2worker_mkdir.source |
% source RunDeepSource_chunk2worker_mkdir.source |
% ls -1 production_load/RunDeepSource/ |
lsst-qserv-db01
|
lsst-qserv-db02
|
lsst-qserv-db03
|
lsst-qserv-db04
|
lsst-qserv-db05
|
lsst-qserv-db06
|
lsst-qserv-db07
|
lsst-qserv-db08
|
lsst-qserv-db09
|
lsst-qserv-db10
|
lsst-qserv-db11
|
lsst-qserv-db12
|
lsst-qserv-db13
|
lsst-qserv-db14
|
lsst-qserv-db15
|
lsst-qserv-db16
|
lsst-qserv-db17
|
lsst-qserv-db18
|
lsst-qserv-db19
|
lsst-qserv-db20
|
lsst-qserv-db21
|
lsst-qserv-db22
|
lsst-qserv-db23
|
lsst-qserv-db24
|
lsst-qserv-db25
|
lsst-qserv-db26
|
lsst-qserv-db27
|
lsst-qserv-db28
|
lsst-qserv-db29
|
lsst-qserv-db30
|
% python /datasets/gapon/development/db_pdac_stripe82/tools/RunDeepSource_chunk2worker.py ln \ |
>& RunDeepSource_chunk2worker_ln.source |
% source RunDeepSource_chunk2worker_ln.source |
% ls -1 production_load/RunDeepSource/lsst-qserv-db01/ |
chunk_113589_overlap.txt
|
chunk_113589.txt
|
chunk_113619_overlap.txt
|
chunk_113619.txt
|
chunk_113649_overlap.txt
|
..
|
chunk_117638_overlap.txt
|
chunk_117638.txt
|
chunk_index.bin
|
RunDeepForcedSource
Create the table-specific folder:
% cd /datasets/gapon/production/stripe82_01_catalog_load/ |
% mkdir production_load/RunDeepForcedSource |
Use the generator tools to populate the folder with the previously partitioned chunks:
% python /datasets/gapon/development/db_pdac_stripe82/tools/RunDeepForcedSource_chunk2worker.py mkdir \ |
>& RunDeepForcedSource_chunk2worker_mkdir.source |
% source RunDeepForcedSource_chunk2worker_mkdir.source |
% ls -1 production_load/RunDeepForcedSource/ |
lsst-qserv-db01
|
lsst-qserv-db02
|
lsst-qserv-db03
|
lsst-qserv-db04
|
lsst-qserv-db05
|
lsst-qserv-db06
|
lsst-qserv-db07
|
lsst-qserv-db08
|
lsst-qserv-db09
|
lsst-qserv-db10
|
lsst-qserv-db11
|
lsst-qserv-db12
|
lsst-qserv-db13
|
lsst-qserv-db14
|
lsst-qserv-db15
|
lsst-qserv-db16
|
lsst-qserv-db17
|
lsst-qserv-db18
|
lsst-qserv-db19
|
lsst-qserv-db20
|
lsst-qserv-db21
|
lsst-qserv-db22
|
lsst-qserv-db23
|
lsst-qserv-db24
|
lsst-qserv-db25
|
lsst-qserv-db26
|
lsst-qserv-db27
|
lsst-qserv-db28
|
lsst-qserv-db29
|
lsst-qserv-db30
|
% python /datasets/gapon/development/db_pdac_stripe82/tools/RunDeepForcedSource_chunk2worker.py ln \ |
>& RunDeepForcedSource_chunk2worker_ln.source |
% source RunDeepForcedSource_chunk2worker_ln.source |
% ls -1 production_load/RunDeepForcedSource/lsst-qserv-db01 |
out_daues_SDRP_dedupe_byfilter_0
|
out_daues_SDRP_dedupe_byfilter_1
|
out_daues_SDRP_dedupe_byfilter_2
|
out_daues_SDRP_dedupe_byfilter_3
|
out_daues_SDRP_dedupe_byfilter_4
|
out_lsst_prod_dedupe_byfilter_g
|
out_lsst_prod_dedupe_byfilter_i
|
out_lsst_prod_dedupe_byfilter_r
|
out_lsst_prod_dedupe_byfilter_u
|
out_lsst_prod_dedupe_byfilter_z
|
% ls -1 production_load/RunDeepForcedSource/lsst-qserv-db01/out_daues_SDRP_dedupe_byfilter_0/ |
chunk_114178.txt
|
chunk_114208.txt
|
chunk_114238.txt
|
chunk_114858.txt
|
chunk_114888.txt
|
chunk_114918.txt
|
chunk_115538.txt
|
chunk_115568.txt
|
chunk_115598.txt
|
chunk_116218.txt
|
chunk_116248.txt
|
chunk_116278.txt
|
chunk_116898.txt
|
chunk_116928.txt
|
chunk_116958.txt
|
chunk_117578.txt
|
chunk_117608.txt
|
chunk_117638.txt
|
chunk_index.bin
|
And the total number of partitions matches the expected number:
% ls -1 production_load/RunDeepForcedSource/*/*/ | grep .txt | sort -u | wc -l |
1080
|
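As a side note, `grep .txt` with an unescaped dot also matches names merely containing "txt" anywhere; a slightly more defensive count can be built on find, sketched on a toy tree standing in for production_load/RunDeepForcedSource/ (paths are illustrative):

```shell
# Count unique chunk file names across all per-database subfolders.
root=$(mktemp -d)
mkdir -p "$root/lsst-qserv-db01/out_a" "$root/lsst-qserv-db01/out_b"
touch "$root/lsst-qserv-db01/out_a/chunk_114178.txt" \
      "$root/lsst-qserv-db01/out_b/chunk_114178.txt" \
      "$root/lsst-qserv-db01/out_b/chunk_114208.txt"
count=$(find "$root" -name 'chunk_*.txt' -exec basename {} \; | sort -u | wc -l | tr -d ' ')
echo "unique chunks: $count"   # unique chunks: 2
rm -rf "$root"
```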
Non-partitioned tables
Create the table-specific folder:
% cd /datasets/gapon/production/stripe82_01_catalog_load/ |
% mkdir production_load/non-part |
Verifying the status of the infrastructure services for re-assembling the dataset
The project (of patching the "hole") is based on a setup which was established earlier in the context of
DM-7053. This includes the following: the virtual machine is up, and all four external volumes are mounted:
Filesystem Size Used Avail Use% Mounted on
...
Adjustments made to the MySQL configuration
Extended the InnoDB system tablespace to allow more space for intermediate tables and indexes:
% cat /etc/my.cnf.d/server.cnf |
.. |
innodb_data_file_path = ibdata1:10G;ibdata2:10G; |
Then restarted the server:
% sudo systemctl restart mariadb |