Data Management / DM-8241

PDAC: patch "hole" in Stripe 82 catalog data


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Currently, all catalog data for the NCSA/IN2P3 overlap region of the Stripe 82 processing in the PDAC has been taken from the NCSA side of the processing. However, the NCSA side does not include data for all patches on its "outer" boundary of the overlap region (and the same is true for the IN2P3 processing with respect to the opposite overlap boundary).

      The catalog merge will need to be redone, with patch division moved away from either of the overlap region boundaries in order to avoid "holes".

      Based on the layout of patches (see the PDF document attached to this ticket), the new boundary should be placed between patch columns 375 and 376.

        Attachments

          Issue Links

            Activity

            gapon Igor Gaponenko added a comment (edited)

            Verifying the status of the infrastructure services for re-assembling the dataset

            The project (of patching the "hole") will be based on the setup established earlier in the context of DM-7053. This includes:

            • one OpenStack NEBULA instance lsst-gapon-0 (floating IP: 141.142.211.152)
            • and 4 distributed images: lsst-dm-gapon, lsst-dm-gapon-1, lsst-dm-gapon-2 and lsst-dm-gapon-3

            The virtual machine is up, and all four external volumes are mounted:

            [gapon@lsst-dev01 ~]$ ssh -i ~/.ssh/id_ncsa_passwordless_rsa 141.142.211.152
             
            [gapon@lsst-gapon-0 ~]$ df -h
            Filesystem      Size  Used Avail Use% Mounted on
            ...
            /dev/vdd        7.8T  7.4T   34G 100% /data2
            /dev/vdc        7.8T  7.3T  106G  99% /data1
            /dev/vde        7.8T  4.2T  3.2T  57% /data3
            /dev/vdb         12T   12T   37G 100% /data
            

            Adjustments made to the MySQL configuration

            Extended the InnoDB file size to allow more space for intermediate tables and indexes:

            {code:bash}
            % cat /etc/my.cnf.d/server.cnf
            ..
            innodb_data_file_path = ibdata1:10G;ibdata2:10G;
            {code}

            Then restarted the server:

            {code:bash}
            % sudo systemctl restart mariadb
            {code}

            gapon Igor Gaponenko added a comment (edited)

            Defining a suitable boundary for the merge

            According to the S13 Testing Plan, the S13 DRP dataset was split into two regions with an overlap used for cross-site verification:

            • NCSA: -40 < R.A. < +10
            • IN2P3: +5 < R.A. < +55

            The overlap area covers the RA range of +5 to +10 and includes 276 unique patches in tract 0, which is demonstrated by:

            SELECT DISTINCT tract,patch
              FROM
                gapon_SDRP_Stripe82.DeepCoadd
              WHERE
                (tract,patch) IN (SELECT tract,patch FROM lsst_prod_DC_2013_2.DeepCoadd)
              ORDER BY patch;
            

            +-------+--------+
            | tract | patch  |
            +-------+--------+
            |     0 | 364,0  |
            |     0 | 364,1  |
            |     0 | 364,10 |
            |     0 | 364,11 |
            |     0 | 364,2  |
            |     0 | 364,3  |
            |     0 | 364,4  |
            |     0 | 364,5  |
            |     0 | 364,6  |
            |     0 | 364,7  |
            |     0 | 364,8  |
            |     0 | 364,9  |
            |     0 | 365,0  |
            |     0 | 365,1  |
            |     0 | 365,10 |
            |     0 | 365,11 |
            |     0 | 365,2  |
            |     0 | 365,3  |
            |     0 | 365,4  |
            |     0 | 365,5  |
            |     0 | 365,6  |
            |     0 | 365,7  |
            |     0 | 365,8  |
            |     0 | 365,9  |
            |     0 | 366,0  |
            |     0 | 366,1  |
            |     0 | 366,10 |
            |     0 | 366,11 |
            |     0 | 366,2  |
            |     0 | 366,3  |
            |     0 | 366,4  |
            |     0 | 366,5  |
            |     0 | 366,6  |
            |     0 | 366,7  |
            |     0 | 366,8  |
            |     0 | 366,9  |
            |     0 | 367,0  |
            |     0 | 367,1  |
            |     0 | 367,10 |
            |     0 | 367,11 |
            |     0 | 367,2  |
            |     0 | 367,3  |
            |     0 | 367,4  |
            |     0 | 367,5  |
            |     0 | 367,6  |
            |     0 | 367,7  |
            |     0 | 367,8  |
            |     0 | 367,9  |
            |     0 | 368,0  |
            |     0 | 368,1  |
            |     0 | 368,10 |
            |     0 | 368,11 |
            |     0 | 368,2  |
            |     0 | 368,3  |
            |     0 | 368,4  |
            |     0 | 368,5  |
            |     0 | 368,6  |
            |     0 | 368,7  |
            |     0 | 368,8  |
            |     0 | 368,9  |
            |     0 | 369,0  |
            |     0 | 369,1  |
            |     0 | 369,10 |
            |     0 | 369,11 |
            |     0 | 369,2  |
            |     0 | 369,3  |
            |     0 | 369,4  |
            |     0 | 369,5  |
            |     0 | 369,6  |
            |     0 | 369,7  |
            |     0 | 369,8  |
            |     0 | 369,9  |
            |     0 | 370,0  |
            |     0 | 370,1  |
            |     0 | 370,10 |
            |     0 | 370,11 |
            |     0 | 370,2  |
            |     0 | 370,3  |
            |     0 | 370,4  |
            |     0 | 370,5  |
            |     0 | 370,6  |
            |     0 | 370,7  |
            |     0 | 370,8  |
            |     0 | 370,9  |
            |     0 | 371,0  |
            |     0 | 371,1  |
            |     0 | 371,10 |
            |     0 | 371,11 |
            |     0 | 371,2  |
            |     0 | 371,3  |
            |     0 | 371,4  |
            |     0 | 371,5  |
            |     0 | 371,6  |
            |     0 | 371,7  |
            |     0 | 371,8  |
            |     0 | 371,9  |
            |     0 | 372,0  |
            |     0 | 372,1  |
            |     0 | 372,10 |
            |     0 | 372,11 |
            |     0 | 372,2  |
            |     0 | 372,3  |
            |     0 | 372,4  |
            |     0 | 372,5  |
            |     0 | 372,6  |
            |     0 | 372,7  |
            |     0 | 372,8  |
            |     0 | 372,9  |
            |     0 | 373,0  |
            |     0 | 373,1  |
            |     0 | 373,10 |
            |     0 | 373,11 |
            |     0 | 373,2  |
            |     0 | 373,3  |
            |     0 | 373,4  |
            |     0 | 373,5  |
            |     0 | 373,6  |
            |     0 | 373,7  |
            |     0 | 373,8  |
            |     0 | 373,9  |
            |     0 | 374,0  |
            |     0 | 374,1  |
            |     0 | 374,10 |
            |     0 | 374,11 |
            |     0 | 374,2  |
            |     0 | 374,3  |
            |     0 | 374,4  |
            |     0 | 374,5  |
            |     0 | 374,6  |
            |     0 | 374,7  |
            |     0 | 374,8  |
            |     0 | 374,9  |
            |     0 | 375,0  |
            |     0 | 375,1  |
            |     0 | 375,10 |
            |     0 | 375,11 |
            |     0 | 375,2  |
            |     0 | 375,3  |
            |     0 | 375,4  |
            |     0 | 375,5  |
            |     0 | 375,6  |
            |     0 | 375,7  |
            |     0 | 375,8  |
            |     0 | 375,9  |
            |     0 | 376,0  |
            |     0 | 376,1  |
            |     0 | 376,10 |
            |     0 | 376,11 |
            |     0 | 376,2  |
            |     0 | 376,3  |
            |     0 | 376,4  |
            |     0 | 376,5  |
            |     0 | 376,6  |
            |     0 | 376,7  |
            |     0 | 376,8  |
            |     0 | 376,9  |
            |     0 | 377,0  |
            |     0 | 377,1  |
            |     0 | 377,10 |
            |     0 | 377,11 |
            |     0 | 377,2  |
            |     0 | 377,3  |
            |     0 | 377,4  |
            |     0 | 377,5  |
            |     0 | 377,6  |
            |     0 | 377,7  |
            |     0 | 377,8  |
            |     0 | 377,9  |
            |     0 | 378,0  |
            |     0 | 378,1  |
            |     0 | 378,10 |
            |     0 | 378,11 |
            |     0 | 378,2  |
            |     0 | 378,3  |
            |     0 | 378,4  |
            |     0 | 378,5  |
            |     0 | 378,6  |
            |     0 | 378,7  |
            |     0 | 378,8  |
            |     0 | 378,9  |
            |     0 | 379,0  |
            |     0 | 379,1  |
            |     0 | 379,10 |
            |     0 | 379,11 |
            |     0 | 379,2  |
            |     0 | 379,3  |
            |     0 | 379,4  |
            |     0 | 379,5  |
            |     0 | 379,6  |
            |     0 | 379,7  |
            |     0 | 379,8  |
            |     0 | 379,9  |
            |     0 | 380,0  |
            |     0 | 380,1  |
            |     0 | 380,10 |
            |     0 | 380,11 |
            |     0 | 380,2  |
            |     0 | 380,3  |
            |     0 | 380,4  |
            |     0 | 380,5  |
            |     0 | 380,6  |
            |     0 | 380,7  |
            |     0 | 380,8  |
            |     0 | 380,9  |
            |     0 | 381,0  |
            |     0 | 381,1  |
            |     0 | 381,10 |
            |     0 | 381,11 |
            |     0 | 381,2  |
            |     0 | 381,3  |
            |     0 | 381,4  |
            |     0 | 381,5  |
            |     0 | 381,6  |
            |     0 | 381,7  |
            |     0 | 381,8  |
            |     0 | 381,9  |
            |     0 | 382,0  |
            |     0 | 382,1  |
            |     0 | 382,10 |
            |     0 | 382,11 |
            |     0 | 382,2  |
            |     0 | 382,3  |
            |     0 | 382,4  |
            |     0 | 382,5  |
            |     0 | 382,6  |
            |     0 | 382,7  |
            |     0 | 382,8  |
            |     0 | 382,9  |
            |     0 | 383,0  |
            |     0 | 383,1  |
            |     0 | 383,10 |
            |     0 | 383,11 |
            |     0 | 383,2  |
            |     0 | 383,3  |
            |     0 | 383,4  |
            |     0 | 383,5  |
            |     0 | 383,6  |
            |     0 | 383,7  |
            |     0 | 383,8  |
            |     0 | 383,9  |
            |     0 | 384,0  |
            |     0 | 384,1  |
            |     0 | 384,10 |
            |     0 | 384,11 |
            |     0 | 384,2  |
            |     0 | 384,3  |
            |     0 | 384,4  |
            |     0 | 384,5  |
            |     0 | 384,6  |
            |     0 | 384,7  |
            |     0 | 384,8  |
            |     0 | 384,9  |
            |     0 | 385,0  |
            |     0 | 385,1  |
            |     0 | 385,10 |
            |     0 | 385,11 |
            |     0 | 385,2  |
            |     0 | 385,3  |
            |     0 | 385,4  |
            |     0 | 385,5  |
            |     0 | 385,6  |
            |     0 | 385,7  |
            |     0 | 385,8  |
            |     0 | 385,9  |
            |     0 | 386,0  |
            |     0 | 386,1  |
            |     0 | 386,10 |
            |     0 | 386,11 |
            |     0 | 386,2  |
            |     0 | 386,3  |
            |     0 | 386,4  |
            |     0 | 386,5  |
            |     0 | 386,6  |
            |     0 | 386,7  |
            |     0 | 386,8  |
            |     0 | 386,9  |
            +-------+--------+
            276 rows in set (0.09 sec)
            

            This allows placing the boundary between patches 375 and 376:

            • NCSA: 376 - 386 (132 unique patches)
            • IN2P3: 364 - 375 (144 unique patches)

            This boundary corresponds to an RA of 7.3913 degrees.
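            The patch bookkeeping above can be double-checked with a small script (the patch-column range 364-386 and the 12-row tract layout are taken from the query output above; this is an illustrative sketch only):

            ```python
            # Sanity-check the patch counts quoted above.
            overlap_columns = range(364, 387)   # patch columns 364..386 in the overlap
            rows_per_column = 12                # patch rows 0..11, per the query output

            ncsa_columns = range(376, 387)      # NCSA side of the new boundary
            in2p3_columns = range(364, 376)     # IN2P3 side of the new boundary

            print(len(overlap_columns) * rows_per_column)   # 276 unique overlap patches
            print(len(ncsa_columns) * rows_per_column)      # 132
            print(len(in2p3_columns) * rows_per_column)     # 144
            ```
            
            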

            gapon Igor Gaponenko added a comment (edited)

            Merging DeepCoadd

            The merge will be done in a separate database: lsst_sdss_stripe82_01
            Database location: /data3 (available space 3.2 TB)
            Space required: < 1 GB

            Preparing the output database

            Creating the database in the default location (/data) and moving it to a different file system (/data3):

            CREATE DATABASE lsst_sdss_stripe82_01;
             
            CREATE TABLE lsst_sdss_stripe82_01.DeepCoadd_merge
                    LIKE lsst_sdss_stripe82.DeepCoadd_merge;
             
            CREATE TABLE lsst_sdss_stripe82_01.DeepCoadd_Metadata_merge
                    LIKE lsst_sdss_stripe82.DeepCoadd_Metadata_merge;
             
            CREATE TABLE lsst_sdss_stripe82_01.DeepCoadd_To_Htm10_merge
                    LIKE lsst_sdss_stripe82.DeepCoadd_To_Htm10_merge;
             
            USE lsst_sdss_stripe82_01;
             
            SHOW TABLES;
            +---------------------------------+
            | Tables_in_lsst_sdss_stripe82_01 |
            +---------------------------------+
            | DeepCoadd_Metadata_merge        |
            | DeepCoadd_To_Htm10_merge        |
            | DeepCoadd_merge                 |
            +---------------------------------+
            

            Moving the database folder:

            /bin/sudo -u mysql mv /data/mysql/db/lsst_sdss_stripe82_01 /data3/mysql/db/
            /bin/sudo ln -s /data3/mysql/db/lsst_sdss_stripe82_01 /data/mysql/db/
            

            Verifying the result by connecting to the MySQL server:

            USE lsst_sdss_stripe82_01;
             
            SHOW TABLES;
            +---------------------------------+
            | Tables_in_lsst_sdss_stripe82_01 |
            +---------------------------------+
            | DeepCoadd_Metadata_merge        |
            | DeepCoadd_To_Htm10_merge        |
            | DeepCoadd_merge                 |
            +---------------------------------+
            3 rows in set (0.00 sec)
            

            Copying data from the NCSA region

            This operation copies all NCSA-specific rows, excluding those corresponding to the 144 unique patches identified earlier on the IN2P3 side of the merge boundary. For efficiency, the complete list of relevant deepCoaddId values is collected in a separate table and reused through the rest of the procedure (including the subsequent stages of merging entries from the RunDeepSource tables).
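            The selection logic described above (keep every NCSA deepCoaddId except those whose (tract,patch) falls in the IN2P3 portion of the overlap) can be sketched on toy data with SQLite; the table names mirror the real ones, but all rows are made up for illustration, and the tuple comparison is emulated with string concatenation since SQLite lacks tuple IN:

            ```python
            import sqlite3

            con = sqlite3.connect(":memory:")
            cur = con.cursor()
            # Toy stand-ins for the NCSA and IN2P3 DeepCoadd tables.
            cur.execute("CREATE TABLE ncsa_DeepCoadd (deepCoaddId INTEGER, tract INTEGER, patch TEXT)")
            cur.execute("CREATE TABLE in2p3_DeepCoadd (tract INTEGER, patch TEXT)")
            cur.executemany("INSERT INTO ncsa_DeepCoadd VALUES (?,?,?)",
                            [(1, 0, "374,0"), (2, 0, "375,0"), (3, 0, "376,0"), (4, 0, "380,0")])
            # In this toy example, IN2P3 also processed the overlap patches 374..376.
            cur.executemany("INSERT INTO in2p3_DeepCoadd VALUES (?,?)",
                            [(0, "374,0"), (0, "375,0"), (0, "376,0")])

            # Keep NCSA rows except overlap patches below the "376,0" boundary.
            cur.execute("""
                SELECT deepCoaddId FROM ncsa_DeepCoadd
                WHERE tract || '|' || patch NOT IN (
                    SELECT tract || '|' || patch FROM ncsa_DeepCoadd
                    WHERE tract || '|' || patch IN
                          (SELECT tract || '|' || patch FROM in2p3_DeepCoadd)
                      AND patch < '376,0')
            """)
            print(sorted(r[0] for r in cur.fetchall()))  # [3, 4]: patches 374/375 go to IN2P3
            ```
            
            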

            Collecting NCSA-specific deepCoaddId:

            SELECT COUNT(deepCoaddId) FROM gapon_SDRP_Stripe82.DeepCoadd;
            +--------------------+
            | COUNT(deepCoaddId) |
            +--------------------+
            |              13735 |
            +--------------------+
            

            CREATE TABLE lsst_sdss_stripe82_01.ncsa_DeepCoadd_deepCoaddId
              AS SELECT deepCoaddId FROM gapon_SDRP_Stripe82.DeepCoadd
                WHERE (tract,patch) NOT IN
                  (SELECT DISTINCT tract,patch FROM gapon_SDRP_Stripe82.DeepCoadd
                    WHERE (tract,patch) IN
                      (SELECT tract,patch FROM lsst_prod_DC_2013_2.DeepCoadd) 
                    AND patch < "376,0"
                  );
            

            SELECT COUNT(*) FROM lsst_sdss_stripe82_01.ncsa_DeepCoadd_deepCoaddId;
            +----------+
            | COUNT(*) |
            +----------+
            |    13015 |
            +----------+
            

            CREATE INDEX IDX_deepCoaddId
              ON lsst_sdss_stripe82_01.ncsa_DeepCoadd_deepCoaddId (deepCoaddId);
            

            Note the difference in the number of rows: 13735 - 13015 = 720. This corresponds to the 144 unique patches in the IN2P3 portion of the overlap multiplied by 5 filters (each deepCoaddId is based on a (tract,patch,filterId) triplet).
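            The arithmetic behind this difference can be checked directly (all numbers are taken from the query output above):

            ```python
            # NCSA DeepCoadd rows before and after excluding the IN2P3-side overlap patches.
            total_ncsa_rows = 13735
            kept_ncsa_rows = 13015
            print(total_ncsa_rows - kept_ncsa_rows)  # 720
            print(144 * 5)                           # 144 IN2P3-side patches x 5 filters = 720
            ```
            
            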

            Copy relevant rows from the input table DeepCoadd:

            INSERT INTO lsst_sdss_stripe82_01.DeepCoadd_merge
              SELECT * FROM gapon_SDRP_Stripe82.DeepCoadd
                WHERE deepCoaddId IN
                    (SELECT deepCoaddId FROM lsst_sdss_stripe82_01.ncsa_DeepCoadd_deepCoaddId);
            Query OK, 13015 rows affected (0.10 sec)
            Records: 13015  Duplicates: 0  Warnings: 0
            

            SELECT COUNT(*) FROM lsst_sdss_stripe82_01.DeepCoadd_merge;
            +----------+
            | COUNT(*) |
            +----------+
            |    13015 |
            +----------+
            

            Copy relevant rows from the input table DeepCoadd_Metadata:

            INSERT INTO lsst_sdss_stripe82_01.DeepCoadd_Metadata_merge
              SELECT * FROM gapon_SDRP_Stripe82.DeepCoadd_Metadata
                WHERE deepCoaddId IN
                    (SELECT deepCoaddId FROM lsst_sdss_stripe82_01.ncsa_DeepCoadd_deepCoaddId);
            Query OK, 2658664 rows affected (3 min 7.83 sec)
            Records: 2658664  Duplicates: 0  Warnings: 0
            

            SELECT COUNT(*) FROM lsst_sdss_stripe82_01.DeepCoadd_Metadata_merge;
            +----------+
            | COUNT(*) |
            +----------+
            |  2658664 |
            +----------+
            

            Copy relevant rows from the input table DeepCoadd_To_Htm10:

            INSERT INTO lsst_sdss_stripe82_01.DeepCoadd_To_Htm10_merge
              SELECT * FROM gapon_SDRP_Stripe82.DeepCoadd_To_Htm10
                WHERE deepCoaddId IN
                    (SELECT deepCoaddId FROM lsst_sdss_stripe82_01.ncsa_DeepCoadd_deepCoaddId);
            Query OK, 309090 rows affected (0.63 sec)
            Records: 309090  Duplicates: 0  Warnings: 0
            

            SELECT COUNT(*) FROM lsst_sdss_stripe82_01.DeepCoadd_To_Htm10_merge;
            +----------+
            | COUNT(*) |
            +----------+
            |   309090 |
            +----------+
            

            Copying data from the IN2P3 region

            This operation copies all IN2P3-specific rows, excluding those corresponding to the 132 unique patches identified earlier on the NCSA side of the merge boundary. For efficiency, the complete list of relevant deepCoaddId values is collected in a separate table and reused through the rest of the procedure (including the subsequent stages of merging entries from the RunDeepSource tables).

            Collecting IN2P3-specific deepCoaddId:

            SELECT COUNT(deepCoaddId) FROM lsst_prod_DC_2013_2.DeepCoadd;
            +--------------------+
            | COUNT(deepCoaddId) |
            +--------------------+
            |              13680 |
            +--------------------+
            1 row in set (0.00 sec)
            

            CREATE TABLE lsst_sdss_stripe82_01.in2p3_DeepCoadd_deepCoaddId
              AS SELECT deepCoaddId FROM lsst_prod_DC_2013_2.DeepCoadd
                WHERE (tract,patch) NOT IN
                  (SELECT DISTINCT tract,patch FROM gapon_SDRP_Stripe82.DeepCoadd
                    WHERE (tract,patch) IN
                      (SELECT tract,patch FROM lsst_prod_DC_2013_2.DeepCoadd) 
                    AND patch >= "376,0"
                  );
            

            SELECT COUNT(*) FROM lsst_sdss_stripe82_01.in2p3_DeepCoadd_deepCoaddId;
            +----------+
            | COUNT(*) |
            +----------+
            |    13020 |
            +----------+
            1 row in set (0.00 sec)
            

            CREATE INDEX IDX_deepCoaddId
              ON lsst_sdss_stripe82_01.in2p3_DeepCoadd_deepCoaddId (deepCoaddId);
            

            Note the difference in the number of rows: 13680 - 13020 = 660. This corresponds to the 132 unique patches in the NCSA portion of the overlap multiplied by 5 filters (each deepCoaddId is based on a (tract,patch,filterId) triplet).
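            As before, the row accounting can be verified directly, including a cross-check that the two per-site contributions add up to the merged DeepCoadd total (all numbers are from the query outputs in this comment):

            ```python
            # IN2P3 DeepCoadd rows before and after excluding the NCSA-side overlap patches.
            total_in2p3_rows = 13680
            kept_in2p3_rows = 13020
            print(total_in2p3_rows - kept_in2p3_rows)  # 660 = 132 patches x 5 filters

            kept_ncsa_rows = 13015                     # from the NCSA stage above
            print(kept_ncsa_rows + kept_in2p3_rows)    # 26035 rows in DeepCoadd_merge
            ```
            
            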

            Copy relevant rows from the input table DeepCoadd:

            INSERT INTO lsst_sdss_stripe82_01.DeepCoadd_merge
              SELECT * FROM lsst_prod_DC_2013_2.DeepCoadd
                WHERE deepCoaddId IN
                    (SELECT deepCoaddId FROM lsst_sdss_stripe82_01.in2p3_DeepCoadd_deepCoaddId);
            Query OK, 13020 rows affected (0.26 sec)
            Records: 13020  Duplicates: 0  Warnings: 0
            

            SELECT COUNT(*) FROM lsst_sdss_stripe82_01.DeepCoadd_merge;
            +----------+
            | COUNT(*) |
            +----------+
            |    26035 |
            +----------+
            

            Copy relevant rows from the input table DeepCoadd_Metadata:

            INSERT INTO lsst_sdss_stripe82_01.DeepCoadd_Metadata_merge
              SELECT * FROM lsst_prod_DC_2013_2.DeepCoadd_Metadata
                WHERE deepCoaddId IN
                    (SELECT deepCoaddId FROM lsst_sdss_stripe82_01.in2p3_DeepCoadd_deepCoaddId);
            Query OK, 2884680 rows affected (1 min 44.95 sec)
            Records: 2884680  Duplicates: 0  Warnings: 0
            

            SELECT COUNT(*) FROM lsst_sdss_stripe82_01.DeepCoadd_Metadata_merge;
            +----------+
            | COUNT(*) |
            +----------+
            |  5543344 |
            +----------+
            

            Copy relevant rows from the input table DeepCoadd_To_Htm10:

            INSERT INTO lsst_sdss_stripe82_01.DeepCoadd_To_Htm10_merge
              SELECT * FROM lsst_prod_DC_2013_2.DeepCoadd_To_Htm10
                WHERE deepCoaddId IN
                    (SELECT deepCoaddId FROM lsst_sdss_stripe82_01.in2p3_DeepCoadd_deepCoaddId);
            Query OK, 308610 rows affected (1.55 sec)
            Records: 308610  Duplicates: 0  Warnings: 0
            

            SELECT COUNT(*) FROM lsst_sdss_stripe82_01.DeepCoadd_To_Htm10_merge;
            +----------+
            | COUNT(*) |
            +----------+
            |   617700 |
            +----------+
            

            Comparing totals with the previous merge

            The resulting tables from the previous merge are found in the database lsst_sdss_stripe82.

            Table DeepCoadd:

            SELECT COUNT(*) FROM lsst_sdss_stripe82_01.DeepCoadd_merge;
            +----------+
            | COUNT(*) |
            +----------+
            |    26035 |
            +----------+
             
            SELECT COUNT(*) FROM lsst_sdss_stripe82.DeepCoadd_merge;
            +----------+
            | COUNT(*) |
            +----------+
            |    26035 |
            +----------+
            

            Table DeepCoadd_Metadata:

            SELECT COUNT(*) FROM lsst_sdss_stripe82_01.DeepCoadd_Metadata_merge;
            +----------+
            | COUNT(*) |
            +----------+
            |  5543344 |
            +----------+
             
            SELECT COUNT(*) FROM lsst_sdss_stripe82.DeepCoadd_Metadata_merge;
            +----------+
            | COUNT(*) |
            +----------+
            |  5543347 |
            +----------+
            

            ATTENTION: the previously merged table contains 3 extra rows (5543347 vs 5543344). This may need to be investigated further.
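            One possible way to investigate the discrepancy is an anti-join between the two merged tables, selecting rows present in the old merge but missing from the new one. Sketched here on toy data with SQLite; the table and column names are illustrative, not the real schema:

            ```python
            import sqlite3

            con = sqlite3.connect(":memory:")
            cur = con.cursor()
            # Toy stand-ins for the old and new DeepCoadd_Metadata_merge tables.
            cur.execute("CREATE TABLE old_merge (deepCoaddId INTEGER, metadataKey TEXT)")
            cur.execute("CREATE TABLE new_merge (deepCoaddId INTEGER, metadataKey TEXT)")
            cur.executemany("INSERT INTO old_merge VALUES (?,?)",
                            [(1, "a"), (1, "b"), (2, "a"), (3, "a")])  # one extra row: (3, 'a')
            cur.executemany("INSERT INTO new_merge VALUES (?,?)",
                            [(1, "a"), (1, "b"), (2, "a")])

            # Rows present in the old merge but missing from the new one.
            cur.execute("""
                SELECT o.deepCoaddId, o.metadataKey
                FROM old_merge o
                LEFT JOIN new_merge n
                  ON o.deepCoaddId = n.deepCoaddId AND o.metadataKey = n.metadataKey
                WHERE n.deepCoaddId IS NULL
            """)
            print(cur.fetchall())  # [(3, 'a')]
            ```
            
            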

            Table DeepCoadd_To_Htm10:

            SELECT COUNT(*) FROM lsst_sdss_stripe82_01.DeepCoadd_To_Htm10_merge;
            +----------+
            | COUNT(*) |
            +----------+
            |   617700 |
            +----------+
             
            SELECT COUNT(*) FROM lsst_sdss_stripe82.DeepCoadd_To_Htm10_merge;
            +----------+
            | COUNT(*) |
            +----------+
            |   617700 |
            +----------+
            

----------+   SELECT COUNT (*) FROM lsst_sdss_stripe82.DeepCoadd_Metadata_merge; + ----------+ | COUNT (*) | + ----------+ | 5543347 | + ----------+ ATTENTION : note a difference in 3 extra rows present in the previously merged table. This may need to be further investigated. Table DeepCoadd_To_Htm10 : SELECT COUNT (*) FROM lsst_sdss_stripe82_01.DeepCoadd_To_Htm10_merge; + ----------+ | COUNT (*) | + ----------+ | 617700 | + ----------+   SELECT COUNT (*) FROM lsst_sdss_stripe82.DeepCoadd_To_Htm10_merge; + ----------+ | COUNT (*) | + ----------+ | 617700 | + ----------+
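            As a sanity check on the bookkeeping above (not part of the original procedure), the reported counts are mutually consistent; a minimal sketch:

```python
# Sanity check of the DeepCoadd merge bookkeeping reported above.
# All numbers are taken verbatim from the query results.
FILTERS = 5  # u, g, r, i, z: each deepCoaddId is (tract, patch, filterId)

# NCSA: 13735 coadds total, 13015 kept -> 720 excluded on the IN2P3 side
assert 13735 - 13015 == 144 * FILTERS == 720

# IN2P3: 13680 coadds total, 13020 kept -> 660 excluded on the NCSA side
assert 13680 - 13020 == 132 * FILTERS == 660

# The merged DeepCoadd_merge table is the union of the two kept sets
assert 13015 + 13020 == 26035
```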
            Hide
            gapon Igor Gaponenko added a comment - - edited

            Merging RunDeepSource (similar to LSST Object)

            The merge will be done within the same database where table DeepCoadd was merged earlier: lsst_sdss_stripe82_01
            Database location: /data3 (available space 3.2 TB)
            Space required: ~100 GB (see the next subsection for details)

            Estimating space requirements

            NCSA:

            SELECT TABLE_ROWS, ROUND(DATA_LENGTH/1024/1024/1024,1) AS 'DATA_LENGTH [GB]'
              FROM information_schema.tables
              WHERE TABLE_SCHEMA="gapon_SDRP_Stripe82"
              AND TABLE_NAME="RunDeepSource";
            +------------+------------------+
            | TABLE_ROWS | DATA_LENGTH [GB] |
            +------------+------------------+
            |   98577782 |             51.2 |
            +------------+------------------+
            

            IN2P3:

            SELECT TABLE_ROWS, ROUND(DATA_LENGTH/1024/1024/1024,1) AS 'DATA_LENGTH [GB]'
              FROM information_schema.tables
              WHERE TABLE_SCHEMA="lsst_prod_DC_2013_2"
              AND TABLE_NAME="RunDeepSource";
            +------------+------------------+
            | TABLE_ROWS | DATA_LENGTH [GB] |
            +------------+------------------+
            |   98088148 |             51.0 |
            +------------+------------------+
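            The combined size of the two input tables sets an upper bound on the space needed for the merged table; a quick arithmetic check (not part of the original log):

```python
# Upper bound on RunDeepSource_merge size: the sum of the two inputs'
# DATA_LENGTH values; the actual table ends up a bit smaller since
# overlap rows are excluded from each side.
ncsa_gb, in2p3_gb = 51.2, 51.0
assert round(ncsa_gb + in2p3_gb, 1) == 102.2  # hence "~100 GB" required
```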
            

            Preparing the output table

            ATTENTION: all keys (except the PRIMARY one) are disabled on the output table to avoid slowing down the insertion with index rebuilds.

            CREATE TABLE lsst_sdss_stripe82_01.RunDeepSource_merge
                    LIKE gapon_SDRP_Stripe82.RunDeepSource;
             
            ALTER TABLE lsst_sdss_stripe82_01.RunDeepSource_merge DISABLE KEYS;
            SHOW INDEX FROM lsst_sdss_stripe82_01.RunDeepSource_merge;
            +---------------------+------------+-------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+----------+---------------+
            | Table               | Non_unique | Key_name          | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment  | Index_comment |
            +---------------------+------------+-------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+----------+---------------+
            | RunDeepSource_merge |          0 | PRIMARY           |            1 | id            | A         |           0 |     NULL | NULL   |      | BTREE      |          |               |
            | RunDeepSource_merge |          1 | IDX_coord_htmId20 |            1 | coord_htmId20 | A         |        NULL |     NULL | NULL   | YES  | BTREE      | disabled |               |
            | RunDeepSource_merge |          1 | IDX_coord_decl    |            1 | coord_decl    | A         |        NULL |     NULL | NULL   | YES  | BTREE      | disabled |               |
            | RunDeepSource_merge |          1 | IDX_parent        |            1 | parent        | A         |        NULL |     NULL | NULL   | YES  | BTREE      | disabled |               |
            | RunDeepSource_merge |          1 | IDX_coadd_id      |            1 | coadd_id      | A         |        NULL |     NULL | NULL   |      | BTREE      | disabled |               |
            +---------------------+------------+-------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+----------+---------------+
            

            Copying rows from the NCSA region

            ATTENTION: this operation is performed in batch mode as it may be quite lengthy.

            This is the query to be executed:

            SELECT NOW();
            INSERT INTO lsst_sdss_stripe82_01.RunDeepSource_merge
              SELECT * FROM gapon_SDRP_Stripe82.RunDeepSource
              WHERE coadd_id IN
                (SELECT deepCoaddId FROM lsst_sdss_stripe82_01.ncsa_DeepCoadd_deepCoaddId);
            SELECT NOW();
            

            The query is placed into a file at:

            /home/gapon/ncsa_RunDeepSource_merge.sql
            

            Launching the query:

            % cd /home/gapon/
            % nohup cat ncsa_RunDeepSource_merge.sql | mysql -u root >& ncsa_RunDeepSource_merge.log&
            

            Checking the log file:

            2017-05-10 01:50:30
            2017-05-10 02:20:04
            

            and results within the database:

            SELECT COUNT(*) FROM lsst_sdss_stripe82_01.RunDeepSource_merge;
            +----------+
            | COUNT(*) |
            +----------+
            | 93285875 |
            +----------+
            

            OBSERVATION: this number is smaller than the input number of rows. This is perfectly expected because we've excluded objects from the opposite part of the catalog in the overlap area.

            Copying rows from the IN2P3 region

            This is the query to be executed:

            SELECT NOW();
            INSERT INTO lsst_sdss_stripe82_01.RunDeepSource_merge
              SELECT * FROM lsst_prod_DC_2013_2.RunDeepSource
              WHERE coadd_id IN
                (SELECT deepCoaddId FROM lsst_sdss_stripe82_01.in2p3_DeepCoadd_deepCoaddId);
            SELECT NOW();
            

            The query is placed into a file at:

            /home/gapon/in2p3_RunDeepSource_merge.sql
            

            Launching the query:

            % cd /home/gapon/
            % nohup cat in2p3_RunDeepSource_merge.sql | mysql -u root >&  in2p3_RunDeepSource_merge.log&
            

            Checking the log file:

            2017-05-10 02:27:09
            2017-05-10 02:52:47
            

            and results within the database:

            SELECT COUNT(*) FROM lsst_sdss_stripe82_01.RunDeepSource_merge;
            +-----------+
            | COUNT(*)  |
            +-----------+
            | 186671762 |
            +-----------+
             
            SELECT 186671762 - 93285875;
            +----------------------+
            | 186671762 - 93285875 |
            +----------------------+
            |             93385887 |
            +----------------------+
            

            OBSERVATION: the total is smaller than the sum of the two input row counts, and the second number is smaller than the number of rows in the IN2P3 input table. This is perfectly expected because we've excluded objects from the opposite parts of the catalog in the overlap area.
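            The row accounting can be checked explicitly (a sketch, not part of the original log; the two differences are the sources attached to coadds ceded to the other site at the merge boundary):

```python
# RunDeepSource merge accounting, using the counts reported above.
ncsa_input,  ncsa_copied  = 98577782, 93285875   # gapon_SDRP_Stripe82
in2p3_input, in2p3_copied = 98088148, 93385887   # lsst_prod_DC_2013_2

# the merged table is exactly the two copied sets combined
assert ncsa_copied + in2p3_copied == 186671762

# each copied set is smaller than its input: the shortfall is the set of
# sources whose coadd was excluded at the merge boundary
assert ncsa_input  - ncsa_copied  == 5291907
assert in2p3_input - in2p3_copied == 4702261
```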

            Extract NCSA-specific object identifiers

            We need this list in order to optimize a merge of rows from the RunDeepForcedSource tables from NCSA:

            CREATE TABLE lsst_sdss_stripe82_01.ncsa_RunDeepSource_id
              SELECT id FROM gapon_SDRP_Stripe82.RunDeepSource
              WHERE coadd_id IN
                (SELECT deepCoaddId FROM lsst_sdss_stripe82_01.ncsa_DeepCoadd_deepCoaddId);
            

            SELECT COUNT(*) FROM lsst_sdss_stripe82_01.ncsa_RunDeepSource_id;
            +----------+
            | COUNT(*) |
            +----------+
            | 93285875 |
            +----------+
            

            OBSERVATION: This number matches the number of entries copied earlier from the same region.

            Also create a unique index on the table, which will allow optimized query processing when merging forced sources:

            CREATE UNIQUE INDEX IDX_id ON lsst_sdss_stripe82_01.ncsa_RunDeepSource_id (id);
            Query OK, 0 rows affected (5 min 38.34 sec)
            Records: 0  Duplicates: 0  Warnings: 0
             
            SHOW INDEX FROM lsst_sdss_stripe82_01.ncsa_RunDeepSource_id;
            +-----------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
            | Table                 | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
            +-----------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
            | ncsa_RunDeepSource_id |          0 | IDX_id   |            1 | id          | A         |    93286489 |     NULL | NULL   |      | BTREE      |         |               |
            +-----------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
            

            Also create a list of NCSA-specific objects in the overlap area:

            CREATE TABLE lsst_sdss_stripe82_01.overlap_ncsa_RunDeepSource_id
              SELECT id FROM gapon_SDRP_Stripe82.RunDeepSource
              WHERE coadd_id IN
                (SELECT deepCoaddId FROM gapon_SDRP_Stripe82.DeepCoadd
                  WHERE (tract,patch) IN
                    (SELECT DISTINCT tract,patch FROM gapon_SDRP_Stripe82.DeepCoadd
                      WHERE (tract,patch) IN
                        (SELECT tract,patch FROM lsst_prod_DC_2013_2.DeepCoadd) 
                      AND patch >= "376,0"
                    ));
            

            SELECT COUNT(*) FROM lsst_sdss_stripe82_01.overlap_ncsa_RunDeepSource_id;
            +----------+
            | COUNT(*) |
            +----------+
            |  4699622 |
            +----------+
            1 row in set (1.95 sec)
            

            CREATE UNIQUE INDEX IDX_id ON lsst_sdss_stripe82_01.overlap_ncsa_RunDeepSource_id (id);
            Query OK, 0 rows affected (17.34 sec)
            Records: 0  Duplicates: 0  Warnings: 0
            

            Extract IN2P3-specific object identifiers

            We need this list in order to optimize a merge of rows from the RunDeepForcedSource tables from IN2P3:

            CREATE TABLE lsst_sdss_stripe82_01.in2p3_RunDeepSource_id
              SELECT id FROM lsst_prod_DC_2013_2.RunDeepSource
              WHERE coadd_id IN
                (SELECT deepCoaddId FROM lsst_sdss_stripe82_01.in2p3_DeepCoadd_deepCoaddId);
            

            SELECT COUNT(*) FROM lsst_sdss_stripe82_01.in2p3_RunDeepSource_id;
            +----------+
            | COUNT(*) |
            +----------+
            | 93385887 |
            +----------+
            

            OBSERVATION: This number matches the number of entries copied earlier from the same region.

            CREATE UNIQUE INDEX IDX_id ON lsst_sdss_stripe82_01.in2p3_RunDeepSource_id (id); 
            Query OK, 0 rows affected (6 min 12.75 sec)
            Records: 0  Duplicates: 0  Warnings: 0
             
            SHOW INDEX FROM lsst_sdss_stripe82_01.in2p3_RunDeepSource_id;
            +------------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
            | Table                  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
            +------------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
            | in2p3_RunDeepSource_id |          0 | IDX_id   |            1 | id          | A         |    93386365 |     NULL | NULL   |      | BTREE      |         |               |
            +------------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
            

            Also create a list of IN2P3-specific objects in the overlap area:

            CREATE TABLE lsst_sdss_stripe82_01.overlap_in2p3_RunDeepSource_id
              SELECT id FROM lsst_prod_DC_2013_2.RunDeepSource
              WHERE coadd_id IN
                (SELECT deepCoaddId FROM lsst_prod_DC_2013_2.DeepCoadd
                  WHERE (tract,patch) IN
                    (SELECT DISTINCT tract,patch FROM lsst_prod_DC_2013_2.DeepCoadd
                      WHERE (tract,patch) IN
                        (SELECT tract,patch FROM gapon_SDRP_Stripe82.DeepCoadd) 
                      AND patch < "376,0"
                    ));
            

            PROBLEM: the previous query got stuck for many hours.

            SELECT COUNT(*) FROM lsst_sdss_stripe82_01.overlap_in2p3_RunDeepSource_id;
            +----------+
            | COUNT(*) |
            +----------+
            |  5291112 |
            +----------+
            1 row in set (1.43 sec)
            

            CREATE UNIQUE INDEX IDX_id ON lsst_sdss_stripe82_01.overlap_in2p3_RunDeepSource_id (id);
            Query OK, 0 rows affected (19.27 sec)
            Records: 0  Duplicates: 0  Warnings: 0
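            Note that patch is a string of the form "x,y", so the boundary predicates patch < "376,0" and patch >= "376,0" compare lexicographically; for the three-digit x values used in this region, string order agrees with numeric order. A small illustration (the sample patch names are hypothetical):

```python
# The patch boundary predicates compare strings lexicographically.
# For three-digit x values this matches numeric ordering, so the
# split at "376,0" behaves as intended.
patches = ["374,1", "375,0", "375,7", "376,0", "377,3"]
ncsa_side  = [p for p in patches if p >= "376,0"]  # kept by NCSA
in2p3_side = [p for p in patches if p <  "376,0"]  # kept by IN2P3
assert ncsa_side  == ["376,0", "377,3"]
assert in2p3_side == ["374,1", "375,0", "375,7"]
```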
            

            Hide
            gapon Igor Gaponenko added a comment - - edited

            De-duplicating RunDeepForcedSource (similar to LSST ForcedSource)

            Evaluation steps:

            • study the procedure for filtering forced sources which was devised last time.
            • prepare a step-by-step plan for processing these tables.
            • review the status of the input tables (RunDeepForcedSource) to see which are available and whether they're healthy.
            • eliminate tables left over from the previous effort to free up space.
            • evaluate space requirements and space availability (note that each instance of the table will have two output tables: _dedup and _radecl).

            Clearing unneeded or redundant tables (NCSA)

            This step frees up extra space for merging the tables.

            USE daues_SDRP_dedupe_byfilter_4;
            SHOW TABLES;
            +----------------------------------------+
            | Tables_in_daues_SDRP_dedupe_byfilter_4 |
            +----------------------------------------+
            | RunDeepForcedSource                    |
            | RunDeepForcedSource_radecl             |
            +----------------------------------------+
             
            SELECT COUNT(*) FROM RunDeepForcedSource_radecl;
            +------------+
            | COUNT(*)   |
            +------------+
            | 1751879670 |
            +------------+
             
            SELECT COUNT(*) FROM RunDeepForcedSource;
            +------------+
            | COUNT(*)   |
            +------------+
            | 1751879670 |
            +------------+
            

            OBSERVATION: the extended table _radecl has two extra columns needed by the partitioning tool. Otherwise it is identical to the original (unmodified) table, because the previous merge attempt gave preference to the forced sources from NCSA within the overlap area.

            So, we can eliminate those original un-extended tables from all 5 databases:

            DROP TABLE daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource;
            DROP TABLE daues_SDRP_dedupe_byfilter_1.RunDeepForcedSource;
            DROP TABLE daues_SDRP_dedupe_byfilter_2.RunDeepForcedSource;
            DROP TABLE daues_SDRP_dedupe_byfilter_3.RunDeepForcedSource;
            DROP TABLE daues_SDRP_dedupe_byfilter_4.RunDeepForcedSource;
            

            Clearing unneeded or redundant tables (IN2P3)

            This step affects the following databases:

            lsst_prod_dedupe_byfilter_u
            lsst_prod_dedupe_byfilter_g
            lsst_prod_dedupe_byfilter_r
            lsst_prod_dedupe_byfilter_i
            lsst_prod_dedupe_byfilter_z
            

            Inspecting the tables in one of these databases to determine what should be kept:

            USE lsst_prod_dedupe_byfilter_u;
            SHOW TABLES;
            +---------------------------------------+
            | Tables_in_lsst_prod_dedupe_byfilter_u |
            +---------------------------------------+
            | RunDeepForcedSource                   |
            | RunDeepForcedSource_dedup             |
            | RunDeepForcedSource_dedup_radecl      |
            +---------------------------------------+
            

            SELECT COUNT(*) FROM RunDeepForcedSource;
            +------------+
            | COUNT(*)   |
            +------------+
            | 1897928494 |
            +------------+
            SELECT COUNT(*) FROM RunDeepForcedSource_dedup;
            +------------+
            | COUNT(*)   |
            +------------+
            | 1720938745 |
            +------------+
            SELECT COUNT(*) FROM RunDeepForcedSource_dedup_radecl;
            +------------+
            | COUNT(*)   |
            +------------+
            | 1720938745 |
            +------------+
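            For reference, the counts above imply how many u-band rows the earlier de-duplication pass had removed (arithmetic only, not part of the log):

```python
# u-band (IN2P3): original vs previously de-duplicated forced-source counts
original, dedup = 1897928494, 1720938745
removed = original - dedup
assert removed == 176989749  # roughly 9% of the rows were duplicates
```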
            

            OBSERVATION: the original table is the one which should be used for the merging effort, so eliminate the de-duplicated and extended tables from all 5 databases:

            DROP TABLE lsst_prod_dedupe_byfilter_u.RunDeepForcedSource_dedup;
            DROP TABLE lsst_prod_dedupe_byfilter_u.RunDeepForcedSource_dedup_radecl;
            DROP TABLE lsst_prod_dedupe_byfilter_g.RunDeepForcedSource_dedup;
            DROP TABLE lsst_prod_dedupe_byfilter_g.RunDeepForcedSource_dedup_radecl;
            DROP TABLE lsst_prod_dedupe_byfilter_r.RunDeepForcedSource_dedup;
            DROP TABLE lsst_prod_dedupe_byfilter_r.RunDeepForcedSource_dedup_radecl;
            DROP TABLE lsst_prod_dedupe_byfilter_i.RunDeepForcedSource_dedup;
            DROP TABLE lsst_prod_dedupe_byfilter_i.RunDeepForcedSource_dedup_radecl;
            DROP TABLE lsst_prod_dedupe_byfilter_z.RunDeepForcedSource_dedup;
            DROP TABLE lsst_prod_dedupe_byfilter_z.RunDeepForcedSource_dedup_radecl;
            

            Also eliminate 4 temporary tables:

            DROP TABLE lsst_prod_dedupe_byfilter_g.tmp_dedupe_g;
            DROP TABLE lsst_prod_dedupe_byfilter_r.tmp_dedupe_r;
            DROP TABLE lsst_prod_dedupe_byfilter_i.tmp_dedupe_i;
            DROP TABLE lsst_prod_dedupe_byfilter_z.tmp_dedupe_z;
            

            De-duplicating NCSA tables

            GENERAL ALGORITHM: for each of the 5 input tables, copy into the corresponding output table (which has the same schema as the input one) all rows which are NOT linked to the previously identified objects from the IN2P3 part of the overlap. The identifiers of those objects are found in table:

            lsst_sdss_stripe82_01.overlap_in2p3_RunDeepSource_id
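
            The anti-join pattern used below can be sketched with a toy sqlite3 example (the table names mirror the MySQL statements, but the schemas are reduced to two columns and the rows are made up):

            ```python
            # Toy sketch of the de-duplication anti-join: keep only forced sources
            # whose objectId is NOT in the exclusion-id table. Schemas and data are
            # invented for illustration; only the table names match the real ones.
            import sqlite3

            con = sqlite3.connect(":memory:")
            con.executescript("""
                CREATE TABLE RunDeepForcedSource_radecl (id INTEGER, objectId INTEGER);
                CREATE TABLE overlap_in2p3_RunDeepSource_id (id INTEGER);
                CREATE TABLE RunDeepForcedSource_radecl_dedup (id INTEGER, objectId INTEGER);
                INSERT INTO RunDeepForcedSource_radecl VALUES (1, 100), (2, 200), (3, 300);
                INSERT INTO overlap_in2p3_RunDeepSource_id VALUES (200);  -- object owned by IN2P3
            """)
            con.execute("""
                INSERT INTO RunDeepForcedSource_radecl_dedup
                  SELECT * FROM RunDeepForcedSource_radecl
                    WHERE objectId NOT IN
                      (SELECT id FROM overlap_in2p3_RunDeepSource_id)
            """)
            kept = [r[1] for r in con.execute(
                "SELECT id, objectId FROM RunDeepForcedSource_radecl_dedup ORDER BY id")]
            print(kept)  # the forced source of object 200 is dropped
            ```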
            

            This operation was launched for all 5 tables across the 5 databases located at:

            % ls -al /data/mysql/db/
            drwx------   2 mysql mysql  4096 May 11 18:20 daues_SDRP_dedupe_byfilter_0
            drwx------   2 mysql mysql  4096 May 11 18:26 daues_SDRP_dedupe_byfilter_1
            drwx------   2 mysql mysql  4096 May 11 18:26 daues_SDRP_dedupe_byfilter_2
            drwx------   2 mysql mysql  4096 May 11 18:26 daues_SDRP_dedupe_byfilter_3
            drwx------   2 mysql mysql  4096 May 11 18:26 daues_SDRP_dedupe_byfilter_4
            

            Before the processing began, the file system had more than enough free space to accommodate the new tables (3.5 TB is required):

            % df -h /data/mysql/db/
            Filesystem      Size  Used Avail Use% Mounted on
            /dev/vdb         12T  5.0T  6.1T  45% /data
            

            Creating the output tables:

            CREATE TABLE daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource_radecl_dedup
              LIKE daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource_radecl;
             
            CREATE TABLE daues_SDRP_dedupe_byfilter_1.RunDeepForcedSource_radecl_dedup
              LIKE daues_SDRP_dedupe_byfilter_1.RunDeepForcedSource_radecl;
             
            CREATE TABLE daues_SDRP_dedupe_byfilter_2.RunDeepForcedSource_radecl_dedup
              LIKE daues_SDRP_dedupe_byfilter_2.RunDeepForcedSource_radecl;
             
            CREATE TABLE daues_SDRP_dedupe_byfilter_3.RunDeepForcedSource_radecl_dedup
              LIKE daues_SDRP_dedupe_byfilter_3.RunDeepForcedSource_radecl;
             
            CREATE TABLE daues_SDRP_dedupe_byfilter_4.RunDeepForcedSource_radecl_dedup
              LIKE daues_SDRP_dedupe_byfilter_4.RunDeepForcedSource_radecl;
            

            Launching the following 5 commands IN PARALLEL in batch mode:

            INSERT INTO daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource_radecl_dedup
              SELECT * FROM daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource_radecl
                WHERE objectId NOT IN 
                  (SELECT id FROM lsst_sdss_stripe82_01.overlap_in2p3_RunDeepSource_id);
            

            INSERT INTO daues_SDRP_dedupe_byfilter_1.RunDeepForcedSource_radecl_dedup
              SELECT * FROM daues_SDRP_dedupe_byfilter_1.RunDeepForcedSource_radecl
                WHERE objectId NOT IN 
                  (SELECT id FROM lsst_sdss_stripe82_01.overlap_in2p3_RunDeepSource_id);
            

            INSERT INTO daues_SDRP_dedupe_byfilter_2.RunDeepForcedSource_radecl_dedup
              SELECT * FROM daues_SDRP_dedupe_byfilter_2.RunDeepForcedSource_radecl
                WHERE objectId NOT IN 
                  (SELECT id FROM lsst_sdss_stripe82_01.overlap_in2p3_RunDeepSource_id);
            

            INSERT INTO daues_SDRP_dedupe_byfilter_3.RunDeepForcedSource_radecl_dedup
              SELECT * FROM daues_SDRP_dedupe_byfilter_3.RunDeepForcedSource_radecl
                WHERE objectId NOT IN 
                  (SELECT id FROM lsst_sdss_stripe82_01.overlap_in2p3_RunDeepSource_id);
            

            INSERT INTO daues_SDRP_dedupe_byfilter_4.RunDeepForcedSource_radecl_dedup
              SELECT * FROM daues_SDRP_dedupe_byfilter_4.RunDeepForcedSource_radecl
                WHERE objectId NOT IN 
                  (SELECT id FROM lsst_sdss_stripe82_01.overlap_in2p3_RunDeepSource_id);
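
            A hypothetical sketch (not the actual script used) of how such per-database batch jobs can be launched in parallel: one mysql client per database, each reading a per-database statement file and writing a per-database log. The *.sql file layout and the helper names are assumptions for illustration only.

            ```python
            # Hypothetical parallel launcher: one mysql batch client per database,
            # each logging to <db>.log. File names and helpers are illustrative.
            import subprocess
            from concurrent.futures import ThreadPoolExecutor

            DATABASES = [f"daues_SDRP_dedupe_byfilter_{i}" for i in range(5)]

            def run_batch(db):
                """Feed <db>.sql to the mysql client in batch mode; return its exit code."""
                with open(f"{db}.sql") as sql, open(f"{db}.log", "w") as log:
                    return subprocess.run(["mysql", "--batch"], stdin=sql,
                                          stdout=log, stderr=subprocess.STDOUT).returncode

            def run_all(databases, worker=run_batch):
                # One thread per database: the heavy lifting happens server-side,
                # so the client threads mostly just wait on their subprocesses.
                with ThreadPoolExecutor(max_workers=len(databases)) as pool:
                    return list(pool.map(worker, databases))
            ```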
            

            The log files for the operations are found at:

            % ls -al /home/gapon/logs/
            -rw-rw-r--   1 gapon gapon    0 May 11 18:28 daues_SDRP_dedupe_byfilter_0.log
            -rw-rw-r--   1 gapon gapon    0 May 11 18:33 daues_SDRP_dedupe_byfilter_1.log
            -rw-rw-r--   1 gapon gapon    0 May 11 18:34 daues_SDRP_dedupe_byfilter_2.log
            -rw-rw-r--   1 gapon gapon    0 May 11 18:34 daues_SDRP_dedupe_byfilter_3.log
            -rw-rw-r--   1 gapon gapon    0 May 11 18:34 daues_SDRP_dedupe_byfilter_4.log
            

            When the operation finished:

            % df -h /data/mysql/db/
            Filesystem      Size  Used Avail Use% Mounted on
            /dev/vdb         12T  8.1T  3.1T  73% /data
            

            Running a quick test for each table:

            SELECT TABLE_SCHEMA,TABLE_NAME,TABLE_ROWS,DATA_LENGTH
              FROM information_schema.tables
              WHERE TABLE_SCHEMA LIKE "daues_SDRP_dedupe_byfilter\__";
            +------------------------------+----------------------------------+------------+--------------+
            | TABLE_SCHEMA                 | TABLE_NAME                       | TABLE_ROWS | DATA_LENGTH  |
            +------------------------------+----------------------------------+------------+--------------+
            | daues_SDRP_dedupe_byfilter_0 | RunDeepForcedSource_radecl       | 1729599791 | 710865514101 |
            | daues_SDRP_dedupe_byfilter_0 | RunDeepForcedSource_radecl_dedup | 1634106268 | 671617676148 |
            | daues_SDRP_dedupe_byfilter_1 | RunDeepForcedSource_radecl       | 1752805399 | 720403018989 |
            | daues_SDRP_dedupe_byfilter_1 | RunDeepForcedSource_radecl_dedup | 1654729991 | 680094026301 |
            | daues_SDRP_dedupe_byfilter_2 | RunDeepForcedSource_radecl       | 1754290686 | 721013471946 |
            | daues_SDRP_dedupe_byfilter_2 | RunDeepForcedSource_radecl_dedup | 1656203271 | 680699544381 |
            | daues_SDRP_dedupe_byfilter_3 | RunDeepForcedSource_radecl       | 1754272841 | 721006137651 |
            | daues_SDRP_dedupe_byfilter_3 | RunDeepForcedSource_radecl_dedup | 1656209922 | 680702277942 |
            | daues_SDRP_dedupe_byfilter_4 | RunDeepForcedSource_radecl       | 1751879670 | 720022544370 |
            | daues_SDRP_dedupe_byfilter_4 | RunDeepForcedSource_radecl_dedup | 1654034432 | 679808151552 |
            +------------------------------+----------------------------------+------------+--------------+
            

            Extending IN2P3 tables

            IMPORTANT: The goal of this step is to add two extra columns carrying the spherical coordinates of the corresponding objects. These are needed to correctly partition the forced sources (for a variety of technical reasons, the current implementation of the partitioning algorithm does NOT use the explicit association between forced sources and the corresponding objects, relying instead on an implicit spatial association).

            This operation was launched for 5 tables across 5 databases located at:

            % ls -al /data/mysql/db/
            lrwxrwxrwx   1 root  root    43 Sep  9  2016 lsst_prod_dedupe_byfilter_g -> /data2/mysql/db/lsst_prod_dedupe_byfilter_g
            lrwxrwxrwx   1 root  root    43 Sep  9  2016 lsst_prod_dedupe_byfilter_i -> /data2/mysql/db/lsst_prod_dedupe_byfilter_i
            lrwxrwxrwx   1 root  root    43 Sep  9  2016 lsst_prod_dedupe_byfilter_r -> /data2/mysql/db/lsst_prod_dedupe_byfilter_r
            lrwxrwxrwx   1 root  root    43 May 12 06:08 lsst_prod_dedupe_byfilter_u -> /data2/mysql/db/lsst_prod_dedupe_byfilter_u
            lrwxrwxrwx   1 root  root    43 May 12 20:54 lsst_prod_dedupe_byfilter_z -> /data2/mysql/db/lsst_prod_dedupe_byfilter_z
            

            The file system has more than enough free space to accommodate the new tables (3.5 TB is required):

            % df -h /data2/mysql/db/
            Filesystem      Size  Used Avail Use% Mounted on
            /dev/vdd        7.8T  3.7T  3.8T  49% /data2
            

            Creating the output tables:

            CREATE TABLE lsst_prod_dedupe_byfilter_u.RunDeepForcedSource_radecl
              LIKE daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource_radecl;
             
            CREATE TABLE lsst_prod_dedupe_byfilter_g.RunDeepForcedSource_radecl
              LIKE daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource_radecl;
             
            CREATE TABLE lsst_prod_dedupe_byfilter_r.RunDeepForcedSource_radecl
              LIKE daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource_radecl;
             
            CREATE TABLE lsst_prod_dedupe_byfilter_i.RunDeepForcedSource_radecl
              LIKE daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource_radecl;
             
            CREATE TABLE lsst_prod_dedupe_byfilter_z.RunDeepForcedSource_radecl
              LIKE daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource_radecl;
            

            The GENERAL ALGORITHM relies upon a small table produced earlier, in the context of the original effort. This table is a subset of the full RunDeepSource table, holding only the (id,ra,decl) triplet for each object.

            SELECT COUNT(*) FROM lsst_prod_DC_2013_2.RunDeepSource_radecl;
            +----------+
            | COUNT(*) |
            +----------+
            | 98088148 |
            +----------+
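
            The coordinate-extension join itself can be sketched with a toy sqlite3 example (SQLite has no STRAIGHT_JOIN hint, so a plain JOIN stands in for it; the reduced schemas and data are invented for illustration):

            ```python
            # Toy sketch of the coordinate-extension join: each forced source picks
            # up the (coord_ra, coord_decl) of its object via objectId = id.
            # Reduced, invented schemas; only the join shape matches the real query.
            import sqlite3

            con = sqlite3.connect(":memory:")
            con.executescript("""
                CREATE TABLE RunDeepForcedSource (srcId INTEGER, objectId INTEGER, flux REAL);
                CREATE TABLE RunDeepSource_radecl (id INTEGER, coord_ra REAL, coord_decl REAL);
                CREATE TABLE RunDeepForcedSource_radecl
                  (srcId INTEGER, objectId INTEGER, flux REAL, coord_ra REAL, coord_decl REAL);
                INSERT INTO RunDeepForcedSource VALUES (1, 100, 1.5), (2, 100, 1.6), (3, 300, 9.9);
                INSERT INTO RunDeepSource_radecl VALUES (100, 20.5, -0.4), (300, 21.0, 0.1);
            """)
            con.execute("""
                INSERT INTO RunDeepForcedSource_radecl
                  SELECT rdfs.*, rdss.coord_ra, rdss.coord_decl
                  FROM RunDeepForcedSource rdfs
                    JOIN RunDeepSource_radecl rdss ON rdfs.objectId = rdss.id
            """)
            rows = list(con.execute(
                "SELECT srcId, coord_ra, coord_decl FROM RunDeepForcedSource_radecl ORDER BY srcId"))
            print(rows)  # both sources of object 100 share its coordinates
            ```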
            

            The following operation is in progress for all 5 filter bands (the u-band statement is shown; the other four differ only in the filter-band database name):

            INSERT INTO lsst_prod_dedupe_byfilter_u.RunDeepForcedSource_radecl
              SELECT rdfs.*,rdss.coord_ra,rdss.coord_decl
              FROM
                lsst_prod_dedupe_byfilter_u.RunDeepForcedSource `rdfs`
                  STRAIGHT_JOIN
                lsst_prod_DC_2013_2.RunDeepSource_radecl `rdss`
                  ON 
                rdfs.objectId = rdss.id;
            

            The log files for the operations are found at:

            % ls -al /home/gapon/logs/
            -rw-rw-r--   1 gapon gapon    0 May 12 21:44 lsst_prod_dedupe_byfilter_g_radecl.log
            -rw-rw-r--   1 gapon gapon    0 May 12 21:44 lsst_prod_dedupe_byfilter_i_radecl.log
            -rw-rw-r--   1 gapon gapon    0 May 12 21:44 lsst_prod_dedupe_byfilter_r_radecl.log
            -rw-rw-r--   1 gapon gapon    0 May 12 21:43 lsst_prod_dedupe_byfilter_u_radecl.log
            -rw-rw-r--   1 gapon gapon    0 May 12 21:44 lsst_prod_dedupe_byfilter_z_radecl.log
            

            Tested the results with:

            SELECT TABLE_SCHEMA,TABLE_NAME,TABLE_ROWS,DATA_LENGTH
              FROM information_schema.tables
              WHERE TABLE_SCHEMA LIKE "lsst_prod_dedupe_byfilter\__"
                AND TABLE_NAME LIKE "RunDeepForcedSource%";
            +-----------------------------+----------------------------+------------+--------------+
            | TABLE_SCHEMA                | TABLE_NAME                 | TABLE_ROWS | DATA_LENGTH  |
            +-----------------------------+----------------------------+------------+--------------+
            | lsst_prod_dedupe_byfilter_g | RunDeepForcedSource        | 1947408074 | 769226189230 |
            | lsst_prod_dedupe_byfilter_g | RunDeepForcedSource_radecl | 1947408074 | 800384718414 |
            | lsst_prod_dedupe_byfilter_i | RunDeepForcedSource        | 1950338024 | 770383519480 |
            | lsst_prod_dedupe_byfilter_i | RunDeepForcedSource_radecl | 1950338024 | 801588927864 |
            | lsst_prod_dedupe_byfilter_r | RunDeepForcedSource        | 1950357798 | 770391330210 |
            | lsst_prod_dedupe_byfilter_r | RunDeepForcedSource_radecl | 1950357798 | 801597054978 |
            | lsst_prod_dedupe_byfilter_u | RunDeepForcedSource        | 1897928494 | 749681755130 |
            | lsst_prod_dedupe_byfilter_u | RunDeepForcedSource_radecl | 1897928494 | 780048611034 |
            | lsst_prod_dedupe_byfilter_z | RunDeepForcedSource        | 1946839240 | 769001499800 |
            | lsst_prod_dedupe_byfilter_z | RunDeepForcedSource_radecl | 1946839240 | 800150927640 |
            +-----------------------------+----------------------------+------------+--------------+
            

            Eliminating the original (non-extended) tables to free space for the next set of (de-duplicated) tables:

            DROP TABLE lsst_prod_dedupe_byfilter_u.RunDeepForcedSource;
            DROP TABLE lsst_prod_dedupe_byfilter_g.RunDeepForcedSource;
            DROP TABLE lsst_prod_dedupe_byfilter_r.RunDeepForcedSource;
            DROP TABLE lsst_prod_dedupe_byfilter_i.RunDeepForcedSource;
            DROP TABLE lsst_prod_dedupe_byfilter_z.RunDeepForcedSource;
            

            Checking the free space:

            % df -h /data2
            Filesystem      Size  Used Avail Use% Mounted on
            /dev/vdd        7.8T  3.7T  3.8T  50% /data2
            This should be enough, considering the ~5% reduction in the number of rows due to the de-duplication.
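
            As a quick sanity check of that claim, the required space can be estimated from the DATA_LENGTH figures reported for the five *_radecl tables above and the ~5% expected row reduction:

            ```python
            # Back-of-the-envelope check: estimated size of the five de-duplicated
            # tables vs. the space available on /data2 (figures taken from the
            # information_schema and df -h output quoted above).
            TIB = 1024 ** 4
            data_length = {            # bytes, from information_schema.tables
                "u": 780048611034,
                "g": 800384718414,
                "r": 801597054978,
                "i": 801588927864,
                "z": 800150927640,
            }
            needed = sum(data_length.values()) * 0.95 / TIB  # dedup keeps ~95% of rows
            available = 3.8                                  # TiB free per df -h
            print(f"needed ~ {needed:.2f} TiB, available ~ {available:.1f} TiB")
            ```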
            

            De-duplicating IN2P3 tables

            GENERAL ALGORITHM: for each of the 5 input tables, copy into the corresponding output table (which has the same schema as the input one) all rows which are NOT linked to the previously identified objects from the NCSA part of the overlap. The identifiers of those objects (to be excluded from being copied) are found in table:

            lsst_sdss_stripe82_01.overlap_ncsa_RunDeepSource_id
            

            Creating output tables:

            CREATE TABLE lsst_prod_dedupe_byfilter_u.RunDeepForcedSource_radecl_dedup
              LIKE lsst_prod_dedupe_byfilter_u.RunDeepForcedSource_radecl;
             
            CREATE TABLE lsst_prod_dedupe_byfilter_g.RunDeepForcedSource_radecl_dedup
              LIKE lsst_prod_dedupe_byfilter_g.RunDeepForcedSource_radecl;
             
            CREATE TABLE lsst_prod_dedupe_byfilter_r.RunDeepForcedSource_radecl_dedup
              LIKE lsst_prod_dedupe_byfilter_r.RunDeepForcedSource_radecl;
             
            CREATE TABLE lsst_prod_dedupe_byfilter_i.RunDeepForcedSource_radecl_dedup
              LIKE lsst_prod_dedupe_byfilter_i.RunDeepForcedSource_radecl;
             
            CREATE TABLE lsst_prod_dedupe_byfilter_z.RunDeepForcedSource_radecl_dedup
              LIKE lsst_prod_dedupe_byfilter_z.RunDeepForcedSource_radecl;
            

            Launching the following 5 commands IN PARALLEL in batch mode:

            INSERT INTO lsst_prod_dedupe_byfilter_u.RunDeepForcedSource_radecl_dedup
              SELECT * FROM lsst_prod_dedupe_byfilter_u.RunDeepForcedSource_radecl
                WHERE objectId NOT IN 
                  (SELECT id FROM lsst_sdss_stripe82_01.overlap_ncsa_RunDeepSource_id);
            

            INSERT INTO lsst_prod_dedupe_byfilter_g.RunDeepForcedSource_radecl_dedup
              SELECT * FROM lsst_prod_dedupe_byfilter_g.RunDeepForcedSource_radecl
                WHERE objectId NOT IN 
                  (SELECT id FROM lsst_sdss_stripe82_01.overlap_ncsa_RunDeepSource_id);
            

            INSERT INTO lsst_prod_dedupe_byfilter_r.RunDeepForcedSource_radecl_dedup
              SELECT * FROM lsst_prod_dedupe_byfilter_r.RunDeepForcedSource_radecl
                WHERE objectId NOT IN 
                  (SELECT id FROM lsst_sdss_stripe82_01.overlap_ncsa_RunDeepSource_id);
            

            INSERT INTO lsst_prod_dedupe_byfilter_i.RunDeepForcedSource_radecl_dedup
              SELECT * FROM lsst_prod_dedupe_byfilter_i.RunDeepForcedSource_radecl
                WHERE objectId NOT IN 
                  (SELECT id FROM lsst_sdss_stripe82_01.overlap_ncsa_RunDeepSource_id);
            

            INSERT INTO lsst_prod_dedupe_byfilter_z.RunDeepForcedSource_radecl_dedup
              SELECT * FROM lsst_prod_dedupe_byfilter_z.RunDeepForcedSource_radecl
                WHERE objectId NOT IN 
                  (SELECT id FROM lsst_sdss_stripe82_01.overlap_ncsa_RunDeepSource_id);
            

            Tested the results with:

            SELECT TABLE_SCHEMA,TABLE_NAME,TABLE_ROWS,DATA_LENGTH
              FROM information_schema.tables
              WHERE TABLE_SCHEMA LIKE "lsst_prod_dedupe_byfilter\__"
                AND TABLE_NAME LIKE "RunDeepForcedSource%";
            +-----------------------------+----------------------------------+------------+--------------+
            | TABLE_SCHEMA                | TABLE_NAME                       | TABLE_ROWS | DATA_LENGTH  |
            +-----------------------------+----------------------------------+------------+--------------+
            | lsst_prod_dedupe_byfilter_g | RunDeepForcedSource_radecl       | 1947408074 | 800384718414 |
            | lsst_prod_dedupe_byfilter_g | RunDeepForcedSource_radecl_dedup | 1865204563 | 766599075393 |
            | lsst_prod_dedupe_byfilter_i | RunDeepForcedSource_radecl       | 1950338024 | 801588927864 |
            | lsst_prod_dedupe_byfilter_i | RunDeepForcedSource_radecl_dedup | 1867918914 | 767714673654 |
            | lsst_prod_dedupe_byfilter_r | RunDeepForcedSource_radecl       | 1950357798 | 801597054978 |
            | lsst_prod_dedupe_byfilter_r | RunDeepForcedSource_radecl_dedup | 1867935694 | 767721570234 |
            | lsst_prod_dedupe_byfilter_u | RunDeepForcedSource_radecl       | 1897928494 | 780048611034 |
            | lsst_prod_dedupe_byfilter_u | RunDeepForcedSource_radecl_dedup | 1817409697 | 746955385467 |
            | lsst_prod_dedupe_byfilter_z | RunDeepForcedSource_radecl       | 1946839240 | 800150927640 |
            | lsst_prod_dedupe_byfilter_z | RunDeepForcedSource_radecl_dedup | 1864588704 | 766345957344 |
            +-----------------------------+----------------------------------+------------+--------------+
            

            gapon Igor Gaponenko added a comment - - edited

            Dumping databases into the TSV format

            DeepCoadd (and 2 dependent tables), RunDeepSource

            Prepare the destination folder:

            {code:bash}

            % sudo mkdir -p /data1/tsv/lsst_sdss_stripe82_01
            % sudo chown mysql:mysql /data1/tsv/lsst_sdss_stripe82_01
            % sudo chmod 0777 /data1/tsv/lsst_sdss_stripe82_01

            Started the dump with:
            {code:bash}
            % nohup mysqldump -u root lsst_sdss_stripe82_01 \
              DeepCoadd_merge \
              DeepCoadd_Metadata_merge \
              DeepCoadd_To_Htm10_merge \
              RunDeepSource_merge \
              -T/data1/tsv/lsst_sdss_stripe82_01/ &>logs/lsst_sdss_stripe82_01.tsv.log &
            

            The operation finished successfully, with no problems in the log file. The file sizes are shown below:

            % ls -al /data1/tsv/lsst_sdss_stripe82_01/
            total 211984396
            drwxrwxrwx 2 mysql mysql         4096 May 16 21:07 .
            drwxrwxrwx 9 mysql mysql         4096 May 16 02:24 ..
            -rw-rw-r-- 1 gapon gapon         2549 May 16 02:29 DeepCoadd_merge.sql
            -rw-rw-rw- 1 mysql mysql     12319362 May 16 02:29 DeepCoadd_merge.txt
            -rw-rw-r-- 1 gapon gapon         1669 May 16 02:29 DeepCoadd_Metadata_merge.sql
            -rw-rw-rw- 1 mysql mysql    279986169 May 16 02:29 DeepCoadd_Metadata_merge.txt
            -rw-rw-r-- 1 gapon gapon         1495 May 16 02:29 DeepCoadd_To_Htm10_merge.sql
            -rw-rw-rw- 1 mysql mysql     11462430 May 16 02:29 DeepCoadd_To_Htm10_merge.txt
            -rw-rw-r-- 1 gapon gapon         8189 May 16 02:29 RunDeepSource_merge.sql
            -rw-rw-rw- 1 mysql mysql 216768195814 May 16 07:05 RunDeepSource_merge.txt
            

            RunDeepForcedSource of NCSA

            Based on the prior effort, each TSV file should be approximately 1.2 TB in size, so all 5 tables will require about 6 TB of storage. The following file system should provide enough capacity to accommodate these files:

            % df -h /data1
            Filesystem      Size  Used Avail Use% Mounted on
            /dev/vdc        7.8T   93M  7.4T   1% /data1
             
            % for band in "0" "1" "2" "3" "4"; do \
              sudo mkdir -p /data1/tsv/daues_SDRP_dedupe_byfilter_${band}; \
            done
             
            % sudo chown -R mysql:mysql /data1/tsv
            % sudo chmod -R 0777 /data1
            
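As a quick sanity check, the 6 TB estimate can be compared against the Avail column from df. A minimal sketch, with both numbers hardcoded from the df -h /data1 output above:

```shell
# Pre-flight space check (sketch): 5 dumps x ~1.2 TB must fit into the
# available space on /data1. Both values are taken from this comment.
need_tb=6     # 5 tables x ~1.2 TB each
avail_tb=7.4  # "Avail" for /data1, per `df -h /data1` above
awk -v need="$need_tb" -v avail="$avail_tb" \
    'BEGIN { print (avail >= need) ? "enough space" : "NOT enough space" }'
```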

            Starting all dumps in the batch mode:

            % for band in "0" "1" "2" "3" "4"; do \
              nohup mysqldump -u root \
                daues_SDRP_dedupe_byfilter_${band} RunDeepForcedSource_radecl_dedup \
                -T/data1/tsv/daues_SDRP_dedupe_byfilter_${band}/ \
                >& /home/gapon/logs/daues_SDRP_dedupe_byfilter_${band}.tsv.log & \
            done
            
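Since the five dumps run in parallel, a small helper (hypothetical, not part of the original procedure) can flag any non-empty log: mysqldump with -T is normally silent on success, so a non-empty log usually indicates warnings or errors.

```shell
# Flag any non-empty mysqldump log in a directory (hypothetical helper).
# Returns non-zero if at least one log needs a closer look.
check_dump_logs() {
    local dir=$1 status=0 log
    for log in "$dir"/*.log; do
        if [ -s "$log" ]; then
            echo "CHECK $log"    # non-empty: inspect for errors
            status=1
        else
            echo "clean: $log"   # empty log: dump ran silently
        fi
    done
    return $status
}
```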

            The operation finished with the following file sizes:

            % ls -al /data1/tsv/daues_SDRP_dedupe_byfilter_*/*.txt
            -rw-rw-rw- 1 mysql mysql 1161290588478 May 14 16:19 /data1/tsv/daues_SDRP_dedupe_byfilter_0/RunDeepForcedSource_radecl_dedup.txt
            -rw-rw-rw- 1 mysql mysql 1207653978772 May 14 17:37 /data1/tsv/daues_SDRP_dedupe_byfilter_1/RunDeepForcedSource_radecl_dedup.txt
            -rw-rw-rw- 1 mysql mysql 1220211612878 May 14 17:51 /data1/tsv/daues_SDRP_dedupe_byfilter_2/RunDeepForcedSource_radecl_dedup.txt
            -rw-rw-rw- 1 mysql mysql 1227132706017 May 14 17:59 /data1/tsv/daues_SDRP_dedupe_byfilter_3/RunDeepForcedSource_radecl_dedup.txt
            -rw-rw-rw- 1 mysql mysql 1198155639140 May 14 18:02 /data1/tsv/daues_SDRP_dedupe_byfilter_4/RunDeepForcedSource_radecl_dedup.txt
            
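The byte counts above can be cross-checked against the 6 TB estimate. A minimal sketch summing the five file sizes copied from the listing:

```shell
# Sum the five RunDeepForcedSource TSV sizes reported above and convert
# to decimal terabytes (1 TB = 10^12 bytes) to confirm the ~6 TB estimate.
total=0
for s in 1161290588478 1207653978772 1220211612878 1227132706017 1198155639140; do
    total=$((total + s))
done
echo "total_bytes=$total"
awk -v t="$total" 'BEGIN { printf "total_TB=%.2f\n", t / 1e12 }'
```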

            RunDeepForcedSource of IN2P3

            Based on the prior effort, each TSV file should be approximately 1.2 TB in size, so all 5 tables will require about 6 TB of storage. The following file system should provide enough capacity to accommodate these files:

            % df -h /data3/tsv
            Filesystem      Size  Used Avail Use% Mounted on
            /dev/vde        7.8T  201G  7.2T   3% /data3
             
            % for band in "u" "g" "r" "i" "z"; do \
              sudo mkdir -p /data3/tsv/lsst_prod_dedupe_byfilter_${band}; \
            done
             
            % sudo chown -R mysql:mysql /data3/tsv
            % sudo chmod -R 0777 /data3/tsv
            

            Starting all dumps in the batch mode:

            % for band in "u" "g" "r" "i" "z"; do \
              nohup mysqldump -u root \
                lsst_prod_dedupe_byfilter_${band} RunDeepForcedSource_radecl_dedup \
                -T/data3/tsv/lsst_prod_dedupe_byfilter_${band}/ \
                >& /home/gapon/logs/lsst_prod_dedupe_byfilter_${band}.tsv.log & \
            done
            

            The operation finished with the following file sizes:

            % ls -al /data3/tsv/lsst_prod_dedupe_byfilter_*/*.txt
            -rw-rw-rw- 1 mysql mysql 1356553484909 May 18 05:49 /data3/tsv/lsst_prod_dedupe_byfilter_g/RunDeepForcedSource_radecl_dedup.txt
            -rw-rw-rw- 1 mysql mysql 1375324194874 May 18 06:07 /data3/tsv/lsst_prod_dedupe_byfilter_i/RunDeepForcedSource_radecl_dedup.txt
            -rw-rw-rw- 1 mysql mysql 1369286130161 May 18 06:00 /data3/tsv/lsst_prod_dedupe_byfilter_r/RunDeepForcedSource_radecl_dedup.txt
            -rw-rw-rw- 1 mysql mysql 1287603053848 May 18 04:36 /data3/tsv/lsst_prod_dedupe_byfilter_u/RunDeepForcedSource_radecl_dedup.txt
            -rw-rw-rw- 1 mysql mysql 1341697675866 May 18 05:35 /data3/tsv/lsst_prod_dedupe_byfilter_z/RunDeepForcedSource_radecl_dedup.txt
            

            gapon Igor Gaponenko added a comment - - edited

            Migrating TSV files from OpenStack to lsst-dev

            Set up SSH configuration to simplify file transfers:

            % cat ~/.ssh/config 
            Host 141.142.211.152
                IdentityFile ~/.ssh/id_ncsa_passwordless_rsa
            

            The destination folder for the files will be:

            /datasets/gapon/catalogs_01/
            

            All transfers will be done from lsst-xfer using rsync.

            RunDeepForcedSource of NCSA

            The destination folders:

            % ls -alh /datasets/gapon/catalogs_01/
            total 0
            drwxr-xr-x  7 gapon grp_202 4.0K May 14 19:05 .
            drwxr-xr-x 15 gapon root    4.0K May 14 18:40 ..
            drwxr-xr-x  2 gapon grp_202 4.0K May 14 19:03 daues_SDRP_dedupe_byfilter_0
            drwxr-xr-x  2 gapon grp_202 4.0K May 14 19:04 daues_SDRP_dedupe_byfilter_1
            drwxr-xr-x  2 gapon grp_202 4.0K May 14 19:04 daues_SDRP_dedupe_byfilter_2
            drwxr-xr-x  2 gapon grp_202 4.0K May 14 19:04 daues_SDRP_dedupe_byfilter_3
            drwxr-xr-x  2 gapon grp_202 4.0K May 14 19:05 daues_SDRP_dedupe_byfilter_4
            

            Launched 5 rsync on node lsst-xfer:

            % cd /datasets/gapon/catalogs_01/
            % for band in "0" "1" "2" "3" "4"; do \
              nohup rsync -hrv \
                141.142.211.152:/data1/tsv/daues_SDRP_dedupe_byfilter_${band} . \
                >& daues_SDRP_dedupe_byfilter_${band}.log& \
            done
            

            NOTE: the log files were placed in the destination folders

            The operation has finished. Checking file sizes:

            % ls -al /datasets/gapon/catalogs_01/daues_SDRP_dedupe_byfilter_* | grep txt
            -rw-r--r-- 1 gapon grp_202 1161290588478 May 15 21:18 RunDeepForcedSource_radecl_dedup.txt
            -rw-r--r-- 1 gapon grp_202 1207653978772 May 15 20:32 RunDeepForcedSource_radecl_dedup.txt
            -rw-r--r-- 1 gapon grp_202 1220211612878 May 15 20:36 RunDeepForcedSource_radecl_dedup.txt
            -rw-r--r-- 1 gapon grp_202 1227132706017 May 15 21:04 RunDeepForcedSource_radecl_dedup.txt
            -rw-r--r-- 1 gapon grp_202 1198155639140 May 15 20:41 RunDeepForcedSource_radecl_dedup.txt
            
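These sizes match the source listing on the OpenStack instance byte for byte. A minimal sketch of that comparison, with both arrays hardcoded from the two ls listings (on the real systems they would come from something like stat -c %s):

```shell
# Compare source (OpenStack) vs destination (lsst-dev) file sizes for the
# five NCSA RunDeepForcedSource dumps; values copied from the ls listings.
src=(1161290588478 1207653978772 1220211612878 1227132706017 1198155639140)
dst=(1161290588478 1207653978772 1220211612878 1227132706017 1198155639140)
mismatch=0
for i in "${!src[@]}"; do
    if [ "${src[$i]}" != "${dst[$i]}" ]; then
        echo "MISMATCH band $i: ${src[$i]} != ${dst[$i]}"
        mismatch=1
    fi
done
[ "$mismatch" -eq 0 ] && echo "all 5 files match"
```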

            RunDeepForcedSource of IN2P3

            The destination folder for the files is:

            % ls -alh /datasets/gapon/catalogs_01/
            

            Launched 5 rsync on node lsst-xfer:

            % cd /datasets/gapon/catalogs_01/
            % for band in "u" "g" "r" "i" "z"; do \
              nohup rsync -hrv \
                141.142.211.152:/data3/tsv/lsst_prod_dedupe_byfilter_${band} . \
                >& lsst_prod_dedupe_byfilter_${band}.log& \
            done
            

            NOTE: the log files were placed in the destination folders

            The operation has finished. Checking file sizes:

            % ls -al /datasets/gapon/catalogs_01/lsst_prod_dedupe_byfilter_* | grep txt
            

            DeepCoadd (and 2 metadata tables) and RunDeepSource

            Transferred 4 tables to the destination folder:

            % cd /datasets/gapon/catalogs_01
            % nohup rsync -hrv \
            141.142.211.152:/data1/tsv/lsst_sdss_stripe82_01 . \
            >& lsst_sdss_stripe82_01.log
            % ls -al /datasets/gapon/catalogs_01/lsst_sdss_stripe82_01/
            total 211984768
            drwxr-xr-x 2 gapon grp_202         4096 May 16 16:43 .
            drwxr-xr-x 8 gapon grp_202         4096 May 16 16:08 ..
            -rw-r--r-- 1 gapon grp_202         2549 May 16 16:08 DeepCoadd_merge.sql
            -rw-r--r-- 1 gapon grp_202     12319362 May 16 16:08 DeepCoadd_merge.txt
            -rw-r--r-- 1 gapon grp_202         1669 May 16 16:08 DeepCoadd_Metadata_merge.sql
            -rw-r--r-- 1 gapon grp_202    279986169 May 16 16:08 DeepCoadd_Metadata_merge.txt
            -rw-r--r-- 1 gapon grp_202         1495 May 16 16:08 DeepCoadd_To_Htm10_merge.sql
            -rw-r--r-- 1 gapon grp_202     11462430 May 16 16:08 DeepCoadd_To_Htm10_merge.txt
            -rw-r--r-- 1 gapon grp_202         8189 May 16 16:08 RunDeepSource_merge.sql
            -rw-r--r-- 1 gapon grp_202 216768195814 May 16 16:43 RunDeepSource_merge.txt
            

            gapon Igor Gaponenko added a comment - - edited

            Partitioning RunDeepSource

            Created a work area where the resulting chunks will be located. The same folder will also host the configuration and log files:

            /datasets/gapon/catalogs_01_partitioned
            

            The table-specific partitions will be placed in the subfolder:

            RunDeepSource/
            

            The most important configuration parameters are shown below:

            % cat RunDeepSource.cfg
            ..
            # Partitioning parameters.
            part = {
                # Common partitioning parameters.
                num-stripes     = 340
                num-sub-stripes = 12
                chunk           = chunkId
                sub-chunk       = subChunkId
             
                # The partitioning position is the object's point-source model position.
                pos = 'coord_ra, coord_decl'
                # Overlap radius in degrees.
                overlap = 0.01667
            }
            ..
            

            IMPORTANT: these are the same parameters as for the presently deployed (in PDAC) database sdss_stripe82_00.
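For reference, a sketch of the geometry these parameters imply, assuming (as in the Qserv partitioner) that num-stripes divides the full 180-degree declination range and that the overlap is expressed in degrees:

```shell
# Derive stripe/sub-stripe heights from the partitioning parameters above.
awk 'BEGIN {
    stripe = 180.0 / 340          # num-stripes = 340
    sub    = stripe / 12          # num-sub-stripes = 12
    printf "stripe_height_deg=%.4f\n",    stripe
    printf "substripe_height_deg=%.4f\n", sub
    printf "overlap_arcmin=%.1f\n",       0.01667 * 60
}'
```

The overlap of 0.01667 degrees is thus about 1 arcminute.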

            Launched the partitioning tool on lsst-dev:

            % cd /datasets/gapon/catalogs_01_partitioned
            % source /datasets/gapon/stack/loadLSST.bash 
            % setup -t qserv-dev qserv_distrib
            % which sph-partition
            /datasets/gapon/stack/Linux64/partition/12.0+9/bin/sph-partition
            % nohup sph-partition \
              --verbose -c RunDeepSource.cfg \
              --out.dir=RunDeepSource/ \
              --in=/datasets/gapon/catalogs_01/lsst_sdss_stripe82_01/RunDeepSource_merge.txt \
              >& logs/RunDeepSource.log&
            

            When the operation finished, the output folder contained:

            % du -hs RunDeepSource/
            629G	RunDeepSource/
            % ls -al  RunDeepSource/*.txt | wc -l
            2160
            % ls -al  RunDeepSource/*_overlap.txt | wc -l
            1080
            

            CONCLUSION: this is perfectly consistent with the presently loaded initial version of the database.
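The file counts are also internally consistent: the partitioner writes one data file and one *_overlap.txt file per chunk, so the totals above should pair up exactly. A quick check on the numbers:

```shell
# 2160 .txt files total, of which 1080 are *_overlap.txt; the remainder
# must be the 1080 plain chunk files (one per chunk).
total=2160
overlap=1080
chunks=$((total - overlap))
[ "$chunks" -eq "$overlap" ] && echo "paired: $chunks chunks + $overlap overlap files"
```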

            gapon Igor Gaponenko added a comment - - edited

            Partitioning RunDeepForcedSource

            Using the same work area for the resulting chunks as in the case of table RunDeepSource:

            /datasets/gapon/catalogs_01_partitioned
            

            The table-specific partitions will be placed in subfolders:

            RunDeepForcedSource/<database>
            

            where <database> is one of the 10 original databases:

            daues_SDRP_dedupe_byfilter_0
            daues_SDRP_dedupe_byfilter_1
            daues_SDRP_dedupe_byfilter_2
            daues_SDRP_dedupe_byfilter_3
            daues_SDRP_dedupe_byfilter_4
            lsst_prod_dedupe_byfilter_u
            lsst_prod_dedupe_byfilter_g
            lsst_prod_dedupe_byfilter_r
            lsst_prod_dedupe_byfilter_i
            lsst_prod_dedupe_byfilter_z
            

            The most important configuration parameters are shown below:

            % cat RunDeepForcedSource.cfg
            ..
            # Partitioning parameters.
            part = {
                # Common partitioning parameters.
                num-stripes     = 340
                num-sub-stripes = 12
                chunk           = chunkId
                sub-chunk       = subChunkId
             
                # The partitioning position is the object's point-source model position.
                pos = 'object_coord_ra, object_coord_decl'
                # Overlap radius in degrees.
                overlap = 0
            }
            ..
            

            IMPORTANT: these are the same parameters as for the presently deployed (in PDAC) database sdss_stripe82_00.

            Launching the same partitioning tool on lsst-dev:

            % cd /datasets/gapon/catalogs_01_partitioned
            % source /datasets/gapon/stack/loadLSST.bash 
            % setup -t qserv-dev qserv_distrib
            % which sph-partition
            /datasets/gapon/stack/Linux64/partition/12.0+9/bin/sph-partition
            

            NCSA

            Note that each database will have its own collection of (partial) chunks:

            for band in "0" "1" "2" "3" "4"; do \
              mkdir -p RunDeepForcedSource/daues_SDRP_dedupe_byfilter_${band}; \
              nohup srun sph-partition --verbose -c RunDeepForcedSource.cfg  \
                --out.dir=RunDeepForcedSource/daues_SDRP_dedupe_byfilter_${band}/ \
                --in=/datasets/gapon/catalogs_01/daues_SDRP_dedupe_byfilter_${band}/RunDeepForcedSource_radecl_dedup.txt \
                >& logs/daues_SDRP_dedupe_byfilter_${band}.log& \
            done
            

            When the operation finished, the output folders contained:

            % du -hs RunDeepForcedSource/*
            1.1T	RunDeepForcedSource/daues_SDRP_dedupe_byfilter_0
            1.2T	RunDeepForcedSource/daues_SDRP_dedupe_byfilter_1
            1.2T	RunDeepForcedSource/daues_SDRP_dedupe_byfilter_2
            1.2T	RunDeepForcedSource/daues_SDRP_dedupe_byfilter_3
            1.2T	RunDeepForcedSource/daues_SDRP_dedupe_byfilter_4
            % ls -1  RunDeepForcedSource/daues_SDRP_dedupe_byfilter_*/ | grep .txt | sort -u | wc -l
            548
            

            IN2P3

            Note that each database will have its own collection of (partial) chunks:

            for band in "u" "g" "r" "i" "z"; do \
              mkdir -p RunDeepForcedSource/lsst_prod_dedupe_byfilter_${band}; \
              nohup srun sph-partition --verbose -c RunDeepForcedSource.cfg  \
                --out.dir=RunDeepForcedSource/lsst_prod_dedupe_byfilter_${band}/ \
                --in=/datasets/gapon/catalogs_01/lsst_prod_dedupe_byfilter_${band}/RunDeepForcedSource_radecl_dedup.txt \
                >& logs/lsst_prod_dedupe_byfilter_${band}.log& \
            done
            

            When the operation finished, the output folders contained:

            % du -hs RunDeepForcedSource/*
            1.1T    RunDeepForcedSource/daues_SDRP_dedupe_byfilter_0
            1.2T    RunDeepForcedSource/daues_SDRP_dedupe_byfilter_1
            1.2T    RunDeepForcedSource/daues_SDRP_dedupe_byfilter_2
            1.2T    RunDeepForcedSource/daues_SDRP_dedupe_byfilter_3
            1.2T    RunDeepForcedSource/daues_SDRP_dedupe_byfilter_4
            1.3T    RunDeepForcedSource/lsst_prod_dedupe_byfilter_g
            1.3T    RunDeepForcedSource/lsst_prod_dedupe_byfilter_i
            1.3T    RunDeepForcedSource/lsst_prod_dedupe_byfilter_r
            1.2T    RunDeepForcedSource/lsst_prod_dedupe_byfilter_u
            1.3T    RunDeepForcedSource/lsst_prod_dedupe_byfilter_z
            

            % ls -1  RunDeepForcedSource/*/ | grep .txt | sort -u | wc -l
            1080
            % ls -1  RunDeepForcedSource/daues_SDRP_dedupe_byfilter_*/ | grep .txt | sort -u | wc -l
            548
            % ls -1  RunDeepForcedSource/lsst_prod_dedupe_byfilter_*/ | grep .txt | sort -u | wc -l
            555
            

            These numbers can also be compared with the previously (incorrectly) merged version of the catalog, which had the same number of unique chunks (1080):

            % ls -1  ../catalogs_partitioned/out_daues_SDRP_dedupe_byfilter_*/ | grep .txt | sort -u | wc -l
            570
            % ls -1  ../catalogs_partitioned/out_lsst_prod_dedupe_byfilter_*/ | grep .txt | sort -u | wc -l
            517
            

            CONCLUSION: this is consistent with expectations: with the patch boundary moved, partitions are distributed more evenly between the two datasets, with a slightly higher number of chunks in the IN2P3 area of the new catalog (see the Description section of this ticket for more details).
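By inclusion-exclusion, the per-site counts also show how many chunks now span both sides of the new boundary. A quick check on the numbers above:

```shell
# 548 unique chunks on the NCSA side, 555 on the IN2P3 side, 1080 overall:
# the excess over the union is the number of chunks present at both sites.
ncsa=548
in2p3=555
union=1080
shared=$((ncsa + in2p3 - union))
echo "chunks present at both sites: $shared"
```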

            Hide
            gapon Igor Gaponenko added a comment - - edited

            Preparing the input data folders for the loading

            This is going to be an area which is formatted according to the expectations of the catalog loading tools found in the GitHub package:

            https://github.com/lsst-dm/db_pdac_stripe82

            The area is prepared in a way which allows the parallel loading of chunks by scripts run simultaneously on all worker nodes. Note that the very same package also hosts the tools used in the folder preparation steps explained hereafter.

            The folder location is:

            /datasets/gapon/production/stripe82_01_catalog_load/production_load
            

            General set up

            Create the top-level folder and its sub-folders:

            % mkdir /datasets/gapon/production/stripe82_01_catalog_load/
            % cd /datasets/gapon/production/stripe82_01_catalog_load/
            % mkdir production_load
            

            Set up the initial map of chunks to workers stored in the previously mentioned GitHub package. This map will be used by the map generator tools:

            % ln -s /datasets/gapon/development/db_pdac_stripe82/worker2chunks  .
            

            % wc -l worker2chunks/*
              36 worker2chunks/lsst-qserv-db01
              36 worker2chunks/lsst-qserv-db02
              36 worker2chunks/lsst-qserv-db03
              36 worker2chunks/lsst-qserv-db04
              36 worker2chunks/lsst-qserv-db05
              36 worker2chunks/lsst-qserv-db06
              36 worker2chunks/lsst-qserv-db07
              36 worker2chunks/lsst-qserv-db08
              36 worker2chunks/lsst-qserv-db09
              36 worker2chunks/lsst-qserv-db10
              36 worker2chunks/lsst-qserv-db11
              36 worker2chunks/lsst-qserv-db12
              36 worker2chunks/lsst-qserv-db13
              36 worker2chunks/lsst-qserv-db14
              36 worker2chunks/lsst-qserv-db15
              36 worker2chunks/lsst-qserv-db16
              36 worker2chunks/lsst-qserv-db17
              36 worker2chunks/lsst-qserv-db18
              36 worker2chunks/lsst-qserv-db19
              36 worker2chunks/lsst-qserv-db20
              36 worker2chunks/lsst-qserv-db21
              36 worker2chunks/lsst-qserv-db22
              36 worker2chunks/lsst-qserv-db23
              36 worker2chunks/lsst-qserv-db24
              36 worker2chunks/lsst-qserv-db25
              36 worker2chunks/lsst-qserv-db26
              36 worker2chunks/lsst-qserv-db27
              36 worker2chunks/lsst-qserv-db28
              36 worker2chunks/lsst-qserv-db29
              36 worker2chunks/lsst-qserv-db30
            1080 total
            
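            The map assigns 36 chunks to each of the 30 workers, 1080 in total. A quick sanity check is that no chunk is assigned to more than one worker. A minimal sketch of such a check, using toy data in place of the real worker2chunks map (the demo_map name and its contents are hypothetical):

```shell
# Hedged sketch: verify that a worker2chunks-style map assigns every chunk
# to exactly one worker. Toy data (3 workers, 4 chunks each) stands in for
# the real worker2chunks folder.
mkdir -p demo_map
printf '%s\n' 1 2 3 4    > demo_map/db01
printf '%s\n' 5 6 7 8    > demo_map/db02
printf '%s\n' 9 10 11 12 > demo_map/db03

total=$(cat demo_map/* | wc -l)
unique=$(cat demo_map/* | sort -n | uniq | wc -l)
# Equal counts mean no chunk ID appears under two workers.
echo "total=$total unique=$unique"
```

            On the real map both numbers should come out as 1080.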

            RunDeepSource

            Create the table-specific folder:

            % cd /datasets/gapon/production/stripe82_01_catalog_load/
            % mkdir production_load/RunDeepSource
            

            Use the generator tools to populate the folder with the previously partitioned chunks:

            % python /datasets/gapon/development/db_pdac_stripe82/tools/RunDeepSource_chunk2worker.py  mkdir \
            >& RunDeepSource_chunk2worker_mkdir.source
            % source RunDeepSource_chunk2worker_mkdir.source
            

            % ls -1 production_load/RunDeepSource/
            lsst-qserv-db01
            lsst-qserv-db02
            lsst-qserv-db03
            lsst-qserv-db04
            lsst-qserv-db05
            lsst-qserv-db06
            lsst-qserv-db07
            lsst-qserv-db08
            lsst-qserv-db09
            lsst-qserv-db10
            lsst-qserv-db11
            lsst-qserv-db12
            lsst-qserv-db13
            lsst-qserv-db14
            lsst-qserv-db15
            lsst-qserv-db16
            lsst-qserv-db17
            lsst-qserv-db18
            lsst-qserv-db19
            lsst-qserv-db20
            lsst-qserv-db21
            lsst-qserv-db22
            lsst-qserv-db23
            lsst-qserv-db24
            lsst-qserv-db25
            lsst-qserv-db26
            lsst-qserv-db27
            lsst-qserv-db28
            lsst-qserv-db29
            lsst-qserv-db30
            

            % python /datasets/gapon/development/db_pdac_stripe82/tools/RunDeepSource_chunk2worker.py ln \
            >& RunDeepSource_chunk2worker_ln.source
            % source RunDeepSource_chunk2worker_ln.source
            

            % ls -1 production_load/RunDeepSource/lsst-qserv-db01/
            chunk_113589_overlap.txt
            chunk_113589.txt
            chunk_113619_overlap.txt
            chunk_113619.txt
            chunk_113649_overlap.txt
            ..
            chunk_117638_overlap.txt
            chunk_117638.txt
            chunk_index.bin
            
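            The effect of the generated mkdir/ln scripts can be sketched roughly as follows. This is an illustration only, not the actual generator output; the demo_* paths and file names are hypothetical stand-ins for the real map and chunk folders:

```shell
# Hedged sketch: for each worker listed in the map, create a per-worker
# folder and symlink the chunk files assigned to it. Toy data stands in
# for the real worker2chunks map and partitioned chunk files.
mkdir -p demo_src demo_map2
printf '%s\n' 111 222 > demo_map2/lsst-qserv-db01
touch demo_src/chunk_111.txt demo_src/chunk_222.txt

DST=demo_load/RunDeepSource
for worker in demo_map2/*; do
    w=$(basename "$worker")
    mkdir -p "$DST/$w"
    while read -r chunk; do
        ln -s "$PWD/demo_src/chunk_${chunk}.txt" "$DST/$w/chunk_${chunk}.txt"
    done < "$worker"
done
ls -1 "$DST/lsst-qserv-db01"
```

            Symlinking (rather than copying) keeps the per-worker layout cheap, since the partitioned chunks are multi-terabyte.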

            RunDeepForcedSource

            Create the table-specific folder:

            % cd /datasets/gapon/production/stripe82_01_catalog_load/
            % mkdir production_load/RunDeepForcedSource
            

            Use the generator tools to populate the folder with the previously partitioned chunks:

            % python /datasets/gapon/development/db_pdac_stripe82/tools/RunDeepForcedSource_chunk2worker.py mkdir \
            >& RunDeepForcedSource_chunk2worker_mkdir.source
            % source RunDeepForcedSource_chunk2worker_mkdir.source
            

            % ls -1 production_load/RunDeepForcedSource/
            lsst-qserv-db01
            lsst-qserv-db02
            lsst-qserv-db03
            lsst-qserv-db04
            lsst-qserv-db05
            lsst-qserv-db06
            lsst-qserv-db07
            lsst-qserv-db08
            lsst-qserv-db09
            lsst-qserv-db10
            lsst-qserv-db11
            lsst-qserv-db12
            lsst-qserv-db13
            lsst-qserv-db14
            lsst-qserv-db15
            lsst-qserv-db16
            lsst-qserv-db17
            lsst-qserv-db18
            lsst-qserv-db19
            lsst-qserv-db20
            lsst-qserv-db21
            lsst-qserv-db22
            lsst-qserv-db23
            lsst-qserv-db24
            lsst-qserv-db25
            lsst-qserv-db26
            lsst-qserv-db27
            lsst-qserv-db28
            lsst-qserv-db29
            lsst-qserv-db30
            

            % python /datasets/gapon/development/db_pdac_stripe82/tools/RunDeepForcedSource_chunk2worker.py  ln \
            >& RunDeepForcedSource_chunk2worker_ln.source
            % source RunDeepForcedSource_chunk2worker_ln.source
            

            % ls -1 production_load/RunDeepForcedSource/lsst-qserv-db01
            out_daues_SDRP_dedupe_byfilter_0
            out_daues_SDRP_dedupe_byfilter_1
            out_daues_SDRP_dedupe_byfilter_2
            out_daues_SDRP_dedupe_byfilter_3
            out_daues_SDRP_dedupe_byfilter_4
            out_lsst_prod_dedupe_byfilter_g
            out_lsst_prod_dedupe_byfilter_i
            out_lsst_prod_dedupe_byfilter_r
            out_lsst_prod_dedupe_byfilter_u
            out_lsst_prod_dedupe_byfilter_z
            

            % ls -1 production_load/RunDeepForcedSource/lsst-qserv-db01/out_daues_SDRP_dedupe_byfilter_0/
            chunk_114178.txt
            chunk_114208.txt
            chunk_114238.txt
            chunk_114858.txt
            chunk_114888.txt
            chunk_114918.txt
            chunk_115538.txt
            chunk_115568.txt
            chunk_115598.txt
            chunk_116218.txt
            chunk_116248.txt
            chunk_116278.txt
            chunk_116898.txt
            chunk_116928.txt
            chunk_116958.txt
            chunk_117578.txt
            chunk_117608.txt
            chunk_117638.txt
            chunk_index.bin
            

            And the total number of partitions matches the expected number:

            % ls -1 production_load/RunDeepForcedSource/*/*/ | grep .txt | sort -u | wc -l
            1080
            
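            Beyond the global total, it is worth confirming that each worker's folder holds exactly the number of chunks the map assigns to it. A hedged sketch on toy data (on the real setup the two locations would be the worker2chunks map and the production_load per-worker folders):

```shell
# Hedged sketch: cross-check the laid-out chunk files against the
# worker-to-chunks map, worker by worker. Toy data stands in for the
# real /datasets paths.
mkdir -p demo_check/worker2chunks demo_check/load/db01 demo_check/load/db02
printf '%s\n' 101 102 > demo_check/worker2chunks/db01
printf '%s\n' 103     > demo_check/worker2chunks/db02
touch demo_check/load/db01/chunk_101.txt demo_check/load/db01/chunk_102.txt
touch demo_check/load/db02/chunk_103.txt

status=OK
for worker in demo_check/worker2chunks/*; do
    w=$(basename "$worker")
    expected=$(wc -l < "$worker")
    actual=$(ls demo_check/load/"$w"/chunk_*.txt 2>/dev/null | wc -l)
    [ "$expected" -eq "$actual" ] || status=MISMATCH
done
echo "$status"
```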

            Non-partitioned tables

            Create the table-specific folder:

            % cd /datasets/gapon/production/stripe82_01_catalog_load/
            % mkdir production_load/non-part
            
