Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-7053

Assemble a complete database with S13 DRP catalogs

    XMLWordPrintable

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:

      Description

      Create a database populated with complete catalogs resulting from processing of the SDSS Stripe 82 data at both NCSA and IN2P3 sites. The database has to be created at NCSA on the following database server lsst-db.ncsa.illinois.edu

      • 2016-08-10: CORRECTION: that was the original plan. Due to the space limitation on that server the database will be installed on Nebula volume lsst-dm-gapon attached to image lsst-gapon-0 (floating IP address 141.142.211.152). See details on this setup in the comments stream below.)

      The database name will be: gapon_SDRP_Stripe82

      The database will be populated with the contents of the following databases:

      • NCSA (lsst-db.ncsa.illinois.edu):

        daues_SDRP_Stripe82_ncsa
        daues_SDRP_dedupe_byfilter_0
        daues_SDRP_dedupe_byfilter_1
        daues_SDRP_dedupe_byfilter_2
        daues_SDRP_dedupe_byfilter_3
        daues_SDRP_dedupe_byfilter_4
        

      • IN2P3 (ccdb02.in2p3.fr):

        lsst_prod_DC_2013_2
        lsst_prod_dedupe_byfilter_g
        lsst_prod_dedupe_byfilter_i
        lsst_prod_dedupe_byfilter_r
        lsst_prod_dedupe_byfilter_u
        lsst_prod_dedupe_byfilter_z
        

      Additional requirements:

      • The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated.
      • the referential integrity of the resulted database will need to be tested

        Attachments

          Issue Links

            Activity

            gapon Igor Gaponenko created issue -
            gapon Igor Gaponenko made changes -
            Field Original Value New Value
            Epic Link DM-6230 [ 24746 ]
            gapon Igor Gaponenko made changes -
            Link This issue has to be done after DM-7007 [ DM-7007 ]
            gapon Igor Gaponenko made changes -
            Description Create a database populated with complete catalogs resulting from processing of the *SDSS Stripe 82* data at both NCSA and IN2P3 sites. The database has to be created at NCSA on the following database server:
            * lsst-db.ncsa.illinois.edu

            The database name will be:
            * gapon_SDRP_Stripe82

            The database will be populated with the contents of the following databases:
            *NCSA* (lsst-db.ncsa.illinois.edu):
            | daues_SDRP_Stripe82_ncsa |
            | daues_SDRP_dedupe_byfilter_0 |
            | daues_SDRP_dedupe_byfilter_1 |
            | daues_SDRP_dedupe_byfilter_2 |
            | daues_SDRP_dedupe_byfilter_3 |
            | daues_SDRP_dedupe_byfilter_4 |

            *IN2P3* (ccdb02.in2p3.fr):
            | lsst_prod_DC_2013_2 |
            | lsst_prod_dedupe_byfilter_g |
            | lsst_prod_dedupe_byfilter_i |
            | lsst_prod_dedupe_byfilter_r |
            | lsst_prod_dedupe_byfilter_u |
            | lsst_prod_dedupe_byfilter_z |

            Additional requirements:
            * The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated.
            * the referential integrity of the resulted database will need to be tested
            Create a database populated with complete catalogs resulting from processing of the *SDSS Stripe 82* data at both NCSA and IN2P3 sites. The database has to be created at NCSA on the following database server:
            * *lsst-db.ncsa.illinois.edu*

            The database name will be:
            * *gapon_SDRP_Stripe82*

            The database will be populated with the contents of the following databases:
            *NCSA* (lsst-db.ncsa.illinois.edu):
            {code}
            daues_SDRP_Stripe82_ncsa
            daues_SDRP_dedupe_byfilter_0
            daues_SDRP_dedupe_byfilter_1
            daues_SDRP_dedupe_byfilter_2
            daues_SDRP_dedupe_byfilter_3
            daues_SDRP_dedupe_byfilter_4
            {code}
            *IN2P3* (ccdb02.in2p3.fr):
            {code}
            lsst_prod_DC_2013_2
            lsst_prod_dedupe_byfilter_g
            lsst_prod_dedupe_byfilter_i
            lsst_prod_dedupe_byfilter_r
            lsst_prod_dedupe_byfilter_u
            lsst_prod_dedupe_byfilter_z
            {code}

            Additional requirements:
            * The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated.
            * the referential integrity of the resulted database will need to be tested
            gapon Igor Gaponenko made changes -
            Description Create a database populated with complete catalogs resulting from processing of the *SDSS Stripe 82* data at both NCSA and IN2P3 sites. The database has to be created at NCSA on the following database server:
            * *lsst-db.ncsa.illinois.edu*

            The database name will be:
            * *gapon_SDRP_Stripe82*

            The database will be populated with the contents of the following databases:
            *NCSA* (lsst-db.ncsa.illinois.edu):
            {code}
            daues_SDRP_Stripe82_ncsa
            daues_SDRP_dedupe_byfilter_0
            daues_SDRP_dedupe_byfilter_1
            daues_SDRP_dedupe_byfilter_2
            daues_SDRP_dedupe_byfilter_3
            daues_SDRP_dedupe_byfilter_4
            {code}
            *IN2P3* (ccdb02.in2p3.fr):
            {code}
            lsst_prod_DC_2013_2
            lsst_prod_dedupe_byfilter_g
            lsst_prod_dedupe_byfilter_i
            lsst_prod_dedupe_byfilter_r
            lsst_prod_dedupe_byfilter_u
            lsst_prod_dedupe_byfilter_z
            {code}

            Additional requirements:
            * The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated.
            * the referential integrity of the resulted database will need to be tested
            Create a database populated with complete catalogs resulting from processing of the *SDSS Stripe 82* data at both NCSA and IN2P3 sites. The database has to be created at NCSA on the following database server:
            * *lsst-db.ncsa.illinois.edu*

            The database name will be:
            * *gapon_SDRP_Stripe82*

            The database will be populated with the contents of the following databases:
            * *NCSA* (lsst-db.ncsa.illinois.edu):
            {code}
            daues_SDRP_Stripe82_ncsa
            daues_SDRP_dedupe_byfilter_0
            daues_SDRP_dedupe_byfilter_1
            daues_SDRP_dedupe_byfilter_2
            daues_SDRP_dedupe_byfilter_3
            daues_SDRP_dedupe_byfilter_4
            {code}
            * *IN2P3* (ccdb02.in2p3.fr):
            {code}
            lsst_prod_DC_2013_2
            lsst_prod_dedupe_byfilter_g
            lsst_prod_dedupe_byfilter_i
            lsst_prod_dedupe_byfilter_r
            lsst_prod_dedupe_byfilter_u
            lsst_prod_dedupe_byfilter_z
            {code}

            Additional requirements:
            * The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated.
            * the referential integrity of the resulted database will need to be tested
            fritzm Fritz Mueller made changes -
            Story Points 8 16
            gapon Igor Gaponenko made changes -
            Status To Do [ 10001 ] In Progress [ 3 ]
            fritzm Fritz Mueller made changes -
            Sprint DB_F16_7 [ 232 ] DB_F16_7, DB_F16_8 [ 232, 233 ]
            fritzm Fritz Mueller made changes -
            Rank Ranked higher
            gapon Igor Gaponenko made changes -
            Description Create a database populated with complete catalogs resulting from processing of the *SDSS Stripe 82* data at both NCSA and IN2P3 sites. The database has to be created at NCSA on the following database server:
            * *lsst-db.ncsa.illinois.edu*

            The database name will be:
            * *gapon_SDRP_Stripe82*

            The database will be populated with the contents of the following databases:
            * *NCSA* (lsst-db.ncsa.illinois.edu):
            {code}
            daues_SDRP_Stripe82_ncsa
            daues_SDRP_dedupe_byfilter_0
            daues_SDRP_dedupe_byfilter_1
            daues_SDRP_dedupe_byfilter_2
            daues_SDRP_dedupe_byfilter_3
            daues_SDRP_dedupe_byfilter_4
            {code}
            * *IN2P3* (ccdb02.in2p3.fr):
            {code}
            lsst_prod_DC_2013_2
            lsst_prod_dedupe_byfilter_g
            lsst_prod_dedupe_byfilter_i
            lsst_prod_dedupe_byfilter_r
            lsst_prod_dedupe_byfilter_u
            lsst_prod_dedupe_byfilter_z
            {code}

            Additional requirements:
            * The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated.
            * the referential integrity of the resulted database will need to be tested
            Create a database populated with complete catalogs resulting from processing of the *SDSS Stripe 82* data at both NCSA and IN2P3 sites. -The database has to be created at NCSA on the following database server:
            * *lsst-db.ncsa.illinois.edu*
            -
            ** 2016-08-10: *CORRECTION*: that was the original plan. Due to the space limitation on that server the database will be installed on Nebula volume *lsst-dm-gapon* attached to image *lsst-gapon-0* (floating IP address *141.142.211.152*. See details on this setup in the comments stream below.)

            The database name will be:
            * *gapon_SDRP_Stripe82*

            The database will be populated with the contents of the following databases:
            * *NCSA* (lsst-db.ncsa.illinois.edu):
            {code}
            daues_SDRP_Stripe82_ncsa
            daues_SDRP_dedupe_byfilter_0
            daues_SDRP_dedupe_byfilter_1
            daues_SDRP_dedupe_byfilter_2
            daues_SDRP_dedupe_byfilter_3
            daues_SDRP_dedupe_byfilter_4
            {code}
            * *IN2P3* (ccdb02.in2p3.fr):
            {code}
            lsst_prod_DC_2013_2
            lsst_prod_dedupe_byfilter_g
            lsst_prod_dedupe_byfilter_i
            lsst_prod_dedupe_byfilter_r
            lsst_prod_dedupe_byfilter_u
            lsst_prod_dedupe_byfilter_z
            {code}

            Additional requirements:
            * The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated.
            * the referential integrity of the resulted database will need to be tested
            gapon Igor Gaponenko made changes -
            Description Create a database populated with complete catalogs resulting from processing of the *SDSS Stripe 82* data at both NCSA and IN2P3 sites. -The database has to be created at NCSA on the following database server:
            * *lsst-db.ncsa.illinois.edu*
            -
            ** 2016-08-10: *CORRECTION*: that was the original plan. Due to the space limitation on that server the database will be installed on Nebula volume *lsst-dm-gapon* attached to image *lsst-gapon-0* (floating IP address *141.142.211.152*. See details on this setup in the comments stream below.)

            The database name will be:
            * *gapon_SDRP_Stripe82*

            The database will be populated with the contents of the following databases:
            * *NCSA* (lsst-db.ncsa.illinois.edu):
            {code}
            daues_SDRP_Stripe82_ncsa
            daues_SDRP_dedupe_byfilter_0
            daues_SDRP_dedupe_byfilter_1
            daues_SDRP_dedupe_byfilter_2
            daues_SDRP_dedupe_byfilter_3
            daues_SDRP_dedupe_byfilter_4
            {code}
            * *IN2P3* (ccdb02.in2p3.fr):
            {code}
            lsst_prod_DC_2013_2
            lsst_prod_dedupe_byfilter_g
            lsst_prod_dedupe_byfilter_i
            lsst_prod_dedupe_byfilter_r
            lsst_prod_dedupe_byfilter_u
            lsst_prod_dedupe_byfilter_z
            {code}

            Additional requirements:
            * The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated.
            * the referential integrity of the resulted database will need to be tested
            Create a database populated with complete catalogs resulting from processing of the *SDSS Stripe 82* data at both NCSA and IN2P3 sites. -The database has to be created at NCSA on the following database server *lsst-db.ncsa.illinois.edu*-
            * 2016-08-10: *CORRECTION*: that was the original plan. Due to the space limitation on that server the database will be installed on Nebula volume *lsst-dm-gapon* attached to image *lsst-gapon-0* (floating IP address *141.142.211.152*. See details on this setup in the comments stream below.)

            The database name will be: *gapon_SDRP_Stripe82*

            The database will be populated with the contents of the following databases:
            * *NCSA* (lsst-db.ncsa.illinois.edu):
            {code}
            daues_SDRP_Stripe82_ncsa
            daues_SDRP_dedupe_byfilter_0
            daues_SDRP_dedupe_byfilter_1
            daues_SDRP_dedupe_byfilter_2
            daues_SDRP_dedupe_byfilter_3
            daues_SDRP_dedupe_byfilter_4
            {code}
            * *IN2P3* (ccdb02.in2p3.fr):
            {code}
            lsst_prod_DC_2013_2
            lsst_prod_dedupe_byfilter_g
            lsst_prod_dedupe_byfilter_i
            lsst_prod_dedupe_byfilter_r
            lsst_prod_dedupe_byfilter_u
            lsst_prod_dedupe_byfilter_z
            {code}

            Additional requirements:
            * The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated.
            * the referential integrity of the resulted database will need to be tested
            gapon Igor Gaponenko made changes -
            Description Create a database populated with complete catalogs resulting from processing of the *SDSS Stripe 82* data at both NCSA and IN2P3 sites. -The database has to be created at NCSA on the following database server *lsst-db.ncsa.illinois.edu*-
            * 2016-08-10: *CORRECTION*: that was the original plan. Due to the space limitation on that server the database will be installed on Nebula volume *lsst-dm-gapon* attached to image *lsst-gapon-0* (floating IP address *141.142.211.152*. See details on this setup in the comments stream below.)

            The database name will be: *gapon_SDRP_Stripe82*

            The database will be populated with the contents of the following databases:
            * *NCSA* (lsst-db.ncsa.illinois.edu):
            {code}
            daues_SDRP_Stripe82_ncsa
            daues_SDRP_dedupe_byfilter_0
            daues_SDRP_dedupe_byfilter_1
            daues_SDRP_dedupe_byfilter_2
            daues_SDRP_dedupe_byfilter_3
            daues_SDRP_dedupe_byfilter_4
            {code}
            * *IN2P3* (ccdb02.in2p3.fr):
            {code}
            lsst_prod_DC_2013_2
            lsst_prod_dedupe_byfilter_g
            lsst_prod_dedupe_byfilter_i
            lsst_prod_dedupe_byfilter_r
            lsst_prod_dedupe_byfilter_u
            lsst_prod_dedupe_byfilter_z
            {code}

            Additional requirements:
            * The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated.
            * the referential integrity of the resulted database will need to be tested
            Create a database populated with complete catalogs resulting from processing of the *SDSS Stripe 82* data at both NCSA and IN2P3 sites. -The database has to be created at NCSA on the following database server *lsst-db.ncsa.illinois.edu*-
            * 2016-08-10: *CORRECTION*: that was the original plan. Due to the space limitation on that server the database will be installed on Nebula volume *lsst-dm-gapon* attached to image *lsst-gapon-0* (floating IP address *141.142.211.152*). See details on this setup in the comments stream below.)

            The database name will be: *gapon_SDRP_Stripe82*

            The database will be populated with the contents of the following databases:
            * *NCSA* (lsst-db.ncsa.illinois.edu):
            {code}
            daues_SDRP_Stripe82_ncsa
            daues_SDRP_dedupe_byfilter_0
            daues_SDRP_dedupe_byfilter_1
            daues_SDRP_dedupe_byfilter_2
            daues_SDRP_dedupe_byfilter_3
            daues_SDRP_dedupe_byfilter_4
            {code}
            * *IN2P3* (ccdb02.in2p3.fr):
            {code}
            lsst_prod_DC_2013_2
            lsst_prod_dedupe_byfilter_g
            lsst_prod_dedupe_byfilter_i
            lsst_prod_dedupe_byfilter_r
            lsst_prod_dedupe_byfilter_u
            lsst_prod_dedupe_byfilter_z
            {code}

            Additional requirements:
            * The duplicate entries (due to the overlap in the RA range) will need to be carefully assessed and eliminated.
            * the referential integrity of the resulted database will need to be tested
            gapon Igor Gaponenko made changes -
            Comment [ h1. Initializing database gapon_SDRP_Stripe82 (NCSA)

            The database will be initialized as a clone of an existing database *daues_SDRP_Stripe82_ncsa.sql*. The procedure has the following steps:
            # making a *mysqldump* of the source database
            # creating the destination database
            # loading the dump into the destination database

            Initially the dump files were placed at (NCSA):
            {code}
            /nfs/lsst8/gapon
            {code}
            After switching the database server from *lsst-db* to a Nebula image/volume the dumps were made (uploaded) onto the Nebula volume first. A reason of that was that neither LSST NFS mount poibnt was available to the Nebula images nor the Nebula storage volumes could be mounted on the public development machines allocated to LSST at NCSA.

            h2. Loading options

            * in theory a more efficient way of cloning a database would be by chaining two operations using UNIX pipe w/o compressing/decompressing the stream:
            {code:bash}
            mysqldump ... daues_SDRP_Stripe82_ncsa | mysql ... gapon_SDRP_Stripe82
            {code}
            The main reason why I chose to go via an intermediate file is because it allows a quicker turnaround time for making other clones (should this be needed) based on the dump file. The second reason was related to the security restrictions imposed by MySQL for views. For instance, a straightforward import of the above taken snapshot failed with the following message:
            {code}
            ERROR 1227 (42000) at line 140273: Access denied; you need (at least one of) the SUPER privilege(s) for this operation
            {code}
            Further investigation has pointed out onto the following line in the dump:
            {code}
            /*!50013 DEFINER=`daues`@`%` SQL SECURITY DEFINER */
            {code}
            Hence, a solution was to:
            {code:bash}
            % cat daues_SDRP_Stripe82_ncsa.sql | sed 's/daues/gapon/g' | mysql gapon_SDRP_Stripe82
            {code}
            At this point the database was properly seeded.

            h3. Observations

            The performance of the database loading was rather low: 20 MB/s or less. And there was a substantial CPU usage on the database server:
            {code}
            % top

              PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
             5236 mysql 20 0 972804 216772 4044 S 83.0 1.3 689:45.22 mysqld
             8307 gapon 20 0 238796 6632 2768 S 16.3 0.0 95:42.25 mysql
            {code}
            ]
            gapon Igor Gaponenko made changes -
            Comment [ h1. Loading deep forced sources into gapon_SDRP_Stripe82 (NCSA)

            Loading data of the *u* band (filter index *0*):
            {code:sql}
            USE gapon_SDRP_Stripe82;
            CREATE TABLE RunDeepForcedSource LIKE daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource;
            INSERT INTO RunDeepForcedSource SELECT * FROM daues_SDRP_dedupe_byfilter_0.RunDeepForcedSource;
            {code}
            The query finished in 32 hours:
            {code}
            Query OK, 1729599791 rows affected (1 day 8 hours 15 min 12.97 sec)
            Records: 1729599791 Duplicates: 0 Warnings: 0
            {code}
            *Observations*: the aggregate performance was (700 GB of data from the table + 190 GB of index) / (1 day 8 hours 15 min) = *7.7 MB/s*.

            *UPDATED: 2016-08-08*: ran out of space on the database server. Giving up with this option. Investigating other options for setting up a database server with the large disk.
            ]
            fritzm Fritz Mueller made changes -
            Sprint DB_F16_7, DB_F16_8 [ 232, 233 ] DB_F16_7, DB_F16_8, DB_F16_9 [ 232, 233, 249 ]
            fritzm Fritz Mueller made changes -
            Rank Ranked higher
            gapon Igor Gaponenko made changes -
            Labels pdac
            gapon Igor Gaponenko made changes -
            Resolution Done [ 10000 ]
            Status In Progress [ 3 ] Done [ 10002 ]
            ymei Yi Mei [X] (Inactive) made changes -
            Link This issue relates to DM-7762 [ DM-7762 ]
            shupe David Shupe made changes -
            Remote Link This issue links to "Page (Confluence)" [ 14327 ]
            gapon Igor Gaponenko made changes -
            Link This issue is triggering DM-8241 [ DM-8241 ]

              People

              Assignee:
              gapon Igor Gaponenko
              Reporter:
              gapon Igor Gaponenko
              Watchers:
              Fritz Mueller, Gregory Dubois-Felsmann, Igor Gaponenko
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: