Data Management / DM-26544

Ingest WISE PSD and MEP catalogs into "small" Qserv cluster at NCSA


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:
      None

      Description

      Ingest the two partitioned tables allwise_p3as_psd and allwise_p3as_mep of catalog WISE into the cluster using the new Ingest system. This effort is similar to that of DM-26390. Note that the input data of the catalog need to be repartitioned to a more efficient scheme:

      • stripes: 340
      • sub-stripes: 3
      • overlap: 0.01667

      Put scripts and configurations used during the ingest into the dedicated Git package: https://github.com/lsst-dm/qserv-ncsa-wise
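
      As a quick sanity check on the scheme above, a minimal sketch (assuming the usual Qserv/partitioner convention of dividing the 180 degrees of declination into equal-height stripes; the constant names are illustrative, not from the actual configuration files):

      ```python
      # Geometry implied by the partitioning parameters quoted above.
      NUM_STRIPES = 340          # stripes
      NUM_SUB_STRIPES = 3        # sub-stripes per stripe
      OVERLAP_DEG = 0.01667      # chunk overlap, degrees

      stripe_height_deg = 180.0 / NUM_STRIPES
      sub_stripe_height_deg = stripe_height_deg / NUM_SUB_STRIPES
      overlap_arcmin = OVERLAP_DEG * 60.0

      print(f"stripe height:     {stripe_height_deg:.4f} deg")
      print(f"sub-stripe height: {sub_stripe_height_deg:.4f} deg")
      print(f"overlap:           {overlap_arcmin:.2f} arcmin")  # ~1 arcmin
      ```

      The overlap of 0.01667 deg is, in other words, approximately 1 arcminute.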

        Attachments

          Issue Links

            Activity

            Igor Gaponenko added a comment (edited)

            Stats

            The input data set:

            • uncompressed size: 17 TB
            • files: 792

            The workflow divided the 792 files equally into 132 files per Qserv worker (6 workers), with 12 processing streams per worker. There were 72 such streams running in parallel across all workers. To optimize further operations on the input data, the workflow merged the 11 files assigned to each stream into a single TSV/CSV file, reducing the total number of input files to 72.
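
            The arithmetic behind this distribution, as a sketch:

            ```python
            # File distribution across workers and processing streams.
            total_files = 792
            workers = 6
            streams_per_worker = 12

            files_per_worker = total_files // workers                   # 132
            total_streams = workers * streams_per_worker                # 72
            files_per_stream = files_per_worker // streams_per_worker   # 11, merged into 1
            merged_inputs = total_files // files_per_stream             # 72

            print(files_per_worker, total_streams, files_per_stream, merged_inputs)
            ```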

            The amount of data to be processed at each worker is reported below:

            2.3T lsst-qserv-db31
            2.3T lsst-qserv-db32
            2.3T lsst-qserv-db33
            2.3T lsst-qserv-db34
            3.1T lsst-qserv-db35
            4.1T lsst-qserv-db36

            There were 72 super-transactions started to ingest the catalog data (one transaction per loading stream).

            The file ingest stage per se took less than 7 hours, which corresponds to an aggregate ingest rate of about 0.7 GB/s.
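
            A quick cross-check of that rate (a sketch, assuming the full ~17 TB uncompressed data set was loaded in roughly 7 hours):

            ```python
            # Aggregate ingest rate implied by the numbers above.
            total_gb = 17_000           # ~17 TB, decimal units
            elapsed_s = 7 * 3600        # "less than 7 hours"

            rate_gb_per_s = total_gb / elapsed_s
            print(f"{rate_gb_per_s:.2f} GB/s")  # ~0.67 GB/s, consistent with 0.7 GB/s
            ```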

            The database publishing stage was quite lengthy: it took 5 hours to complete. Most of that time was spent consolidating MySQL partitions into monolithic tables for every single chunk, for both partitioned tables (the chunk table and the corresponding overlap table). While this transformation was happening, the CPU and I/O resources of the worker nodes were underutilized. Perhaps the number of worker threads needs to be increased from 16 per worker (matching the number of CPU cores) to a bigger number such as 32 or even higher. This may help hide various latencies in the system.

            It took 3 hours and 32 minutes to build the secondary index. The index table was 86 GB. The performance of the operation is known to be limited by the insert time into the corresponding InnoDB table at the Qserv czar database. The build time is generally a function of the total number of entries (object identifiers) in the index. The performance could be significantly improved if the index table were split into many MySQL partitions corresponding to the super-transactions. Incidentally, this is what happens when an ingest workflow chooses to contribute to the index at commit time of the super-transactions.
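
            A rough throughput estimate for the index build (a sketch, assuming the index holds one entry per row of allwise_p3as_psd, i.e. one per object identifier cntr):

            ```python
            # Secondary-index build throughput implied by the numbers above.
            rows = 747_634_026              # objects in allwise_p3as_psd
            index_gb = 86                   # final index table size
            elapsed_s = 3 * 3600 + 32 * 60  # 3h32m

            print(f"{rows / elapsed_s:,.0f} rows/s")
            print(f"{index_gb * 1000 / elapsed_s:.1f} MB/s")
            ```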

            Igor Gaponenko added a comment (edited)

            What's next

            • test the new catalog
            • build table-level UNIQUE indexes for the object identifier column cntr of table allwise_p3as_psd
            • build table-level indexes for the FK object identifier column cntr_mf of table allwise_p3as_mep
            • build table-level indexes for columns ra and dec of both tables
            • get rid of the intermediate partitioned files at the Qserv workers' filesystems
            • get rid of the intermediate input files at GPFS

            Disk space utilization at worker nodes after the cleanup:

            Filesystem      Size  Used Avail Use% Mounted on
            /dev/sda1        16T  5.0T   11T  33% /qserv
            /dev/sda1        16T  5.0T   11T  33% /qserv
            /dev/sda1        16T  5.0T   11T  33% /qserv
            /dev/sda1        16T  5.0T   11T  33% /qserv
            /dev/sda1        16T  5.0T   11T  33% /qserv
            /dev/sda1        16T  5.0T   11T  33% /qserv
            

            Disk space utilization at the master03 node:

            Filesystem                   Size  Used Avail Use% Mounted on
            /dev/mapper/VGqserv-LVqserv  7.3T  218G  7.1T   3% /qserv
            

            Igor Gaponenko added a comment (edited)

            Quick tests

            SELECT COUNT(*) FROM wise_01.allwise_p3as_psd;
            +-----------+
            | COUNT(*)  |
            +-----------+
            | 747634026 |
            +-----------+
            

            SELECT COUNT(*) FROM wise_01.allwise_p3as_mep;
            +-------------+
            | COUNT(*)    |
            +-------------+
            | 42759337365 |
            +-------------+
            

            The number of objects matches the count for the same catalog loaded into the "large" cluster. However, the mep table of the "large" cluster has fewer rows:

            SELECT COUNT(*) FROM wise_00.allwise_p3as_mep;
            +-------------+
            | COUNT(*)    |
            +-------------+
            | 41997706856 |
            +-------------+
            

            INVESTIGATED: the number of rows in the corresponding input files exactly matches the number of rows in the newly ingested catalog wise_01. Apparently, the old version of the catalog was loaded with errors that went unnoticed.
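
            The size of the discrepancy between the two mep loads, for the record:

            ```python
            # Rows missing from the old load relative to the verified new one.
            new_rows = 42_759_337_365   # wise_01.allwise_p3as_mep
            old_rows = 41_997_706_856   # wise_00.allwise_p3as_mep

            missing = new_rows - old_rows
            print(f"{missing:,} rows missing ({100 * missing / new_rows:.2f}%)")
            ```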

            Test the secondary index optimization:

            SELECT cntr,source_id,ra,`dec` FROM wise_01.allwise_p3as_psd WHERE cntr=1601351000001;
            +---------------+----------------------+-----------+------------+
            | cntr          | source_id            | ra        | dec        |
            +---------------+----------------------+-----------+------------+
            | 1601351000001 | 0000m016_ac51-000001 | 0.5718367 | -2.1392679 |
            +---------------+----------------------+-----------+------------+
            

            Igor Gaponenko added a comment

            Creating table indexes

            allwise_p3as_psd

            The object identifier column cntr:

            curl 'http://localhost:25081/replication/sql/index' \
              -X POST -H "Content-Type: application/json" \
              -d@/datasets/gapon/wise_01/qserv-ncsa-wise/allwise_p3as_psd/wise_01_allwise_p3as_psd_idx_cntr.json \
              -o wise_01_allwise_p3as_psd_idx_cntr.result \
              >& wise_01_allwise_p3as_psd_idx_cntr.log&
            

            Columns dec and ra:

            curl 'http://localhost:25081/replication/sql/index' \
              -X POST -H "Content-Type: application/json" \
              -d@/datasets/gapon/wise_01/qserv-ncsa-wise/allwise_p3as_psd/wise_01_allwise_p3as_psd_idx_dec.json \
              -o wise_01_allwise_p3as_psd_idx_dec.result \
              >& wise_01_allwise_p3as_psd_idx_dec.log&
             
            curl 'http://localhost:25081/replication/sql/index' \
              -X POST -H "Content-Type: application/json" \
              -d@/datasets/gapon/wise_01/qserv-ncsa-wise/allwise_p3as_psd/wise_01_allwise_p3as_psd_idx_ra.json \
              -o wise_01_allwise_p3as_psd_idx_ra.result \
              >& wise_01_allwise_p3as_psd_idx_ra.log&
            

            allwise_p3as_mep

            The FK to the object identifier column cntr_mf:

            curl 'http://localhost:25081/replication/sql/index' \
              -X POST -H "Content-Type: application/json" \
              -d@/datasets/gapon/wise_01/qserv-ncsa-wise/allwise_p3as_mep/wise_01_allwise_p3as_mep_idx_cntr_mf.json \
              -o wise_01_allwise_p3as_mep_idx_cntr_mf.result \
              >& wise_01_allwise_p3as_mep_idx_cntr_mf.log&
            

            Columns dec and ra:

            curl 'http://localhost:25081/replication/sql/index' \
              -X POST -H "Content-Type: application/json" \
              -d@/datasets/gapon/wise_01/qserv-ncsa-wise/allwise_p3as_mep/wise_01_allwise_p3as_mep_idx_dec.json \
              -o wise_01_allwise_p3as_mep_idx_dec.result \
              >& wise_01_allwise_p3as_mep_idx_dec.log&
             
            curl 'http://localhost:25081/replication/sql/index' \
              -X POST -H "Content-Type: application/json" \
              -d@/datasets/gapon/wise_01/qserv-ncsa-wise/allwise_p3as_mep/wise_01_allwise_p3as_mep_idx_ra.json \
              -o wise_01_allwise_p3as_mep_idx_ra.result \
              >& wise_01_allwise_p3as_mep_idx_ra.log&
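
            For reference, the same POST can be issued programmatically; a minimal Python sketch using only the standard library (the payload keys shown here are placeholders, not the actual schema — the real JSON specifications live in the files referenced by the curl commands above):

            ```python
            # Build the POST request equivalent to the curl calls above,
            # targeting the Replication system's index-creation endpoint.
            import json
            import urllib.request

            REPL_URL = "http://localhost:25081/replication/sql/index"

            def build_index_request(spec: dict) -> urllib.request.Request:
                """Return a ready-to-send POST request carrying the JSON spec."""
                return urllib.request.Request(
                    REPL_URL,
                    data=json.dumps(spec).encode(),
                    headers={"Content-Type": "application/json"},
                    method="POST",
                )

            # Hypothetical payload; send with urllib.request.urlopen(req).
            req = build_index_request({"database": "wise_01", "table": "allwise_p3as_psd"})
            ```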
            


              People

              Assignee:
              Igor Gaponenko
              Reporter:
              Igor Gaponenko
              Reviewers:
              Fritz Mueller
              Watchers:
              Fritz Mueller, Hsin-Fang Chiang, Igor Gaponenko, Nate Pease
