Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-9373

Load WISE image metadata to PDAC

    XMLWordPrintable

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Story Points:
      10
    • Sprint:
      DB_S17_2, DB_S17_4, DB_S17_5
    • Team:
      Data Access and Database

      Description

      Goals

      This task complements an effort for loading object and forced source catalogs into PDAC (as per JIRA DM-9372). The goal is to load the complementary image metadata tables. Note that these are non-partitioned tables which would need to be fully replicated across all worker nodes of the Qserv cluster in PDAC.

      References

      Properties of the dataset are documented at:

      A list of WISE catalogs to be loaded:

      Tools and scripts developed for this efforts are found at the GitHub package:

        Attachments

          Issue Links

            Activity

            Hide
            gapon Igor Gaponenko added a comment -

            WISE All-Sky Single Exposure (L1b) Image Inventory Table

            This catalog corresponds to the IRSA table: allsky_4band_p1bm_frm
            All temporary files were placed at:

            /datasets/gapon/wise/allsky_4band_p1bm_frm
            

            Table data and the actual column names were found at the IRSA's Gator service:

            Downloading and pre-processing table schema and data

            The table schema was downloaded using this We service:

            % cd /datasets/gapon/wise/allsky_4band_p1bm_frm
            % curl -o allsky_4band_p1bm_frm.xml  'http://irsa.ipac.caltech.edu/cgi-bin/Gator/nph-dd?mode=xml&catalog=allsky_4band_p1bm_frm']
            

            Then it was translated into the MySQL schema definition by:

            % python /datasets/gapon/development/db_pdac_wise/tools/xml2schema.py \
            allsky_4band_p1bm_frm.xml > allsky_4band_p1bm_frm.schema
            

            IMPORTANT: the schema file had 90 columns. Meanwhile, according to the IRSA Web service (and the actual data files to be uploaded on the next stage), there are only 72 columns in that table. This had to be fixed manually by editing the MySQL schema file. Two schema files were created in the temporary folder:

            allsky_4band_p1bm_frm.schema
            allsky_4band_p1bm_frm.schema.90
            

            After that the actual data were downloaded. UNEXPECTEDLY, this had to be done in 4 steps (one file for each band) because of an internal limitation for the maximum allowed table size of 3472222 rows imposed by the Web service while the total number of rows in the table was 5964417. Each file was downloaded by specifying the band constraint in the optional box of the Web service. For example:

            band = 1
            

            The files were placed at subfolder of the temporary folder:

            % ls -al downloaded/
             
            -rw-r--r-- 1 gapon grp_202 1951135610 May  5 17:11 allsky_4band_p1bm_frm.1.tbl
            -rw-r--r-- 1 gapon grp_202 1951125288 May  5 17:50 allsky_4band_p1bm_frm.1.tbl.data
            -rw-r--r-- 1 gapon grp_202 1951135610 May  5 17:13 allsky_4band_p1bm_frm.2.tbl
            -rw-r--r-- 1 gapon grp_202 1951125288 May  5 17:53 allsky_4band_p1bm_frm.2.tbl.data
            -rw-r--r-- 1 gapon grp_202 1951017890 May  5 17:14 allsky_4band_p1bm_frm.3.tbl
            -rw-r--r-- 1 gapon grp_202 1951007568 May  5 17:54 allsky_4band_p1bm_frm.3.tbl.data
            -rw-r--r-- 1 gapon grp_202 1948209614 May  5 17:15 allsky_4band_p1bm_frm.4.tbl
            -rw-r--r-- 1 gapon grp_202 1948199292 May  5 17:59 allsky_4band_p1bm_frm.4.tbl.data
             
            % wc -l downloaded/allsky_4band_p1bm_frm.[1234].tbl.data
               1491686 downloaded/allsky_4band_p1bm_frm.1.tbl.data
               1491686 downloaded/allsky_4band_p1bm_frm.2.tbl.data
               1491596 downloaded/allsky_4band_p1bm_frm.3.tbl.data
               1489449 downloaded/allsky_4band_p1bm_frm.4.tbl.data
               5964417 total
            

            Those 4 data files were concatenate together into a single file ready for further processing:

            % ls -al 
            -rw-r--r--  1 gapon grp_202 7801457436 May  5 18:01 allsky_4band_p1bm_frm.tbl.data
            

            The next step was to translate the *.tbl format file into the TSV files required by the LSST data loader tools. This required to pay attention to the following DATE & TIME columns which had to be interpreted as a single string (0-based index used below):

              32:
              36:
              72:
            

            The translator and its Makefile is found here:

            allsky_4band_p1bm_frm.cc
            Makefile
            

            The translation command was:

            % make
            % ./allsky_4band_p1bm_frm \
            allsky_4band_p1bm_frm.tbl.data allsky_4band_p1bm_frm.tsv
            

            The resulting file is located next to the original one:

              -rw-r--r--  1 gapon grp_202 7801457436 May  5 18:01 allsky_4band_p1bm_frm.tbl.data
              -rw-r--r--  1 gapon grp_202 5630886945 May  5 20:23 allsky_4band_p1bm_frm.tsv
            

            The number of rows (with wc -l) matches expectations:

            5964417 allsky_4band_p1bm_frm.tsv
            5964417 allsky_4band_p1bm_frm.tbl.data
            

            Preparing the data deployment area:

            Created two symbolic links at:

            % cd /datasets/gapon/production/wise_catalog_load/production_load/non-part
            % ln -s \
            ../../../../wise/allsky_4band_p1bm_frm/allsky_4band_p1bm_frm.schema \
            allsky_4band_p1bm_frm.sql
             
            % ln -s \
            ../../../../wise/allsky_4band_p1bm_frm/allsky_4band_p1bm_frm.tsv \
            allsky_4band_p1bm_frm.txt 
            

            Extending the GitHub package to support the new non-partitioned table:

            Added the schema file:

            db_pdac_wise/sql/allsky_4band_p1bm_frm.sql
            

            Added the CSS file with the following name and content:

            % cat config/css_allsky_4band_p1bm_frm.params.tmpl
            [table_info]
            tableName:       allsky_4band_p1bm_frm
            partitioning:    0
            schemaFile:      $SQL_DIR/allsky_4band_p1bm_frm.sql
            compression:     0
            

            Extended the dataset specification file with:

            % cat scripts/dataset.bash
            ...
            OUTPUT_FORCED_SOURCE_TABLE="allwise_i3as_mep allsky_4band_p1bm_frm"
            ...
            

            Loading the new table into PDAC

            Repeated the usual steps explained for other non-partitioned tables earlier:

            • run a script to load CSS
            • run a script to load the data
            • checked log files for possible problems (none found so far)

            Testing results in PDAC

            All looks good so far:

            SELECT COUNT(*) FROM wise_00.allsky_4band_p1bm_frm;
            +----------------+
            | SUM(QS1_COUNT) |
            +----------------+
            |        5964417 |
            +----------------+
            1 row in set (0.24 sec)
             
            SELECT scan_id,band,l0file FROM wise_00.allsky_4band_p1bm_frm LIMIT 1;
            +---------+------+-----------------------------------------------------------+
            | scan_id | band | l0file                                                    |
            +---------+------+-----------------------------------------------------------+
            | 06293a  |    1 | /wise/fops/l0/3a/06293a/fr/054/06293a054-w1-int-0.fits.gz |
            +---------+------+-----------------------------------------------------------+
            1 row in set (0.21 sec)
            

            Show
            gapon Igor Gaponenko added a comment - WISE All-Sky Single Exposure (L1b) Image Inventory Table This catalog corresponds to the IRSA table: allsky_4band_p1bm_frm All temporary files were placed at: /datasets/gapon/wise/allsky_4band_p1bm_frm Table data and the actual column names were found at the IRSA's Gator service: http://irsa.ipac.caltech.edu/cgi-bin/Gator/nph-scan?mission=irsa&submit=Select&projshort=WISE This service plays an important role in figuring out which columns to expect in the tables. Downloading and pre-processing table schema and data The table schema was downloaded using this We service: % cd /datasets/gapon/wise/allsky_4band_p1bm_frm % curl -o allsky_4band_p1bm_frm.xml 'http://irsa.ipac.caltech.edu/cgi-bin/Gator/nph-dd?mode=xml&catalog=allsky_4band_p1bm_frm' ] Then it was translated into the MySQL schema definition by: % python /datasets/gapon/development/db_pdac_wise/tools/xml2schema .py \ allsky_4band_p1bm_frm.xml > allsky_4band_p1bm_frm.schema IMPORTANT : the schema file had 90 columns. Meanwhile, according to the IRSA Web service (and the actual data files to be uploaded on the next stage), there are only 72 columns in that table. This had to be fixed manually by editing the MySQL schema file. Two schema files were created in the temporary folder: allsky_4band_p1bm_frm.schema allsky_4band_p1bm_frm.schema.90 After that the actual data were downloaded. UNEXPECTEDLY , this had to be done in 4 steps (one file for each band ) because of an internal limitation for the maximum allowed table size of 3472222 rows imposed by the Web service while the total number of rows in the table was 5964417 . Each file was downloaded by specifying the band constraint in the optional box of the Web service. For example: band = 1 The files were placed at subfolder of the temporary folder: % ls -al downloaded/   -rw-r--r-- 1 gapon grp_202 1951135610 May 5 17:11 allsky_4band_p1bm_frm.1.tbl -rw-r--r-- 1 gapon grp_202 1951125288 May 5 17:50 allsky_4band_p1bm_frm.1.tbl.data -rw-r--r-- 1 gapon grp_202 1951135610 May 5 17:13 allsky_4band_p1bm_frm.2.tbl -rw-r--r-- 1 gapon grp_202 1951125288 May 5 17:53 allsky_4band_p1bm_frm.2.tbl.data -rw-r--r-- 1 gapon grp_202 1951017890 May 5 17:14 allsky_4band_p1bm_frm.3.tbl -rw-r--r-- 1 gapon grp_202 1951007568 May 5 17:54 allsky_4band_p1bm_frm.3.tbl.data -rw-r--r-- 1 gapon grp_202 1948209614 May 5 17:15 allsky_4band_p1bm_frm.4.tbl -rw-r--r-- 1 gapon grp_202 1948199292 May 5 17:59 allsky_4band_p1bm_frm.4.tbl.data   % wc -l downloaded /allsky_4band_p1bm_frm .[1234].tbl.data 1491686 downloaded /allsky_4band_p1bm_frm .1.tbl.data 1491686 downloaded /allsky_4band_p1bm_frm .2.tbl.data 1491596 downloaded /allsky_4band_p1bm_frm .3.tbl.data 1489449 downloaded /allsky_4band_p1bm_frm .4.tbl.data 5964417 total Those 4 data files were concatenate together into a single file ready for further processing: % ls -al -rw-r--r-- 1 gapon grp_202 7801457436 May 5 18:01 allsky_4band_p1bm_frm.tbl.data The next step was to translate the *.tbl format file into the TSV files required by the LSST data loader tools. This required to pay attention to the following DATE & TIME columns which had to be interpreted as a single string (0-based index used below): 32: 36: 72: The translator and its Makefile is found here: allsky_4band_p1bm_frm.cc Makefile The translation command was: % make % . /allsky_4band_p1bm_frm \ allsky_4band_p1bm_frm.tbl.data allsky_4band_p1bm_frm.tsv The resulting file is located next to the original one: -rw-r--r-- 1 gapon grp_202 7801457436 May 5 18:01 allsky_4band_p1bm_frm.tbl.data -rw-r--r-- 1 gapon grp_202 5630886945 May 5 20:23 allsky_4band_p1bm_frm.tsv The number of rows (with wc -l ) matches expectations: 5964417 allsky_4band_p1bm_frm.tsv 5964417 allsky_4band_p1bm_frm.tbl.data Preparing the data deployment area: Created two symbolic links at: % cd /datasets/gapon/production/wise_catalog_load/production_load/non-part % ln -s \ ../../../.. /wise/allsky_4band_p1bm_frm/allsky_4band_p1bm_frm .schema \ allsky_4band_p1bm_frm.sql % ln -s \ ../../../.. /wise/allsky_4band_p1bm_frm/allsky_4band_p1bm_frm .tsv \ allsky_4band_p1bm_frm.txt Extending the GitHub package to support the new non-partitioned table: Added the schema file: db_pdac_wise/sql/allsky_4band_p1bm_frm.sql Added the CSS file with the following name and content: % cat config /css_allsky_4band_p1bm_frm .params.tmpl [table_info] tableName: allsky_4band_p1bm_frm partitioning: 0 schemaFile: $SQL_DIR /allsky_4band_p1bm_frm .sql compression: 0 Extended the dataset specification file with: % cat scripts /dataset . bash ... OUTPUT_FORCED_SOURCE_TABLE= "allwise_i3as_mep allsky_4band_p1bm_frm" ... Loading the new table into PDAC Repeated the usual steps explained for other non-partitioned tables earlier: run a script to load CSS run a script to load the data checked log files for possible problems (none found so far) Testing results in PDAC All looks good so far: SELECT COUNT (*) FROM wise_00.allsky_4band_p1bm_frm; + ----------------+ | SUM (QS1_COUNT) | + ----------------+ | 5964417 | + ----------------+ 1 row in set (0.24 sec)   SELECT scan_id,band,l0file FROM wise_00.allsky_4band_p1bm_frm LIMIT 1; + ---------+------+-----------------------------------------------------------+ | scan_id | band | l0file | + ---------+------+-----------------------------------------------------------+ | 06293a | 1 | /wise/fops/l0/3a/06293a/fr/054/06293a054-w1- int -0.fits.gz | + ---------+------+-----------------------------------------------------------+ 1 row in set (0.21 sec)
            Hide
            gapon Igor Gaponenko added a comment -

            AllWISE Atlas Image Inventory Table, the meta data for AllWISE coadd images

            Table name: allwise_p3am_cdd
            Number of columns: 62
            Number of rows: 72960

            Downloading input data, preprocessing

            Follow the same procedure as for the previous table. Pay attention to the actual column number of 62 in the downloaded data versus 72 columns in the formal XML schema.

            Testing results

            SELECT COUNT(*) FROM wise_00.allwise_p3am_cdd;
            +----------------+
            | SUM(QS1_COUNT) |
            +----------------+
            |          72960 |
            +----------------+
            1 row in set (0.23 sec)
             
            SELECT coadd_id,band,fcdate FROM allwise_p3am_cdd ORDER BY coadd_id,band LIMIT 8;
            +---------------+------+---------------------+
            | coadd_id      | band | fcdate              |
            +---------------+------+---------------------+
            | 0000m016_ac51 |    1 | 2013-07-25 18:46:11 |
            | 0000m016_ac51 |    2 | 2013-07-25 19:44:47 |
            | 0000m016_ac51 |    3 | 2013-07-25 20:15:51 |
            | 0000m016_ac51 |    4 | 2013-07-25 20:33:35 |
            | 0000m031_ac51 |    1 | 2013-07-25 07:34:53 |
            | 0000m031_ac51 |    2 | 2013-07-25 08:36:19 |
            | 0000m031_ac51 |    3 | 2013-07-25 09:06:59 |
            | 0000m031_ac51 |    4 | 2013-07-25 09:25:06 |
            +---------------+------+---------------------+
            8 rows in set (0.29 sec)
            

            Show
            gapon Igor Gaponenko added a comment - AllWISE Atlas Image Inventory Table, the meta data for AllWISE coadd images Table name: allwise_p3am_cdd Number of columns: 62 Number of rows: 72960 Downloading input data, preprocessing Follow the same procedure as for the previous table. Pay attention to the actual column number of 62 in the downloaded data versus 72 columns in the formal XML schema. Testing results SELECT COUNT (*) FROM wise_00.allwise_p3am_cdd; + ----------------+ | SUM (QS1_COUNT) | + ----------------+ | 72960 | + ----------------+ 1 row in set (0.23 sec)   SELECT coadd_id,band,fcdate FROM allwise_p3am_cdd ORDER BY coadd_id,band LIMIT 8; + ---------------+------+---------------------+ | coadd_id | band | fcdate | + ---------------+------+---------------------+ | 0000m016_ac51 | 1 | 2013-07-25 18:46:11 | | 0000m016_ac51 | 2 | 2013-07-25 19:44:47 | | 0000m016_ac51 | 3 | 2013-07-25 20:15:51 | | 0000m016_ac51 | 4 | 2013-07-25 20:33:35 | | 0000m031_ac51 | 1 | 2013-07-25 07:34:53 | | 0000m031_ac51 | 2 | 2013-07-25 08:36:19 | | 0000m031_ac51 | 3 | 2013-07-25 09:06:59 | | 0000m031_ac51 | 4 | 2013-07-25 09:25:06 | + ---------------+------+---------------------+ 8 rows in set (0.29 sec)
            Hide
            gapon Igor Gaponenko added a comment -

            WISE 3-Band Cryo Single Exposure (L1b) Image Inventory Table

            Table name: * allsky_3band_p1bm_frm*
            Number of columns: 71
            Number of rows: 1165452

            Downloading input data, preprocessing

            Follow the same procedure as for the previous table. Pay attention to the actual column number of 71 in the downloaded data versus 90 columns in the formal XML schema.

            Testing results

            SELECT COUNT(*) FROM wise_00.allsky_3band_p1bm_frm;
            +----------------+
            | SUM(QS1_COUNT) |
            +----------------+
            |        1165452 |
            +----------------+
            1 row in set (0.29 sec)
             
            SELECT scan_id,scangrp,l0file FROM wise_00.allsky_3band_p1bm_frm  LIMIT 1;
            +---------+---------+-----------------------------------------------------------+
            | scan_id | scangrp | l0file                                                    |
            +---------+---------+-----------------------------------------------------------+
            | 07728a  | 8a      | /wise/fops/l0/8a/07728a/fr/255/07728a255-w1-int-0.fits.gz |
            +---------+---------+-----------------------------------------------------------+
            1 row in set (0.26 sec)
            
            

            Show
            gapon Igor Gaponenko added a comment - WISE 3-Band Cryo Single Exposure (L1b) Image Inventory Table Table name: * allsky_3band_p1bm_frm* Number of columns: 71 Number of rows: 1165452 Downloading input data, preprocessing Follow the same procedure as for the previous table. Pay attention to the actual column number of 71 in the downloaded data versus 90 columns in the formal XML schema. Testing results SELECT COUNT (*) FROM wise_00.allsky_3band_p1bm_frm; + ----------------+ | SUM (QS1_COUNT) | + ----------------+ | 1165452 | + ----------------+ 1 row in set (0.29 sec)   SELECT scan_id,scangrp,l0file FROM wise_00.allsky_3band_p1bm_frm LIMIT 1; + ---------+---------+-----------------------------------------------------------+ | scan_id | scangrp | l0file | + ---------+---------+-----------------------------------------------------------+ | 07728a | 8a | /wise/fops/l0/8a/07728a/fr/255/07728a255-w1- int -0.fits.gz | + ---------+---------+-----------------------------------------------------------+ 1 row in set (0.26 sec)
            Hide
            gapon Igor Gaponenko added a comment -

            WISE Post-Cryo Single Exposure (L1b) Image Inventory Table

            Table name: * allsky_2band_p1bm_frm*
            Number of columns: 73
            Number of rows: 1802540

            Downloading input data, preprocessing

            Follow the same procedure as for the previous table. Pay attention to the actual column number of 73 in the downloaded data versus 91 columns in the formal XML schema.

            Testing results

            SELECT COUNT(*) FROM wise_00.allsky_2band_p1bm_frm;
            +----------------+
            | SUM(QS1_COUNT) |
            +----------------+
            |        1802540 |
            +----------------+
            1 row in set (0.24 sec)
             
            SELECT scan_id,scangrp,l0file FROM wise_00.allsky_2band_p1bm_frm  LIMIT 1;
            +---------+---------+-----------------------------------------------------------+
            | scan_id | scangrp | l0file                                                    |
            +---------+---------+-----------------------------------------------------------+
            | 11044a  | 4a      | /wise/fops/l0/4a/11044a/fr/065/11044a065-w2-int-0.fits.gz |
            +---------+---------+-----------------------------------------------------------+
            1 row in set (0.28 sec)
            

            Show
            gapon Igor Gaponenko added a comment - WISE Post-Cryo Single Exposure (L1b) Image Inventory Table Table name: * allsky_2band_p1bm_frm* Number of columns: 73 Number of rows: 1802540 Downloading input data, preprocessing Follow the same procedure as for the previous table. Pay attention to the actual column number of 73 in the downloaded data versus 91 columns in the formal XML schema. Testing results SELECT COUNT (*) FROM wise_00.allsky_2band_p1bm_frm; + ----------------+ | SUM (QS1_COUNT) | + ----------------+ | 1802540 | + ----------------+ 1 row in set (0.24 sec)   SELECT scan_id,scangrp,l0file FROM wise_00.allsky_2band_p1bm_frm LIMIT 1; + ---------+---------+-----------------------------------------------------------+ | scan_id | scangrp | l0file | + ---------+---------+-----------------------------------------------------------+ | 11044a | 4a | /wise/fops/l0/4a/11044a/fr/065/11044a065-w2- int -0.fits.gz | + ---------+---------+-----------------------------------------------------------+ 1 row in set (0.28 sec)
            Hide
            gapon Igor Gaponenko added a comment -

            All metadata catalogs are now available in PDAC.

            Show
            gapon Igor Gaponenko added a comment - All metadata catalogs are now available in PDAC.

              People

              Assignee:
              gapon Igor Gaponenko
              Reporter:
              fritzm Fritz Mueller
              Watchers:
              Fritz Mueller, Gregory Dubois-Felsmann, Igor Gaponenko
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.