DM-8242

PDAC: merge calexps and coadds for imgserv

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      The calexp and coadd image directory trees from the NCSA and IN2P3 sides of the Stripe 82 processing need to be merged so that imgserv can serve images for the entirety of Stripe 82. We anticipate doing this with scripts that build the merged directory trees as link farms.

      The datasets will be put into these folders:

      /datasets/sdss/preprocessed/dr7/sdss_stripe82_00
      /datasets/sdss/preprocessed/dr7/sdss_stripe82_00/calexp
      /datasets/sdss/preprocessed/dr7/sdss_stripe82_00/coadd
      

      The name of the base folder matches the name of the corresponding catalog within the PDAC Qserv.

            Activity

            Igor Gaponenko (gapon) added a comment (edited):

            The calibrated exposures have been merged into the following folder:

            % ls -l /datasets/sdss/preprocessed/dr7/sdss_stripe82_00/calexps/
            lrwxrwxrwx 1 gapon grp_202    37 Nov 10 16:20 _parent -> /datasets/sdss/preprocessed/dr7/runs/
            drwxr-xr-x 2 gapon grp_202 32768 Nov 10 17:40 sci-results
             
             
            % ls -lL /datasets/sdss/preprocessed/dr7/sdss_stripe82_00/calexps/sci-results/
            ...
            drwxr-xr-x 8 gapon lsst_users 4096 Jun  9  2015 5052
            drwxr-xr-x 8 gapon lsst_users 4096 Jun  9  2015 5566
            drwxr-xr-x 8 gapon lsst_users 4096 Jun  9  2015 5582
            drwxr-xr-x 8 gapon lsst_users 4096 Jun  9  2015 5590
            drwxr-xr-x 8 gapon lsst_users 4096 Jun  9  2015 5597
            drwxr-xr-x 8 gapon lsst_users 4096 Jun  9  2015 5603
            drwxr-xr-x 8 gapon lsst_users 4096 Jun  9  2015 5619
            drwxr-xr-x 8 gapon lsst_users 4096 Jun  9  2015 5622
            ...
            

            There are 267 runs in total in this folder.

            Igor Gaponenko (gapon) added a comment:

            Deep coadds have been merged as well into the following location:

            % ls -al /datasets/sdss/preprocessed/dr7/sdss_stripe82_00/coadd/
             
            drwxr-xr-x 7 gapon lsst_users 4096 Nov 11 01:15 deepCoadd
            drwxr-xr-x 7 gapon lsst_users 4096 Nov 11 00:36 deepCoadd-results
            lrwxrwxrwx 1 gapon lsst_users   37 Nov 10 20:03 _parent -> /datasets/sdss/preprocessed/dr7/runs/
            

            % ls -al /datasets/sdss/preprocessed/dr7/sdss_stripe82_00/coadd/deepCoadd/
             
            drwxr-xr-x 7 gapon lsst_users 4096 Nov 11 01:15 .
            drwxr-xr-x 4 gapon lsst_users 4096 Nov 11 00:45 ..
            drwxr-xr-x 4 gapon lsst_users 4096 Nov 11 00:48 g
            drwxr-xr-x 4 gapon lsst_users 4096 Nov 11 00:50 i
            drwxr-xr-x 4 gapon lsst_users 4096 Nov 11 00:49 r
            -rwxr-xr-x 1 gapon lsst_users  439 Nov 11 00:45 skyMap.pickle
            drwxr-xr-x 4 gapon lsst_users 4096 Nov 11 00:47 u
            drwxr-xr-x 4 gapon lsst_users 4096 Nov 11 00:51 z
            

            % ls -al /datasets/sdss/preprocessed/dr7/sdss_stripe82_00/coadd/deepCoadd-results/
             
            drwxr-xr-x 4 gapon lsst_users 4096 Nov 10 20:07 g
            drwxr-xr-x 4 gapon lsst_users 4096 Nov 10 20:09 i
            drwxr-xr-x 4 gapon lsst_users 4096 Nov 10 20:10 r
            drwxr-xr-x 4 gapon lsst_users 4096 Nov 10 20:12 u
            drwxr-xr-x 4 gapon lsst_users 4096 Nov 10 23:58 z
            

            Igor Gaponenko (gapon) added a comment:

            Both collections of images have been merged as required.

            Igor Gaponenko (gapon) added a comment (edited):

            Problem found in a collection of calexps

            It turned out that the merge of calexps by run number was done incorrectly. As a result, a significant fraction of the images could not be found in the system.

            Improved algorithm for merging calexps

            The new algorithm follows the same logic that was applied when merging the database catalogs. These are the proposed steps for the images:

            1. Retrieve the list of images from the merged catalog sdss_stripe82_00, which was previously loaded into PDAC.
            2. Scan the list and create, at the destination location, the folders in which symbolic links to the image files will be placed.
            3. Scan the list again and, for each entry, check whether the required image is available in the collection of images produced at the NCSA site. If so, set up a symbolic link in the corresponding folder.
            4. Scan the list again and, for each entry whose link was not resolved in the previous step, check whether the required image is available in the collection of images produced at the IN2P3 site. If so, set up a symbolic link in the corresponding folder.
            5. Report the catalog entries for which no matching image was found.
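
The five steps above can be sketched roughly as follows. This is a minimal illustration and not the script that was actually run: the function name `merge_link_farm`, its parameters, and the returned dictionary are hypothetical, and it assumes the catalog yields one unique relative path per image and that both site collections use the same relative layout. The `dry_run` flag stands in for the estimate mode mentioned below.

```python
import os
from pathlib import Path


def merge_link_farm(catalog_paths, ncsa_root, in2p3_root, dest_root, dry_run=False):
    """Build a symlink farm under dest_root, preferring NCSA over IN2P3.

    catalog_paths: relative image paths from the merged catalog (step 1,
    assumed to be retrieved elsewhere and to contain no duplicates).
    """
    dest_root = Path(dest_root).resolve()
    sites = [("ncsa", Path(ncsa_root).resolve()),
             ("in2p3", Path(in2p3_root).resolve())]
    catalog_paths = list(catalog_paths)

    # Step 2: create the destination folders that will hold the links.
    folders = sorted({(dest_root / rel).parent for rel in catalog_paths})
    if not dry_run:
        for folder in folders:
            folder.mkdir(parents=True, exist_ok=True)

    # Steps 3 and 4: one pass per site; the second pass only looks at
    # entries that the first pass could not resolve.
    result = {"folders": len(folders)}
    unresolved = catalog_paths
    for site, root in sites:
        still_unresolved = []
        found = 0
        for rel in unresolved:
            source = root / rel
            if source.is_file():
                found += 1
                link = dest_root / rel
                if not dry_run and not link.is_symlink():
                    link.symlink_to(source)
            else:
                still_unresolved.append(rel)
        result[site] = found
        unresolved = still_unresolved

    # Step 5: report entries with no matching image at either site.
    result["missing"] = unresolved
    return result
```

Calling it once with `dry_run=True` gives the counts without touching the destination, and a second call with `dry_run=False` creates the folders and links.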

            Running the algorithm in estimate mode

            This resulted in the following numbers:

            • total number of folders to be created: 8395
            • total number of images in the catalog: 2792827
            • primary images found in the NCSA collection: 1104581
            • remaining images found in the IN2P3 collection: 1392762
            • missing images: 295484

            Running the algorithm to create the links

            A total of 2497343 links were created, as confirmed by:

            % find sci-results/ -name '*.fits*' | wc -l
            2497343
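
As a quick consistency check, the numbers reported in estimate mode add up to the link count above and to the catalog total. The snippet below only redoes that arithmetic with the figures quoted in this comment; the variable names are illustrative.

```python
# Figures quoted in this comment.
total_in_catalog = 2792827
found_ncsa = 1104581
found_in2p3 = 1392762
missing = 295484
links_created = 2497343  # from `find ... | wc -l`

# Every image found at either site should have produced exactly one link.
assert found_ncsa + found_in2p3 == links_created

# Found plus missing should account for every catalog entry.
assert links_created + missing == total_in_catalog

# Fraction of catalog entries with no image at either site.
missing_fraction = missing / total_in_catalog
print(f"missing fraction: {missing_fraction:.1%}")
```

Both assertions hold, and roughly one catalog entry in ten has no matching image at either site.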
            

            Deploying in production

            Deployed the new collection of images alongside the incorrect one:

            % ls -al /datasets/sdss/preprocessed/dr7/sdss_stripe82_00/calexps
             
            drwxr-xr-x 283 gapon lsst_users 32768 Nov 14 11:49 sci-results
            drwxr-xr-x   2 gapon lsst_users 32768 Nov 10 19:58 sci-results.INCORRECT
            

            Loading identifiers of the missing images into PDAC Qserv

            Created a table in Qserv that allows the missing images to be identified:

            SELECT COUNT(*) FROM sdss_stripe82_00.Science_Ccd_Exposure_NoFile LIMIT 1;
            

            +----------------+
            | SUM(QS1_COUNT) |
            +----------------+
            |         295484 |
            +----------------+
            1 row in set (0.23 sec)
            

            SELECT * FROM sdss_stripe82_00.Science_Ccd_Exposure_NoFile LIMIT 1;
            

            +----------------------+------+----------+--------+-------+--------------------------------------------------------+
            | scienceCcdExposureId | run  | filterId | camcol | field | path                                                   |
            +----------------------+------+----------+--------+-------+--------------------------------------------------------+
            |           1040110130 | 1040 |        1 |      1 |   130 | sci-results/1040/1/g/calexp/calexp-001040-g1-0130.fits |
            +----------------------+------+----------+--------+-------+--------------------------------------------------------+
            1 row in set (0.18 sec)
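
The sample row suggests how the `path` column relates to the other columns. The sketch below reproduces that one row and nothing more: the function name `calexp_path` is hypothetical, the mapping of `filterId` 0 to 4 onto the bands u, g, r, i, z is an assumption (only 1 = g is confirmed by the row), and the directory order run/camcol/band and the zero padding are inferred from this single example.

```python
# Assumption: filterId 0..4 corresponds to the SDSS bands u, g, r, i, z.
# Only filterId 1 -> "g" is confirmed by the sample row.
FILTER_NAMES = "ugriz"


def calexp_path(run, camcol, filter_id, field):
    """Rebuild the relative calexp path from the catalog columns.

    Layout inferred from one row of Science_Ccd_Exposure_NoFile;
    treat it as a guess until checked against more rows.
    """
    band = FILTER_NAMES[filter_id]
    return (
        f"sci-results/{run}/{camcol}/{band}/calexp/"
        f"calexp-{run:06d}-{band}{camcol}-{field:04d}.fits"
    )


print(calexp_path(run=1040, camcol=1, filter_id=1, field=130))
```

For the row shown, this yields the same string as the `path` column.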
            


              People

              Assignee:
              gapon Igor Gaponenko
              Reporter:
              fritzm Fritz Mueller
              Watchers:
              Fritz Mueller, Igor Gaponenko