Problem found in a collection of calexps
It turned out that the merge of calexps by run number had been done incorrectly. As a result, a significant fraction of images could not be found in the system.
Improved algorithm for merging calexps
The new algorithm follows the same logic that was applied when merging the database catalogs. The proposed steps for the images are:
- retrieve the list of images from the merged catalog sdss_stripe82_00, which was previously loaded into PDAC
- scan the list and create (at the destination location) the folders where symbolic links to the image files will be placed
- scan the list again and, for each entry, check whether the required image is available in the collection of images produced at the NCSA site; if so, set up a symbolic link in the corresponding folder
- scan the list again and, for each entry whose link was not resolved in the previous step, check whether the required image is available in the collection of images produced at the IN2P3 site; if so, set up a symbolic link in the corresponding folder
- report the images for which no match was found
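The steps above can be sketched roughly as follows. This is only an illustration, not the script that was actually run: the function names, the `dry_run` flag (standing in for the estimate mode) and the assumption that each site stores an image under exactly the relative path recorded in the catalog are all hypothetical.

```python
import os
from pathlib import Path


def link_from_site(site_root, dest_root, rel_paths, dry_run):
    """Split rel_paths into (found, not_found) for one site and link the found ones."""
    found, not_found = [], []
    for rel in rel_paths:
        source = Path(site_root) / rel
        if source.is_file():
            found.append(rel)
            if not dry_run:
                # Use an absolute target so the link does not depend on the working directory.
                os.symlink(source.resolve(), Path(dest_root) / rel)
        else:
            not_found.append(rel)
    return found, not_found


def merge_calexps(catalog_paths, ncsa_root, in2p3_root, dest_root, dry_run=True):
    """Resolve every catalog image against NCSA first, then IN2P3 (hypothetical helper)."""
    # Folders that must exist at the destination before any link is made.
    folders = sorted({(Path(dest_root) / rel).parent for rel in catalog_paths})
    if not dry_run:
        for folder in folders:
            folder.mkdir(parents=True, exist_ok=True)

    # First pass over the list: the NCSA collection takes priority.
    from_ncsa, unresolved = link_from_site(ncsa_root, dest_root, catalog_paths, dry_run)
    # Second pass: only the entries still unresolved are looked up at IN2P3.
    from_in2p3, missing = link_from_site(in2p3_root, dest_root, unresolved, dry_run)

    # Report counts plus the list of images found at neither site.
    return {
        "folders": len(folders),
        "total": len(catalog_paths),
        "ncsa": len(from_ncsa),
        "in2p3": len(from_in2p3),
        "missing": missing,
    }
```

With `dry_run=True` nothing is written to disk and only the counts are produced; with `dry_run=False` the folders and links are created as well.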
Running the algorithm in estimate mode
This resulted in the following numbers:
- total number of folders to be created: 8395
- total number of images in the catalog: 2792827
- primary images found in the NCSA collection: 1104581
- remaining images found in the IN2P3 collection: 1392762
- missing images: 295484
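As a quick consistency check, the three per-category counts should add up to the catalog total, and the two "found" counts together give the number of links to expect. The variable names below are illustrative only; the numbers are the ones listed above.

```python
total_in_catalog = 2792827
found_at_ncsa = 1104581
found_at_in2p3 = 1392762
missing = 295484

# Every catalog entry is either found at NCSA, found at IN2P3, or missing.
links_expected = found_at_ncsa + found_at_in2p3
assert links_expected + missing == total_in_catalog

missing_fraction = missing / total_in_catalog
print(links_expected)                # 2497343
print(f"{missing_fraction:.1%}")     # 10.6%
```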
Running the algorithm to create the links
A total of 2497343 links were created, as confirmed by:
% find sci-results/ -name '*.fits*' | wc -l
2497343
Deploying in production
The new collection of images was deployed alongside the incorrect one:
% ls -al /datasets/sdss/preprocessed/dr7/sdss_stripe82_00/calexps
drwxr-xr-x 283 gapon lsst_users 32768 Nov 14 11:49 sci-results
drwxr-xr-x   2 gapon lsst_users 32768 Nov 10 19:58 sci-results.INCORRECT
Loading identifiers of the missing images into PDAC Qserv
A table was created in Qserv so that the missing images can be retrieved by query:
SELECT COUNT(*) FROM sdss_stripe82_00.Science_Ccd_Exposure_NoFile LIMIT 1;
+----------------+
| SUM(QS1_COUNT) |
+----------------+
|         295484 |
+----------------+
1 row in set (0.23 sec)

SELECT * FROM sdss_stripe82_00.Science_Ccd_Exposure_NoFile LIMIT 1;
+----------------------+------+----------+--------+-------+--------------------------------------------------------+
| scienceCcdExposureId | run  | filterId | camcol | field | path                                                   |
+----------------------+------+----------+--------+-------+--------------------------------------------------------+
|           1040110130 | 1040 |        1 |      1 |   130 | sci-results/1040/1/g/calexp/calexp-001040-g1-0130.fits |
+----------------------+------+----------+--------+-------+--------------------------------------------------------+
1 row in set (0.18 sec)
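The sample row suggests how the path column relates to the other columns. The sketch below is a guess based on that single row, not a documented rule: the function name is hypothetical, the mapping of filterId 0..4 to the bands u, g, r, i, z is assumed, and because camcol and filterId are both 1 in the sample, the choice of camcol for the second path component is also an assumption.

```python
# Assumed mapping: filterId 0..4 -> u, g, r, i, z (only filterId 1 -> g is confirmed by the sample row).
FILTER_NAMES = "ugriz"


def calexp_path(run, filter_id, camcol, field):
    """Build the relative calexp path for one exposure (hypothetical helper)."""
    band = FILTER_NAMES[filter_id]
    return (
        f"sci-results/{run}/{camcol}/{band}/calexp/"
        f"calexp-{run:06d}-{band}{camcol}-{field:04d}.fits"
    )


# Reproduces the path of the sample row shown above.
print(calexp_path(run=1040, filter_id=1, camcol=1, field=130))
```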
The calibrated exposures were merged into the following folder:
drwxr-xr-x 2 gapon grp_202 32768 Nov 10 17:40 sci-results
...
drwxr-xr-x 8 gapon lsst_users 4096 Jun 9 2015 5052
drwxr-xr-x 8 gapon lsst_users 4096 Jun 9 2015 5566
drwxr-xr-x 8 gapon lsst_users 4096 Jun 9 2015 5582
drwxr-xr-x 8 gapon lsst_users 4096 Jun 9 2015 5590
drwxr-xr-x 8 gapon lsst_users 4096 Jun 9 2015 5597
drwxr-xr-x 8 gapon lsst_users 4096 Jun 9 2015 5603
drwxr-xr-x 8 gapon lsst_users 4096 Jun 9 2015 5619
drwxr-xr-x 8 gapon lsst_users 4096 Jun 9 2015 5622
...
This folder contains 267 runs in total.