  1. Data Management
  2. DM-359

Simplify Co-add example in Software User Guide

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Story Points:
      8
    • Sprint:
      DevOps Sprint 1
    • Team:
      SQuaRE

      Description

      The current example in the LSST Software User Guide for co-addition reflects the processes necessary to perform a DR production. While thorough, it only really works on the LSST cluster. The example should be simplified to work on a smaller subset of data, and on single-user machines.

      Definition of Done:

      • Following the documentation, it will be possible for users to identify the SDSS data they need (the subset of files)
      • There will be instructions on how to download the necessary files to their local machine
      • There will be instructions on how to build the necessary repositories
      • There will be instructions on how to run the Co-Add+forced photometry tasks.
      • Any issues requiring access rights to LSST machines or databases will be identified and issues created for later.

        Attachments

          Activity

          shaw Richard Shaw [X] (Inactive) added a comment -

          Response to Review

          Since the issues are not numbered, it is challenging to respond point-by-point. But in general all issues were addressed. Some may deserve a comment:

          • skyMap geometry. Yes, the region should have referred to the whole of Stripe 82, as the definition of the geometry is succinct and one could apply it to any subset, etc. This was an error that was inadvertently propagated from a much earlier version of the page. Fixed.
          • Added a few comments and Info blocks on the amount of disk space consumed by the processing.
          • Added the version IDs of the package setups.
          • The scripts for munging the format of the patches, etc. are consistent from start to finish. They differ from the original Wiki instructions in that the IDs are used with @file, so a "--id" argument is often needed.
          • Embellished the descriptions of the SDSS processing.
          • ingestCoadd.py. "--create-views" is copied into the buffer for me. Confluence is fussy about clicks and drags, and it is possible for the user to muck it up without trying very hard.
          • After some off-line discussion, a set of SQL commands was included to enable keys, along with a command to execute for MySQL.
          • Fix for the updated slots was included. I worry that I may not be informed if this changes again.
          • I would not like to label the sourceAssoc and ingestSourceAssoc steps as "optional", since significant science or QA is enabled by these steps. Yes, they do consume considerable CPU, so I've amped up the warning to the user.
          • I've attempted to make the Summary commands consistent with those in the text, and to remove machine-specific arguments where possible. A quick re-review of this would make me more comfortable that I got them all.
          yusra Yusra AlSayyad added a comment - edited

          "Fix for the updated slots was included. I worry that I may not be informed if this changes again."

          The nightly build runs processSdssCcd.py and checks the output, but I think we'd gain a lot by extending the script to do basically everything that this demo does, just for a small number of runs. Maybe weekly, if nightly is not realistic. That way we'd catch any changes as they happen.

          I read through the changes on the tutorial. Looking good. Couple comments:

          "The following task takes ~50 min on a 2006-era iMac with 4 cores and 8 GB of memory, and will consume ~20 GB of disk space to store intermediate files. Do not delete or compress these files until after the Co-add images have been created."

          The forced photometry step uses the calexps too. At this point the directory is already 34 GB, so there's no getting around the need for 35 GB for this demo (until we have a butler that can compress and uncompress images at will).

          "Finally, use MySQL to enable the database keys, which will make your table much more useful for scientific inquiries. Here, <hostname> is the address of the DB server you have been using, http://lsst-db.ncsa.illinois.edu by default."

          I think Confluence treats the host name as a website and is adding an "http://". Perhaps some formatting would prevent that.

          Content of enable_keys.sql:

          • The syntax example in the first line of the file: mysql won't like that. Either move it out of the .sql file and into the text, like "To index the science tables, use the syntax ALTER TABLE <table name> ENABLE KEYS;", or turn it into a comment in the .sql file by adding two hyphens and a space ("-- ") before ALTER TABLE. (The space between "--" and ALTER is important.)
          • Change DeepSource to RunDeepSource
          • Change DeepForcedSource to RunDeepForcedSource
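
A minimal sketch of what the corrected file could look like, written from the shell. It assumes RunDeepSource and RunDeepForcedSource are the only two tables that need their keys enabled; the SQL_FILE and DB_USER variables are hypothetical names used only for illustration, and the mysql call is left commented out because it needs a reachable server.

```shell
#!/bin/sh
# Sketch only: write a small enable_keys.sql whose first line is a valid
# SQL comment ("-- " followed by text) rather than a bare syntax example.
SQL_FILE="${SQL_FILE:-enable_keys.sql}"   # hypothetical output path

cat > "$SQL_FILE" <<'EOF'
-- Syntax: ALTER TABLE <table name> ENABLE KEYS;
ALTER TABLE RunDeepSource ENABLE KEYS;
ALTER TABLE RunDeepForcedSource ENABLE KEYS;
EOF

# To apply it (DB_USER is a placeholder for your MySQL user name):
# mysql -h lsst-db.ncsa.illinois.edu -u "$DB_USER" -p "$DB_NAME" < "$SQL_FILE"
```

Keeping the syntax reminder as a comment means the file can be fed to mysql unchanged.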

          I didn't catch this the first time around, but users might not know that Illinois now prefers the domain illinois.edu over uiuc.edu, and that lsst10.ncsa.uiuc.edu and lsst-db.ncsa.illinois.edu both redirect to the same server. It might be clearer if we consistently use lsst-db.ncsa.illinois.edu throughout the tutorial. Unfortunately, for this tutorial to work as written, their db-auth.paf file needs to have both, like this:

          database: {
              authInfo: {
                  host: lsst-db.ncsa.illinois.edu
                  port: 3306
                  user: <your mysql user name>
                  password: <your mysql password>
              }
           
              authInfo: {
                  host: lsst10.ncsa.uiuc.edu
                  port: 3306
                  user: <your mysql user name>
                  password: <your mysql password>
              }
          }

          ... because some of the stack still has the old host name as the default. The alternative to changing the db-auth.paf file is to change the host name on the forced photometry step, by adding the argument references.host='lsst-db.ncsa.illinois.edu' to the command line.

           
          forcedPhot.py $DEMO_DIR/calexp_dir --output $DEMO_DIR/forcedPhot_dir --configfile $DEMO_DIR/forcedPhotConfig.py --config references.dbName=$DB_NAME references.host='lsst-db.ncsa.illinois.edu' references.filterName=r @$DEMO_DIR/forcedPhotInputs_r.txt -j $NCORES >& forcedPhot_log.txt

          Up to you which option is better. I like having lsst10.ncsa.uiuc.edu in my db-auth.paf because that host name default is still sprinkled throughout the stack.
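
For anyone taking the db-auth.paf route, a quick check along these lines might help confirm that both host names are present before starting the long-running steps. This is only a sketch: check_db_auth_hosts is a hypothetical helper, and the default path $HOME/.lsst/db-auth.paf is an assumption that should be adjusted to wherever the file actually lives.

```shell
#!/bin/sh
# Hypothetical helper: report which of the two host names discussed above
# appear in a db-auth.paf file. Returns non-zero if either one is missing.
check_db_auth_hosts() {
    paf_file="$1"
    missing=0
    for host in lsst-db.ncsa.illinois.edu lsst10.ncsa.uiuc.edu; do
        # -F treats the host name as a fixed string, not a regular expression.
        if grep -qF "$host" "$paf_file"; then
            echo "found:   $host"
        else
            echo "missing: $host"
            missing=1
        fi
    done
    return $missing
}

# Assumed location of the file; override PAF_FILE if yours differs.
PAF_FILE="${PAF_FILE:-$HOME/.lsst/db-auth.paf}"
if [ -f "$PAF_FILE" ]; then
    check_db_auth_hosts "$PAF_FILE" ||
        echo "Add the missing authInfo block, or pass references.host on the command line."
fi
```

The check only looks for the host strings; it does not validate the rest of the file or the credentials.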

          Summary of commands at the end:

          • looks like you do "rawInputs_r.txt ./Stripe82" twice.
          • http://lsst-db.ncsa.illinois.edu -> lsst-db.ncsa.illinois.edu

          shaw Richard Shaw [X] (Inactive) added a comment -

          Fixes to the (post-)review comments are implemented. Status is now "Still Done".

          yusra Yusra AlSayyad added a comment -

          Double check the db-auth.paf. The second one should be 'lsst10.ncsa.uiuc.edu'

          Then status is "Very Done"

          shaw Richard Shaw [X] (Inactive) added a comment -

          Fixed. Maybe status is "Done for Now"


            People

            • Assignee:
              shaw Richard Shaw [X] (Inactive)
              Reporter:
              shaw Richard Shaw [X] (Inactive)
              Reviewers:
              Mario Juric, Simon Krughoff, Yusra AlSayyad
              Watchers:
              Mario Juric, Richard Shaw [X] (Inactive), Simon Krughoff, Yusra AlSayyad
            • Votes:
              0
              Watchers:
              4

                Summary Panel

                  Time Tracking

                  Estimated:
                  10 minutes
                  Remaining:
                  10 minutes
                  Logged:
                  Not Specified