  1. Data Management
  2. DM-5739

--clobber-config option and message are misleading

    Details

    • Type: Improvement
    • Status: Won't Fix
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: daf_persistence
    • Labels:
      None
    • Story Points:
      1
    • Team:
      Data Access and Database

      Description

      Using the --clobber-config option in a child butler repository was thought to cause changes in the parent repository having to do with backing up files.

      But actually nothing in the parent is altered in any way.

      The original reporter would suggest the message and command line option for this scenario should be changed to something along the lines of telling the user they can "ignore the config in the input repo" (e.g. --ignore-input-config) as well as omitting the phrase "tasks configurations must be consistent within the same output repo". It would be best here not to even mention "clobbering" of any sort, as none is actually occurring.

      This was originally reported as https://hsc-jira.astro.princeton.edu/jira/browse/HSC-1341

        Attachments

          Issue Links

            Activity

            ktl Kian-Tat Lim added a comment -

            I'm not sure I understand. --clobber-config is a CmdLineTask option that ends up calling butler.put(..., backup=True). The latter delegates to CameraMapper.backup(), which appears to copy all previous backups in the output repository or any of its _parent-linked repositories into the output repository with increasing ~N suffixes. The code (here) does not appear to be able to modify parent repositories at all.
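
For illustration only, the backup behaviour described above can be sketched roughly as follows. This is not the CameraMapper code: the function names `find_in_chain` and `backup_into_output`, and the idea of passing the repositories as a plain list of directories, are assumptions made for this sketch. It searches the output repository first and then its parents, and writes every copy into the output repository only.

```python
import shutil
import tempfile
from pathlib import Path


def find_in_chain(roots, rel_path):
    """Return the first existing copy of rel_path, searching roots in order."""
    for root in roots:
        candidate = Path(root) / rel_path
        if candidate.exists():
            return candidate
    return None


def backup_into_output(output_root, parent_roots, rel_path):
    """Copy existing versions of rel_path into output_root as ~1, ~2, ...

    Hypothetical helper: parents are only read, never written.
    Returns the list of suffix numbers that were written.
    """
    roots = [output_root, *parent_roots]
    found = []
    n = 0
    path = find_in_chain(roots, rel_path)
    while path is not None:
        n += 1
        found.append((n, path))
        path = find_in_chain(roots, f"{rel_path}~{n}")
    # Copy the highest suffix first so an existing ~1 becomes ~2
    # before ~1 is overwritten with the newer file.
    for number, old in reversed(found):
        dest = Path(output_root) / f"{rel_path}~{number}"
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(old, dest)
    return [number for number, _ in found]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        parent = Path(tmp) / "parent"
        output = Path(tmp) / "output"
        (parent / "config").mkdir(parents=True)
        (parent / "config" / "detect.py").write_text("old config")
        print(backup_into_output(output, [parent], "config/detect.py"))
        print(sorted(p.name for p in (parent / "config").iterdir()))
```

In this sketch every destination path is built from `output_root`, so the parent directories cannot be modified by construction.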

            jbosch Jim Bosch added a comment -

            Hmm. That code does indeed look fine, and I can't reproduce the problem with a simple test case right now. But I'm sure I've seen this - it was a while ago, and probably on the HSC side, but the code is identical there. Lauren MacArthur was the one who brought up being worried about this; maybe she has a how-to-reproduce? Or maybe Paul Price has an idea? If neither of them can demonstrate this, I'll close it. And if the bug does exist, it must be more complicated than I thought.

            lauren Lauren MacArthur added a comment - - edited

            Indeed, I ran into a situation where I was worried this would happen, so I did not take the risk of running with --clobber-config. I therefore can't reproduce it, as I never actually confirmed that it would happen. More specifically, I had built a coadd using the LSST stack, but pointing at an HSC-processed run (in order to see whether one could actually perform post-meas_mosaic processing using the LSST stack, i.e. on data processed with the HSC stack up to that stage). I got through:
            makeCoaddTempExp.py
            assembleCoadd.py
            detectCoaddSources.py
            mergeCoaddDetections.py

            but then at measureCoaddSources.py I was faced with:

            measureCoaddSources FATAL: Failed in task initialization: New schema does not match schema 'deepCoadd_meas' on disk; schemas must be  consistent within the same output repo (override with --clobber-config)
            

            I was puzzled why my new run should care about any old schemas at all, and also why this didn't happen at an earlier stage (i.e. at the detection stage). Regardless, I figured it was looking through the parent tree and had found the HSC-processed schema, so I was afraid that running with --clobber-config would overwrite that one, and I did not want to risk that happening. So, I guess my issue boils down to: "Why is my new multiband processing looking through the parent tree for a schema at all (i.e. since it's a "rerun", shouldn't it be creating its own schema)?".

            lauren Lauren MacArthur added a comment -

            I have just confirmed that my fears of the input rerun being modified when using --clobber-config were unfounded. The following scenario describes the behavior:

            An existing rerun directory, $root/rerun/one, contains a set of data that has been processed from raw through multiband. I want to redo processing on the coadds already existing in $root/rerun/one, with the output written to the rerun directory $root/rerun/two. If I try:

            detectCoaddSources.py $root --rerun one:two --id tract=0 patch=0,0 filter=HSC-I^HSC-R
            

            I am faced with the following message:

            detectCoaddSources FATAL: Failed in task initialization: Config does not match existing task config 'detectCoaddSources_config' on disk; tasks configurations must be consistent within the same output repo (override with --clobber-config)
            

            This is where my fear of overwriting things in $root/rerun/one stemmed from, but really it is just a very badly worded message. What actually happens if you run with --clobber-config is that a new config directory is created in $root/rerun/two and a new detect.py config file is written there (i.e. to $root/rerun/two/config/detect.py). In addition, a file $root/rerun/two/config/detect.py~1 is written which is in fact a copy of the original $root/rerun/one/config/detect.py. Nothing in $root/rerun/one is altered in any way.
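
The scenario above can be mimicked with a small stand-alone sketch. It is only a model of the observed behaviour, not the pipeline's implementation: `write_config`, `ConfigMismatch`, and the single-level `~1` backup are hypothetical simplifications. The check looks for an existing config in the output rerun and then in the input rerun, refuses to proceed on a mismatch unless `clobber` is set, and only ever writes under the output rerun.

```python
import tempfile
from pathlib import Path


class ConfigMismatch(RuntimeError):
    """Hypothetical error type used only in this sketch."""


def write_config(output_root, input_root, rel_path, new_text, clobber=False):
    """Write new_text to output_root/rel_path, checking against any existing config."""
    out_file = Path(output_root) / rel_path
    in_file = Path(input_root) / rel_path
    # Look in the output rerun first, then fall back to the input rerun.
    if out_file.exists():
        existing = out_file
    elif in_file.exists():
        existing = in_file
    else:
        existing = None
    if existing is not None and existing.read_text() != new_text:
        if not clobber:
            raise ConfigMismatch("config does not match existing config on disk")
        # Keep a copy of the config that was found, but inside the output rerun.
        backup = out_file.with_name(out_file.name + "~1")
        backup.parent.mkdir(parents=True, exist_ok=True)
        backup.write_text(existing.read_text())
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text(new_text)
    return out_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        one = Path(root) / "rerun" / "one"
        two = Path(root) / "rerun" / "two"
        (one / "config").mkdir(parents=True)
        (one / "config" / "detect.py").write_text("original")
        write_config(two, one, "config/detect.py", "changed", clobber=True)
        print(sorted(p.name for p in (two / "config").iterdir()))
        print((one / "config" / "detect.py").read_text())
```

Calling the sketch without `clobber=True` raises `ConfigMismatch`, which corresponds to the FATAL message quoted above; with it, the input rerun is read but left untouched.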

            I would suggest the message and command line option for this scenario should be changed to something along the lines of telling the user they can "ignore the config in the input repo" (e.g. --ignore-input-config) as well as omitting the phrase "tasks configurations must be consistent within the same output repo". It would be best here not to even mention "clobbering" of any sort, as none is actually occurring.

            tjenness Tim Jenness added a comment -

            I think we can close this ticket as either invalid or won't fix.

            jbosch Jim Bosch added a comment -

            Agreed; all Gen2.


              People

              Assignee:
              Unassigned Unassigned
              Reporter:
              jbosch Jim Bosch
              Watchers:
              Jim Bosch, John Swinbank, Kian-Tat Lim, Lauren MacArthur, Tim Jenness