Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-7197

ingest.py --mode link fails if files already exist

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Won't Fix
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: pipe_tasks
    • Labels:
      None
    • Team:
      Data Release Production

      Description

      When running ingestImages.py using symbolic links you get a failure if the links already exist. As it's quite common to want to re-ingest files, this shouldn't fail (or an option such as --force should be provided).

      The traceback is

      Traceback (most recent call last):
        File "/Users/lsst/products/DarwinX86/anaconda/4.0.0/envs/lsst/opt/lsst/pipe_tasks/bin/ingestImages.py", line 3, in <module>
          IngestTask.parseAndRun()
        File "/Users/lsst/products/DarwinX86/anaconda/4.0.0/envs/lsst/opt/lsst/pipe_tasks/python/lsst/pipe/tasks/ingest.py", line 366, in parseAndRun
          task.run(args)
        File "/Users/lsst/products/DarwinX86/anaconda/4.0.0/envs/lsst/opt/lsst/pipe_tasks/python/lsst/pipe/tasks/ingest.py", line 459, in run
          ingested = self.ingest(infile, outfile, mode=args.mode, dryrun=args.dryrun)
        File "/Users/lsst/products/DarwinX86/anaconda/4.0.0/envs/lsst/opt/lsst/pipe_tasks/python/lsst/pipe/tasks/ingest.py", line 397, in ingest
          os.symlink(os.path.abspath(infile), outfile)
      OSError: [Errno 17] File exists
      

        Attachments

          Issue Links

            Activity

            Hide
            price Paul Price added a comment -

            There are several config options that allow you to do what you want:

            • clobber: Clobber existing file?
            • allowError: Allow error in ingestion?
            • register.ignore: Ignore duplicates in the table?
            Show
            price Paul Price added a comment - There are several config options that allow you to do what you want: clobber : Clobber existing file? allowError : Allow error in ingestion? register.ignore : Ignore duplicates in the table?
            Hide
            rhl Robert Lupton added a comment -

            I shouldn't have to clobber to make links work — that's to protect real files. I'm not asking for duplicates, I'm just reingesting the same file, and I don't think that that should be treated as an error.

            So I think I'd expect this to work; but I take back the --force proposal as there are equivalent mechanisms if people feel that the current behaviour is correct.

            This is not a big deal, but quality of implementation matters and this is an unnecessary extra aggravation for users (especially naïve users who don't do this very often).

            Show
            rhl Robert Lupton added a comment - I shouldn't have to clobber to make links work — that's to protect real files. I'm not asking for duplicates, I'm just reingesting the same file, and I don't think that that should be treated as an error. So I think I'd expect this to work; but I take back the --force proposal as there are equivalent mechanisms if people feel that the current behaviour is correct. This is not a big deal, but quality of implementation matters and this is an unnecessary extra aggravation for users (especially naïve users who don't do this very often).
            Hide
            price Paul Price added a comment -

            So what about something like this (the code is on tickets/DM-7197 if you want to try it):

            price@price-laptop:~/LSST/pipe/tasks (tickets/DM-7197=) $ git sub-patch
            commit 518bb300ce2349f709ba9c8bb8d9a5c0f57f5d79
            Author: Paul Price <price@astro.princeton.edu>
            Date:   Thu Aug 11 11:45:13 2016 -0400
             
                ingest: detect when we are re-linking the same file
                
                This should make re-ingesting data less painful.
             
            diff --git a/python/lsst/pipe/tasks/ingest.py b/python/lsst/pipe/tasks/ingest.py
            index f0b6de1..dd47758 100644
            --- a/python/lsst/pipe/tasks/ingest.py
            +++ b/python/lsst/pipe/tasks/ingest.py
            @@ -394,6 +394,13 @@ class IngestTask(Task):
                             assertCanCopy(infile, outfile)
                             shutil.copyfile(infile, outfile)
                         elif mode == "link":
            +                if os.path.exists(outfile):
            +                    if os.path.samefile(infile, outfile):
            +                        self.log.debug("Already linked %s to %s: ignoring" % (infile, outfile))
            +                    else:
            +                        self.log.warn("%s already has a file at the target location (%s): ignoring "
            +                                      "(set clobber=True to overwrite)" % (infile, outfile))
            +                    return False
                             os.symlink(os.path.abspath(infile), outfile)
                         elif mode == "move":
                             assertCanCopy(infile, outfile)
            

            Show
            price Paul Price added a comment - So what about something like this (the code is on tickets/ DM-7197 if you want to try it): price@price-laptop:~/LSST/pipe/tasks (tickets/DM-7197=) $ git sub-patch commit 518bb300ce2349f709ba9c8bb8d9a5c0f57f5d79 Author: Paul Price <price@astro.princeton.edu> Date: Thu Aug 11 11:45:13 2016 -0400   ingest: detect when we are re-linking the same file This should make re-ingesting data less painful.   diff --git a/python/lsst/pipe/tasks/ingest.py b/python/lsst/pipe/tasks/ingest.py index f0b6de1..dd47758 100644 --- a/python/lsst/pipe/tasks/ingest.py +++ b/python/lsst/pipe/tasks/ingest.py @@ -394,6 +394,13 @@ class IngestTask(Task): assertCanCopy(infile, outfile) shutil.copyfile(infile, outfile) elif mode == "link": + if os.path.exists(outfile): + if os.path.samefile(infile, outfile): + self.log.debug("Already linked %s to %s: ignoring" % (infile, outfile)) + else: + self.log.warn("%s already has a file at the target location (%s): ignoring " + "(set clobber=True to overwrite)" % (infile, outfile)) + return False os.symlink(os.path.abspath(infile), outfile) elif mode == "move": assertCanCopy(infile, outfile)
            Hide
            swinbank John Swinbank added a comment -

            Given that there's been no follow-up on this for 3.5 years, I think we can assume it's not a pressing problem and close as won't fix. Robert, if you want to resurrect Paul's patch above, please reopen.

            Show
            swinbank John Swinbank added a comment - Given that there's been no follow-up on this for 3.5 years, I think we can assume it's not a pressing problem and close as won't fix. Robert, if you want to resurrect Paul's patch above, please reopen.
            Hide
            rhl Robert Lupton added a comment -

            That's fine.  The Gen3 ingest does need to be more user friendly about things like this, but it's a different codebase.

            Show
            rhl Robert Lupton added a comment - That's fine.  The Gen3 ingest does need to be more user friendly about things like this, but it's a different codebase.

              People

              Assignee:
              Unassigned Unassigned
              Reporter:
              rhl Robert Lupton
              Watchers:
              John Swinbank, Paul Price, Robert Lupton
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.