DM-24352: Add auto transfer mode to gen3 ingest

    Details

    • Story Points:
      2
    • Team:
      Architecture
    • Urgent?:
      No

      Description

      When converting gen2 repositories to gen3, we currently default to symlink mode. This means that in-place conversions can't work and that a user can't convert to an S3 bucket.

      I think we need a few changes that would help a lot with making the right decision:

      1. Add a new "link" option that tries a hardlink and falls back to a symlink.
      2. Add an "auto" mode to the POSIX datastore that checks whether the file is within the repository and uses None if it is; otherwise it uses "link".
      3. Make "auto" an alias for "copy" in the S3 datastore.

      With these changes the 2to3 conversion can use "auto" mode and that should do the right thing most of the time.
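
The three proposed changes can be sketched roughly as follows. This is only an illustration under stated assumptions, not the datastore code: the function names `link_with_fallback`, `resolve_auto_posix`, and `resolve_auto_s3` are hypothetical, and the real ingest path may resolve modes differently.

```python
import os
from pathlib import Path
from typing import Optional


def link_with_fallback(src: str, dst: str) -> str:
    """Hypothetical "link" mode: try a hardlink, fall back to a symlink."""
    try:
        os.link(src, dst)
        return "hardlink"
    except OSError:
        # Hardlinks can fail, for example when src and dst are on
        # different filesystems; use an absolute symlink instead.
        os.symlink(os.path.abspath(src), dst)
        return "symlink"


def resolve_auto_posix(path: str, root: str) -> Optional[str]:
    """Hypothetical "auto" mode for a POSIX datastore.

    Returns None (ingest in place) if the file is inside the repository
    root, otherwise "link". Note that resolve() follows symlinks.
    """
    resolved = Path(path).resolve()
    root_resolved = Path(root).resolve()
    inside = root_resolved in resolved.parents
    return None if inside else "link"


def resolve_auto_s3(path: str) -> str:
    """Hypothetical "auto" mode for an S3 datastore: always copy."""
    return "copy"
```

With helpers like these, a converter could pass "auto" everywhere and let each datastore decide, which is the behavior the description asks for.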

            Activity

            Tim Jenness (tjenness) added a comment:

            Jim Bosch: the auto modes were a little more involved than I wanted because of the way ingest works, but everything seems to work okay. I have converted a gen2 repository to gen3 in obs_lsst using the auto mode default; it now works for in-place conversion and also correctly uses hardlinks without having to be told.

            Jim Bosch (jbosch) added a comment:

            Code looks good, but I'm worried we're choosing default behavior based on what works around our lack of command-line flexibility in the convert script; see PR comments.

            Tim Jenness (tjenness) added a comment:

            I think we disagree philosophically. A per-datastore default seems like a good approach to me. S3 always really wants copy, but POSIX really should prefer hard links over soft links if the file to be ingested is outside the repository, and should "just work" if it's inside the repository. The 2to3 conversion script can know whether gen3root == gen2root and switch mode accordingly, but an auto mode makes that work regardless.

            I'm not against a command line option to allow people to do explicit move ingests, but I think that there is a lot of gain in having "auto" as the default for it.

            I do think our chained datastore ingest approach is not quite right. In-place ingest could work if the other datastore were in-memory only. "move" could work if it were done as a copy for each child datastore followed by a delete.
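
As a rough illustration of the "copy for each child, then delete" idea, here is a minimal sketch. It assumes, purely for illustration, that each child datastore is a plain directory; `move_into_children` is a hypothetical helper and not part of any datastore API.

```python
import os
import shutil
from typing import List


def move_into_children(src: str, child_roots: List[str], relpath: str) -> List[str]:
    """Emulate "move" for a chain: copy into every child, then delete the source."""
    destinations = []
    for root in child_roots:
        dst = os.path.join(root, relpath)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)
        destinations.append(dst)
    # Remove the source only after every child copy has succeeded, so a
    # failure part-way through never loses the original file.
    os.remove(src)
    return destinations
```

Deleting last means an exception in any copy leaves the source untouched, at the cost of possibly leaving partial copies in earlier children.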

            Jim Bosch (jbosch) added a comment:

            I am all for per-datastore defaults generally; my concern was only with the file-location-dependent default for PosixDatastore.

            I generally agree about ChainedDatastore not quite feeling right. I've vaguely wondered whether a butler-managed flat collection of Datastores would work better (mostly when thinking about deletion).

            Tim Jenness (tjenness) added a comment:

            I've made the following changes:

            1. Add a --transferMode option to the conversion script, defaulting to "auto".
            2. Check every dataset and complain if some are inside the repository root and some are outside it.
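
A minimal sketch of what these two changes might look like. Only the `--transferMode` flag name and its "auto" default come from the comment above; `make_parser` and `all_inside_root` are hypothetical names, and the actual script may structure this differently.

```python
import argparse
from pathlib import Path
from typing import Iterable


def make_parser() -> argparse.ArgumentParser:
    """Build a parser with the --transferMode option defaulting to "auto"."""
    parser = argparse.ArgumentParser(description="Sketch of conversion options")
    parser.add_argument("--transferMode", default="auto",
                        help="Transfer mode to use when ingesting files")
    return parser


def all_inside_root(paths: Iterable[str], root: str) -> bool:
    """Return True if every path is inside root, False if none are.

    Raises ValueError if some paths are inside root and some are outside.
    An empty list of paths returns False.
    """
    root_resolved = Path(root).resolve()
    inside = [root_resolved in Path(p).resolve().parents for p in paths]
    if any(inside) and not all(inside):
        raise ValueError("Some datasets are inside the root and some are outside")
    return bool(inside) and all(inside)
```

For example, `make_parser().parse_args([]).transferMode` yields "auto" unless the user passes the flag explicitly.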

            Are those changes sufficient to let you accept the change?

            Jim Bosch (jbosch) added a comment:

            Yup, that works for me.


              People

              Assignee:
              tjenness Tim Jenness
              Reporter:
              tjenness Tim Jenness
              Reviewers:
              Jim Bosch
              Watchers:
              Dino Bektesevic, Jim Bosch, Tim Jenness