Data Management / DM-23342

Fix ingestDriver

Details

    • Story
    • Status: Done
    • Resolution: Done
    • None
    • pipe_drivers
    • External
    • No

    Description

      DM-23213 broke ingestDriver.py by changing the API of IngestTask.runFile.

      Attachments

        Activity

          price Paul Price added a comment -

          ktl, could you please check that I haven't messed up what you were trying to achieve?

          I've thrown in my DM-7197 fix for good measure.

          ktl Kian-Tat Lim added a comment - edited

          Sorry about not recognizing that ingestDriver would be affected.

          In a way, this change is undoing what I did: it's restoring the (now-ignored) return value from runFile and explicitly enabling a single-point, serialized registry update.

          I'm not sure it's still a great idea to do all the reading and filesystem work separately from the registry work.  I seem to recall ingest being dominated by the former (and in particular parsing and converting metadata), in which case the locking involved with multiple parallel database connections wouldn't be significant.  Adding to the registry after each file would allow partial ingests to be recorded, so future re-ingests can avoid any filesystem operations for those files.  (And it allows Butler clients to see datasets as they are ingested, although this is likely less of an issue for the large-scale precursor data ingests for which I believe ingestDriver is designed.)

          It might be that the difficult point is getting the registry connection to each pool member, or perhaps adequate shared filesystem locking for the registry cannot be ensured.  In that case, this does appear to be the minimal change to keep things working as they were.  (There's one suggestion in the PR for a slight safety improvement.)
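
Editorial illustration (not part of the ticket or the PR): a minimal sketch of the pattern discussed above, in which per-file work runs in a pool and returns its result, and the registry is then updated from a single place. Everything here is an assumption for illustration: `parse_file`, `ingest`, the `raw` table, and the file names are hypothetical, and a thread pool with an in-memory SQLite database stands in for the real pool and registry. It does not reproduce the `IngestTask.runFile` API.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def parse_file(path):
    """Hypothetical stand-in for the per-file work (reading, parsing metadata).

    Returns the record to register instead of writing it, so the caller
    decides when and where the registry is updated.
    """
    name = path.rsplit("/", 1)[-1]
    visit = int(name.split("_")[1].split(".")[0])
    return {"path": path, "visit": visit}


def ingest(paths, registry, workers=4):
    """Parse files in parallel, then update the registry from one place."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        records = list(pool.map(parse_file, paths))  # input order is preserved

    # Single-point, serialized registry update: one connection, one transaction.
    with registry:
        registry.executemany(
            "INSERT INTO raw (path, visit) VALUES (:path, :visit)", records
        )
    return records


# Hypothetical registry and inputs, for illustration only.
registry = sqlite3.connect(":memory:")
registry.execute("CREATE TABLE raw (path TEXT PRIMARY KEY, visit INTEGER)")
paths = ["/data/raw_101.fits", "/data/raw_102.fits", "/data/raw_103.fits"]
ingest(paths, registry)
```

In this sketch nothing is recorded if the run fails before the registry step. The alternative raised above, adding to the registry after each file, would move the insert into the per-file function, which records partial ingests but requires every pool member to hold its own registry connection and contend for locks.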

          price Paul Price added a comment -

          Thanks KT. I fixed up the issue you identified, and I'm running Jenkins.

          price Paul Price added a comment -

          Jenkins is green.

          Merged to master.


          People

            price Paul Price
            price Paul Price
            Kian-Tat Lim
            Kian-Tat Lim, Paul Price