Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-15096

Cannot reuse association database with ApPipeTask reruns

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: ap_pipe, ap_verify
    • Labels:
      None
    • Story Points:
      4
    • Epic Link:
    • Sprint:
      AP F18-2
    • Team:
      Alert Production

      Description

      As discussed in DM-14761, ApPipeTask creates a new copy of an association database whenever a rerun is used, even if a database already exists in one or more parent repositories. This causes partial reprocessing of a data set to return invalid associations because DIAObjects are not handled consistently.

      The root cause is that ApPipeTask tries to always treat the association database as if it were part of the (current run's) output repository. This may be consistent with future Butler support for database operations, but is unlikely to be consistent with how ap_pipe will be run in commissioning/operations at NCSA.

      Based on the discussion at the July 12 AP group meeting, ApPipeTask's behavior will be changed as follows:

      • ApPipeTask will no longer attempt to overwrite its own database config. However, if the config contains the AssociationTask default (a temporary, in-memory database), it will print a warning to the effect that any association results will be lost.
      • The ap_pipe documentation will recommend that users always override the database location to a file location that works for them. (This is a deliberate violation of the sensible-defaults rule for configs, but better than not letting it be configured at all.)
      • ap_verify will give ApPipeTask a config that places the database in ap_verify's workspace (note: not in the output repository itself, which is also in said workspace). Config overrides in obs packages or datasets will take precedence over this setting.

        Attachments

          Issue Links

            Activity

            Hide
            krzys Krzysztof Findeisen added a comment - - edited

            After discussion with Chris Morrison, ApPipeConfig will reject any config that requests an in-memory database as invalid. The in-memory DB's lack of persistence doesn't work well with the current "call create_tables before making AssociationTask" workflow, and Chris argues that in the long run setting up the database shouldn't be ap_pipe's responsibility. All other behavior will stay as written above.

            Show
            krzys Krzysztof Findeisen added a comment - - edited After discussion with Chris Morrison , ApPipeConfig will reject any config that requests an in-memory database as invalid. The in-memory DB's lack of persistence doesn't work well with the current "call create_tables before making AssociationTask " workflow, and Chris argues that in the long run setting up the database shouldn't be ap_pipe 's responsibility. All other behavior will stay as written above.
            Hide
            krzys Krzysztof Findeisen added a comment - - edited

            Hi Meredith Rawls, can you review the ap_pipe tutorial changes introduced by this ticket? It's only 14 lines.

            Chris Morrison, can you review the rest of the changes to ap_pipe and ap_verify?

            Thanks!

            Show
            krzys Krzysztof Findeisen added a comment - - edited Hi Meredith Rawls , can you review the ap_pipe tutorial changes introduced by this ticket? It's only 14 lines. Chris Morrison , can you review the rest of the changes to ap_pipe and ap_verify ? Thanks!
            Hide
            cmorrison Chris Morrison added a comment -

            Looks good. Happy to see that the amount of code decreases for ap_verify. The one thing to watch out for is that l1dbproto calls is database location config field "db_url" vs "db_name". Meredith Rawls feel free to mark it as reviewed when you are ready.

            Show
            cmorrison Chris Morrison added a comment - Looks good. Happy to see that the amount of code decreases for ap_verify. The one thing to watch out for is that l1dbproto calls is database location config field "db_url" vs "db_name". Meredith Rawls feel free to mark it as reviewed when you are ready.
            Hide
            krzys Krzysztof Findeisen added a comment -

            Ah, I didn't realize that l1dbproto had a configurable location, since I only looked at AssociationL1DBProtoConfig and not l1db.L1dbConfig; I'd assumed it was hardcoded in the l1dbproto package or something.

            Show
            krzys Krzysztof Findeisen added a comment - Ah, I didn't realize that l1dbproto had a configurable location, since I only looked at AssociationL1DBProtoConfig and not l1db.L1dbConfig ; I'd assumed it was hardcoded in the l1dbproto package or something.
            Hide
            mrawls Meredith Rawls added a comment -

            I chatted with Chris Morrison a bit about this change, and while I recognize its necessity since the usual workflow will eventually be to connect to an already-existing database and we can't automagically know its name or location, I find it a bit cumbersome to have a required config parameter with no sensible default. I don't have a good alternative to suggest though. My review mostly just covers the documentation updates. They are mostly good, but please add the clarifications I requested over on GitHub before merging.

            Show
            mrawls Meredith Rawls added a comment - I chatted with Chris Morrison a bit about this change, and while I recognize its necessity since the usual workflow will eventually be to connect to an already-existing database and we can't automagically know its name or location, I find it a bit cumbersome to have a required config parameter with no sensible default. I don't have a good alternative to suggest though. My review mostly just covers the documentation updates. They are mostly good, but please add the clarifications I requested over on GitHub before merging.
            Hide
            krzys Krzysztof Findeisen added a comment -

            I don't see how that's relevant, since AFAIK the "usual workflow" won't use SQLite anyway. Can the three of us maybe talk about the design implications in person? Based on stuff like the db_url question I think we're talking past each other again.

            Show
            krzys Krzysztof Findeisen added a comment - I don't see how that's relevant, since AFAIK the "usual workflow" won't use SQLite anyway. Can the three of us maybe talk about the design implications in person? Based on stuff like the db_url question I think we're talking past each other again.
            Hide
            krzys Krzysztof Findeisen added a comment -

            Thank you both for your reviews!

            Show
            krzys Krzysztof Findeisen added a comment - Thank you both for your reviews!

              People

              • Assignee:
                krzys Krzysztof Findeisen
                Reporter:
                krzys Krzysztof Findeisen
                Reviewers:
                Chris Morrison, Meredith Rawls
                Watchers:
                Chris Morrison, John Swinbank, Krzysztof Findeisen, Meredith Rawls
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: