Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-22508

Migrate Database interface from prototype to master and add implementations

    Details

      Description

      Move the Database ABC from the prototype to a permanent home in daf_butler, and add concrete implementations for SQLite, Oracle, and PostgreSQL, with unit tests.

       

        Attachments

          Issue Links

            Activity

            Hide
            jbosch Jim Bosch added a comment -

            Andy Salnikov, I'm afraid you're probably the natural reviewer for most of this code, though I'd love to get extra eyes on some parts of it:

            • Dino Bektesevic: the PostgresqlDatabase class (daf/butler/registry/databases/postgresql.py, tests/test_postgresql.py)
            • Michelle Gower/Christopher Stephens: the OracleDatabase class (daf/butler/registry/databases/oracle.py, tests/test_oracle.py; or just the last commit)

            I'd also like all reviewers to read the PR description of the testing strategy; I'm particularly interested in whether (A) that misses anything that might be interestingly different about PostgreSQL on AWS (or other cloud providers) and (B) whether anyone has any ideas for how to improve the approach I've taken for testing against Oracle.

            For some background on where this is going, see https://confluence.lsstcorp.org/display/DM/Architectural+Prototype+for+the+New+Gen3+Registry .

            I don't want to assume Kian-Tat Lim has time to review code, but this is also the interface under which database-engine-specific dragons should be hidden, and I figure that might pique his interest.

            Show
            jbosch Jim Bosch added a comment - Andy Salnikov , I'm afraid you're probably the natural reviewer for most of this code, though I'd love to get extra eyes on some parts of it: Dino Bektesevic : the PostgresqlDatabase class (daf/butler/registry/databases/postgresql.py, tests/test_postgresql.py) Michelle Gower / Christopher Stephens : the OracleDatabase class (daf/butler/registry/databases/oracle.py, tests/test_oracle.py; or just the last commit) I'd also like all reviewers to read the PR description of the testing strategy; I'm particularly interested in whether (A) that misses anything that might be interestingly different about PostgreSQL on AWS (or other cloud providers) and (B) whether anyone has any ideas for how to improve the approach I've taken for testing against Oracle. For some background on where this is going, see https://confluence.lsstcorp.org/display/DM/Architectural+Prototype+for+the+New+Gen3+Registry . I don't want to assume Kian-Tat Lim has time to review code, but this is also the interface under which database-engine-specific dragons should be hidden, and I figure that might pique his interest.
            Hide
            salnikov Andy Salnikov added a comment -

            Looks OK in general, I left small bunch of comments on PR. Removing myself from reviewers.

            Show
            salnikov Andy Salnikov added a comment - Looks OK in general, I left small bunch of comments on PR. Removing myself from reviewers.
            Hide
            cs2018 Christopher Stephens added a comment -

            As an alternative to the Oracle approach of appending a random string to each object name allowing for multiple repositories in the same schema, we could do the following:

            Create a dedicated unit testing database w/ no recovery guarantees to minimize any overhead negatively affecting other workloads. Create N schemas in that database. For each user wanting to perform unit tests, provide a wallet with credentials to some subset of N schemas or a single credential with proxy authentication privileges to schema subset. DAF_BUTLER_ORACLE_TEST_URI would need to change between tests to allow for parallel testing. Users would also need to manage TNS_ADMIN to ensure the right wallet is being used when performing unit tests vs using a Butler repo for other reasons. Other than that, I don't think any additional coding changes would be required.

            We'd want to revisit this down the road at some point in case more appropriate testing methods become available.

            Show
            cs2018 Christopher Stephens added a comment - As an alternative to the Oracle approach of appending a random string to each object name allowing for multiple repositories in the same schema, we could do the following: Create a dedicated unit testing database w/ no recovery guarantees to minimize any overhead negatively affecting other workloads. Create N schemas in that database. For each user wanting to perform unit tests, provide a wallet with credentials to some subset of N schemas or a single credential with proxy authentication privileges to schema subset. DAF_BUTLER_ORACLE_TEST_URI would need to change between tests to allow for parallel testing. Users would also need to manage TNS_ADMIN to ensure the right wallet is being used when performing unit tests vs using a Butler repo for other reasons. Other than that, I don't think any additional coding changes would be required. We'd want to revisit this down the road at some point in case more appropriate testing methods become available.
            Hide
            jbosch Jim Bosch added a comment -

            Christopher Stephens, that sounds great, but I do have some questions.  If I'm understanding the idea correctly, I think we'd need to have a programmatic way to generate those DAF_BUTLER_ORACLE_TEST_URI values, and to avoid clashes it seems like we'd need to have them managed by some kind of lock.  I assume we could invent some way to do that via a table in the testing database that could be queried for available test schemas.  Or were you thinking that we'd just assign schema subsets to users in advance, and let users take care not to run multiple test suites in parallel?

            In case it helps, we may be able to get away with doing all of the testing in a temporary tablespace, depending on how SQLAlchemy engine/connection object lifetimes correspond to Oracle sessions.

            I assume Oracle schemas and users are too tightly coupled for a testing database where users have the ability to do CREATE SCHEMA to work?

            Show
            jbosch Jim Bosch added a comment - Christopher Stephens , that sounds great, but I do have some questions.  If I'm understanding the idea correctly, I think we'd need to have a programmatic way to generate those DAF_BUTLER_ORACLE_TEST_URI values, and to avoid clashes it seems like we'd need to have them managed by some kind of lock.  I assume we could invent some way to do that via a table in the testing database that could be queried for available test schemas.  Or were you thinking that we'd just assign schema subsets to users in advance, and let users take care not to run multiple test suites in parallel? In case it helps, we may be able to get away with doing all of the testing in a temporary tablespace, depending on how SQLAlchemy engine/connection object lifetimes correspond to Oracle sessions. I assume Oracle schemas and users are too tightly coupled for a testing database where users have the ability to do CREATE SCHEMA to work?
            Hide
            cs2018 Christopher Stephens added a comment -

            I don't think there is much of an advantage in trying to manage these test schemas globally. There's little to no overhead associated with empty, unused schemas.  I was thinking we could agree on a standard number of schemas/credentials for each user wanting to perform unit tests against Oracle and leave it to them to manage DAF_BUTLER_ORACLE_TEST_URI for the credentials we place in their unit testing wallets.

            Your suggestion for making use of low overhead/temporary tablespaces is exactly why I mentioned no recoverability guarantees. Great thought!  We can't place traditional heap tables in temporary tablespaces but we can run the database in noarchivelog mode to reduce database I/O globally and limit any intrusion on other workloads in the cluster. 

            You assumption is correct. We'd have a hard time managing an environment where users have the ability to create schemas on demand. I think pre-creating a bunch of schemas for each unit testing user and loading a wallet w/ those credentials is a more stable/manageable approach. If it turns out a user wants more schemas and its justified we can always just add those credentials to the existing wallet at a later time.

            Show
            cs2018 Christopher Stephens added a comment - I don't think there is much of an advantage in trying to manage these test schemas globally. There's little to no overhead associated with empty, unused schemas.  I was thinking we could agree on a standard number of schemas/credentials for each user wanting to perform unit tests against Oracle and leave it to them to manage DAF_BUTLER_ORACLE_TEST_URI for the credentials we place in their unit testing wallets. Your suggestion for making use of low overhead/temporary tablespaces is exactly why I mentioned no recoverability guarantees. Great thought!  We can't place traditional heap tables in temporary tablespaces but we can run the database in noarchivelog mode to reduce database I/O globally and limit any intrusion on other workloads in the cluster.  You assumption is correct. We'd have a hard time managing an environment where users have the ability to create schemas on demand. I think pre-creating a bunch of schemas for each unit testing user and loading a wallet w/ those credentials is a more stable/manageable approach. If it turns out a user wants more schemas and its justified we can always just add those credentials to the existing wallet at a later time.
            Hide
            jbosch Jim Bosch added a comment -

            (Marked reviewed on GH)

            Show
            jbosch Jim Bosch added a comment - (Marked reviewed on GH)

              People

              • Assignee:
                jbosch Jim Bosch
                Reporter:
                jbosch Jim Bosch
                Reviewers:
                Christopher Stephens, Dino Bektesevic, Michelle Gower
                Watchers:
                Andy Salnikov, Christopher Stephens, Dino Bektesevic, Jim Bosch, Michelle Gower
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Summary Panel