Data Management / DM-37824

Add obscore table to embargo repo at USDF


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None

      Description

      We now have an initial/preliminary obscore configuration for the embargo repo, so we can go ahead with adding the obscore table based on it. The actual job is done using migration scripts in daf_butler_migrate to create the table and a butler command in dax_obscore to populate it with pre-existing data.

      I want to test this on a copy of the embargo repo to make sure (once more) that the migration scripts work OK. I also think the migration scripts were written for namespace=oga, while the new config uses namespace=embargo (I think embargo is the better choice), so I will likely need to update the migration scripts as well.

        Attachments

          Issue Links

            Activity

            salnikov Andy Salnikov added a comment -

            Testing obscore migration on a copy of embargo repo on my test server.

            Starting from a clean state:

            $ butler migrate show-current $REPO
            c5ae3a2cd7c2 (head)
            f22a777cf382 (head)
            035dcf13ef18 (head)
            77e5b803ad3f (head)
            a07b3b60e369 (head)
            2101fbf51ad3 (head)
            8c57494cabcc (head)
             
            $ butler migrate show-current --butler $REPO
            attributes: lsst.daf.butler.registry.attributes.DefaultButlerAttributeManager 1.0.0 -> f22a777cf382
            collections: lsst.daf.butler.registry.collections.synthIntKey.SynthIntKeyCollectionManager 2.0.0 -> 8c57494cabcc
            datasets: lsst.daf.butler.registry.datasets.byDimensions._manager.ByDimensionsDatasetRecordStorageManagerUUID 1.0.0 -> 2101fbf51ad3
            datastores: lsst.daf.butler.registry.bridge.monolithic.MonolithicDatastoreRegistryBridgeManager 0.2.0 -> a07b3b60e369
            dimensions: lsst.daf.butler.registry.dimensions.static.StaticDimensionRecordStorageManager 6.0.2 -> 035dcf13ef18
            dimensions-config: daf_butler 3 -> c5ae3a2cd7c2
            opaque: lsst.daf.butler.registry.opaque.ByNameOpaqueTableStorageManager 0.2.0 -> 77e5b803ad3f
            

            So we need to apply migrations for both the obscore and obscore-config managers. Start with obscore:

            $ butler migrate show-history obscore
            8b8e030aba2b -> a0d766f99876 (obscore-ObsCoreLiveTableManager, obscore) (head), Migration script for ObsCoreLiveTableManager 0.0.1.
            <base> -> 8b8e030aba2b (obscore-ObsCoreLiveTableManager, obscore), This is an initial pseudo-revision of the 'obscore' tree.
            

            Add it to alembic:

            $ butler migrate stamp $REPO obscore
            

            And migrate (this only updates butler_attributes):

            $ butler migrate upgrade $REPO a0d766f99876
            *** Do not forget to add this line to butler.yaml file (registry.managers):
            ***     obscore: lsst.daf.butler.registry.obscore._manager.ObsCoreLiveTableManager
            

            Edit butler.yaml and add the above line.

            Add obscore-config to alembic:

            $ butler migrate stamp $REPO obscore-config
            

            And migrate (this adds the obscore table); this step needs an option for the config file location:

            $ butler migrate upgrade --options obscore_config=../dax_obscore/configs/usdf-embargo-live.yaml $REPO/butler.yaml 4fe28ef5030f
            *** Before anything can be written the upgraded Registry you need to run
            *** `butler obscore update-table` command (defined in dax_obscore package).
            

            $ butler migrate show-current $REPO
            a0d766f99876 (head)
            035dcf13ef18 (head)
            f22a777cf382 (head)
            77e5b803ad3f (head)
            a07b3b60e369 (head)
            2101fbf51ad3 (head)
            8c57494cabcc (head)
            4fe28ef5030f (head)
            c5ae3a2cd7c2 (head)
            

            Now populate it with the existing data:

            $ time butler obscore update-table $REPO
            lsst.dax.obscore.script.obscore_update_table INFO: Found these dataset types in registry: ['raw', 'calexp', 'quickLookExp']
            lsst.dax.obscore.script.obscore_update_table INFO: Added 34442 records for dataset type 'raw' and collection 'LATISS/raw/all' in 45 seconds
            lsst.dax.obscore.script.obscore_update_table INFO: Added 26807 records for dataset type 'quickLookExp' and collection 'LATISS/runs/quickLook/20221202T163114Z' in 35 seconds
            lsst.dax.obscore.script.obscore_update_table INFO: Added 62 records for dataset type 'quickLookExp' and collection 'LATISS/runs/quickLook/20221208T010235Z' in 0 seconds
            lsst.dax.obscore.script.obscore_update_table INFO: Added 97 records for dataset type 'quickLookExp' and collection 'LATISS/runs/quickLook/20221209T014939Z' in 0 seconds
            lsst.dax.obscore.script.obscore_update_table INFO: Added 6 records for dataset type 'quickLookExp' and collection 'LATISS/runs/quickLook/20221209T015505Z' in 0 seconds
            lsst.dax.obscore.script.obscore_update_table INFO: Added 3342 records for dataset type 'quickLookExp' and collection 'LATISS/runs/quickLook/20230110T184842Z' in 5 seconds
            lsst.dax.obscore.script.obscore_update_table INFO: Added 2364 records for dataset type 'quickLookExp' and collection 'LATISS/runs/quickLook/20230122T121549Z' in 3 seconds
            lsst.dax.obscore.script.obscore_update_table INFO: Added 40 records for dataset type 'quickLookExp' and collection 'LATISS/runs/quickLook/20230201T130220Z' in 0 seconds
            lsst.dax.obscore.script.obscore_update_table INFO: Added 1156 records for dataset type 'quickLookExp' and collection 'LATISS/runs/quickLook/20230201T131634Z' in 2 seconds
            lsst.dax.obscore.script.obscore_update_table INFO: Added 93 records for dataset type 'quickLookExp' and collection 'LATISS/runs/quickLook/20230201T163949Z' in 0 seconds
            real    1m30.454s
            user    1m11.914s
            sys     0m2.183s
            

            salnikov Andy Salnikov added a comment -

            I think everything should be ready for migrating embargo at USDF. Here is the plan:

            • Agree on a time slot, 10-15 minutes of a relatively quiet period, to avoid any potential issues.
            • I need to ask someone (Kian-Tat Lim?) to update butler.yaml with a new manager line.
            • Run all above commands at USDF.

            I can test basic things after that, but to make sure everything works and obscore is updated correctly, we need to ingest some new data and check obscore.

            salnikov Andy Salnikov added a comment -

            I ran migration on USDF repo, notes are here: https://confluence.lsstcorp.org/display/~salnikov/2023-02-03+Adding+obscore+to+USDF+embargo+repo

            There was something unexpected during the migration: when the script tried to create the new table, it got stuck on a lock (likely on metadata tables). There was a transaction with an INSERT query that had been "running" for some time and did not want to finish, and it was locking something. The session was in an "idle in transaction" state with a ClientRead wait event, so it looked like the client did not want to read data. This query was likely from a job that Merlin killed shortly before I started my process. My guess is that a proxy did not destroy the connection (and did not kill the transaction), so the server still thought the client was alive. Not sure if this is going to be an issue in general.
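            For anyone hitting the same symptom, a minimal sketch of how one could spot (and, as a last resort, kill) such stuck sessions. pg_stat_activity and pg_terminate_backend are standard PostgreSQL; the helper function is purely illustrative, not part of any of the packages above.

            ```python
            # Hedged sketch: SQL to list sessions stuck "idle in transaction",
            # which can hold locks on catalog/metadata tables during DDL.
            STUCK_SESSIONS_SQL = """
            SELECT pid, state, wait_event, query_start, left(query, 60) AS query
            FROM pg_stat_activity
            WHERE state = 'idle in transaction'
            ORDER BY query_start;
            """


            def terminate_backend_sql(pid: int) -> str:
                """Render the statement a DBA could use to kill a stuck backend."""
                return f"SELECT pg_terminate_backend({int(pid)});"
            ```

            Killing the backend rolls back its open transaction, which releases the lock; obviously this should only be done for sessions known to be dead clients.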

            salnikov Andy Salnikov added a comment -

            And I see that new records are being added to the obscore table, so it looks like it's doing something right.

            salnikov Andy Salnikov added a comment -

            A small change was needed to a migration script in daf_butler_migrate to change the namespace from "oga" to "embargo". Self-reviewed and merged.

            gpdf Gregory Dubois-Felsmann added a comment -

            Do you have a suggestion for how to poke at the data that are accumulating? I could probably figure out how to do something with SQLAlchemy from a USDF-RSP notebook, if that connection is allowed.

            salnikov Andy Salnikov added a comment -

            A direct SQL query is probably the only way until ObsTAP is there. sqlalchemy should work OK from USDF; I think authentication should already be configured for you (if not, you need to create a .pgpass file). The embargo repo is in the oga schema; other than that, it should be easy to write a simple select(). Let me know if you need an example.
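            For reference, a minimal sketch of what such a select() could look like. The column names below (obs_id, dataproduct_type, s_ra, s_dec) are standard ObsCore columns, not a verified dump of this table's schema, and the commented-out DSN is a placeholder; treat this as an illustration under those assumptions.

            ```python
            # Hedged sketch: direct SQLAlchemy query against the obscore table,
            # assuming it lives in the "oga" schema with standard ObsCore columns.
            import sqlalchemy

            metadata = sqlalchemy.MetaData()
            obscore = sqlalchemy.Table(
                "obscore",
                metadata,
                sqlalchemy.Column("obs_id", sqlalchemy.String),
                sqlalchemy.Column("dataproduct_type", sqlalchemy.String),
                sqlalchemy.Column("s_ra", sqlalchemy.Float),
                sqlalchemy.Column("s_dec", sqlalchemy.Float),
                schema="oga",
            )

            # A few rows of one data product type, to check records are landing.
            query = (
                sqlalchemy.select(obscore.c.obs_id, obscore.c.s_ra, obscore.c.s_dec)
                .where(obscore.c.dataproduct_type == "image")
                .limit(10)
            )

            # With a real connection (placeholder DSN) one would then run:
            # engine = sqlalchemy.create_engine("postgresql://user@host/dbname")
            # with engine.connect() as conn:
            #     rows = conn.execute(query).fetchall()
            ```

            Declaring only the columns you need avoids reflecting the full schema; sqlalchemy.MetaData(reflect) against the oga schema would also work if you prefer to discover the real column list.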

            salnikov Andy Salnikov added a comment -

            Another approach, which I personally use frequently, is to use a database tool, e.g. DBeaver, connect to the server directly, and run SQL queries there. You may need to set up an SSH tunnel for that if you are outside SLAC.

            salnikov Andy Salnikov added a comment -

            Closing this, as the obscore table is created and is being regularly updated. We'll need to extend the ingest process (it needs to update raw regions after ingesting them), but that will need a separate ticket.


              People

              Assignee:
              salnikov Andy Salnikov
              Reporter:
              salnikov Andy Salnikov
              Watchers:
              Andy Salnikov, Christine Banek, Gregory Dubois-Felsmann, Tim Jenness

