Data Management / DM-17679

Butler sqlite data issues

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      SQLite doesn't appear to enforce column datatype declarations or length limits. In the most recent Gen3 SQLite file I found the following issues:

      sqlite> .schema Dataset
      CREATE TABLE IF NOT EXISTS "Dataset" (
      dataset_id INTEGER NOT NULL,
      dataset_type_name VARCHAR(128) NOT NULL,
      run_id INTEGER NOT NULL,
      quantum_id INTEGER,
      dataset_ref_hash VARCHAR(44) NOT NULL,
      instrument VARCHAR(8),
      abstract_filter VARCHAR(8),
      physical_filter VARCHAR(16),
      detector INTEGER,
      visit INTEGER,
      exposure INTEGER,
      valid_first DATETIME,
      valid_last DATETIME,
      skypix INTEGER,
      skymap VARCHAR(32),
      tract INTEGER,
      patch INTEGER,
      label VARCHAR(32),
      PRIMARY KEY (dataset_id),
      FOREIGN KEY(dataset_type_name) REFERENCES "DatasetType" (dataset_type_name),
      FOREIGN KEY(run_id) REFERENCES "Run" (execution_id),
      FOREIGN KEY(quantum_id) REFERENCES "Quantum" (execution_id),
      FOREIGN KEY(instrument) REFERENCES "Instrument" (instrument),
      FOREIGN KEY(instrument, physical_filter) REFERENCES "PhysicalFilter" (instrument, physical_filter),
      FOREIGN KEY(instrument, detector) REFERENCES "Detector" (instrument, detector),
      FOREIGN KEY(instrument, visit) REFERENCES "Visit" (instrument, visit),
      FOREIGN KEY(instrument, exposure) REFERENCES "Exposure" (instrument, exposure),
      FOREIGN KEY(skymap) REFERENCES "SkyMap" (skymap),
      FOREIGN KEY(skymap, tract) REFERENCES "Tract" (skymap, tract),
      FOREIGN KEY(skymap, tract, patch) REFERENCES "Patch" (skymap, tract, patch)
      );

      sqlite> select distinct(skypix) from Dataset;
      189584
      189648
      master_schema

      sqlite> .schema DatasetType

      CREATE TABLE IF NOT EXISTS "DatasetType" ( dataset_type_name VARCHAR(128) NOT NULL, storage_class VARCHAR(32) NOT NULL, PRIMARY KEY (dataset_type_name));

      sqlite> select distinct(length(storage_class)) from DatasetType where length(storage_class) > 32;

      33
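
      The behaviour shown above is SQLite's documented "type affinity" model: declared types and VARCHAR lengths are accepted by the parser but not enforced at insert time. A minimal sketch using Python's built-in sqlite3 module (the table here is illustrative, not the ticket's full schema):

```python
import sqlite3

# SQLite uses dynamic typing ("type affinity"): declared column types and
# VARCHAR length limits are parsed but never enforced.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (skypix INTEGER, storage_class VARCHAR(32))")

# Both values violate the declared schema (a string in an INTEGER column,
# and a 33-character string in a VARCHAR(32)), yet the insert succeeds.
conn.execute("INSERT INTO t VALUES (?, ?)", ("master_schema", "x" * 33))

row = conn.execute("SELECT skypix, length(storage_class) FROM t").fetchone()
print(row)  # ('master_schema', 33)
```

This is exactly why the stray `master_schema` string could end up in the INTEGER `skypix` column, and a 33-character value in the `VARCHAR(32)` `storage_class` column.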

        Attachments

          Activity

          jbosch Jim Bosch added a comment -

          Michelle Gower, I think I've got this fixed.  Mind reviewing?

          Christopher Stephens, I've attached the fixed gen3.sqlite3 to this ticket if you'd like to verify it.

          jbosch Jim Bosch added a comment -

          Looks like Jira isn't seeing the PR.  Here it is: https://github.com/lsst/daf_butler/pull/131

          cs2018 Christopher Stephens added a comment -

          Quick check shows everything looks good. I'll import this into Oracle on Monday and let you know if we hit any issues.

          Thanks for the quick turnaround!

          mgower Michelle Gower added a comment -

          I didn't see code to check that the value will fit in the column (e.g., the length of the string), so we may run into a few more of these width problems in the future. Since it's easy to work around in DDL, it's probably not a high coding priority right now. Otherwise good to go.
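
          One DDL-level workaround along these lines (a sketch only, not the approach taken in the ticket's PR) is to add explicit CHECK constraints, which SQLite does enforce even though it ignores VARCHAR lengths:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CHECK constraints are enforced by SQLite, unlike VARCHAR(n) lengths.
conn.execute("""
    CREATE TABLE DatasetType (
        dataset_type_name VARCHAR(128) NOT NULL
            CHECK (length(dataset_type_name) <= 128),
        storage_class VARCHAR(32) NOT NULL
            CHECK (length(storage_class) <= 32),
        PRIMARY KEY (dataset_type_name)
    )
""")

conn.execute("INSERT INTO DatasetType VALUES ('calexp', 'Exposure')")  # fits

try:
    # A 33-character storage_class now raises instead of slipping through.
    conn.execute("INSERT INTO DatasetType VALUES ('bad', ?)", ("x" * 33,))
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```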

          jbosch Jim Bosch added a comment -

          I didn't see code to check that the value will fit in the column (e.g., length of the string). So we may run into a few more of the width problems in the future.

          Correct.  Given that both Python and SQLite really don't lend themselves well to type-checking (let alone length-checking), I'm wary of trying to do too much of that ourselves, even though I'd really rather be working with a language/DB with strong typing.

          I've got a Jenkins run going now, so unless there are surprises I should be able to get this into the next weekly.

          cs2018 Christopher Stephens added a comment -

          Import into Oracle was successful with the exception of PosixDatastoreRecords, which is a known issue.

          The SQLite DDL dump for that table still doesn't have lengths associated with its VARCHAR columns:


          CREATE TABLE "PosixDatastoreRecords" (
          path VARCHAR,
          formatter VARCHAR,
          storage_class VARCHAR,
          file_size INTEGER,
          checksum VARCHAR,
          dataset_id INTEGER NOT NULL,
          PRIMARY KEY (dataset_id)
          );

          jbosch Jim Bosch added a comment -

          Yeah, I suspect there isn't a way to get the lengths out of SQLite; I'm pretty sure it just completely ignores them on input (maybe even up in the parser).

          I'd really like to switch to generating the Oracle schema directly via SQLAlchemy calls, but I need to do DM-17154 first so we can have a way to customize those SQLAlchemy calls for different DBs.  Unfortunately I have a chain of blocker tickets that really ought to be done first.
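
          A sketch of that approach, assuming SQLAlchemy: one Table definition compiles to dialect-appropriate DDL, so lengths declared in Python survive into the Oracle schema even when SQLite ignores them. The columns below mirror part of the ticket's Dataset table for illustration; this is not the daf_butler implementation.

```python
import sqlalchemy as sa
from sqlalchemy.schema import CreateTable
from sqlalchemy.dialects import oracle, sqlite

metadata = sa.MetaData()

# Illustrative fragment of the Dataset table from this ticket.
dataset = sa.Table(
    "Dataset",
    metadata,
    sa.Column("dataset_id", sa.Integer, primary_key=True),
    sa.Column("dataset_type_name", sa.String(128), nullable=False),
)

# The same Table object yields per-dialect DDL: VARCHAR(128) for SQLite,
# VARCHAR2(128 ...) for Oracle, with the length preserved in both.
print(CreateTable(dataset).compile(dialect=sqlite.dialect()))
print(CreateTable(dataset).compile(dialect=oracle.dialect()))
```

Generating the Oracle DDL this way would sidestep parsing lengths back out of a SQLite dump entirely.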


            People

            • Assignee: Jim Bosch
            • Reporter: Christopher Stephens
            • Reviewers: Michelle Gower
            • Watchers: Christopher Stephens, Jim Bosch, Michelle Gower
