Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-26630

Column size for datastore filename is too short

    XMLWordPrintable

    Details

    • Story Points:
      1
    • Team:
      Architecture
    • Urgent?:
      No

      Description

      Simon Krughoff reported that he was ending up with filenames that were too long for the column definition in the datastore records table.

      Currently the file path is 256 characters but we have lots of directories in the default template and the collection appears twice. A collection can be 64 characters so half the path can be taken up solely by the collection.

      Consider using TEXT for this field rather than going to VARCHAR(512).

        Attachments

          Issue Links

            Activity

            Hide
            tjenness Tim Jenness added a comment -

            TEXT seems to do the right thing on sqlite. Not sure how to test the branch on postgres.

            Show
            tjenness Tim Jenness added a comment - TEXT seems to do the right thing on sqlite. Not sure how to test the branch on postgres.
            Hide
            tjenness Tim Jenness added a comment -

            Andy Salnikov should I switch those huge strings in python/lsst/daf/butler/registry/attributes.py to TEXT as well?

            Show
            tjenness Tim Jenness added a comment - Andy Salnikov should I switch those huge strings in python/lsst/daf/butler/registry/attributes.py to TEXT as well?
            Hide
            salnikov Andy Salnikov added a comment -

            I think it's the right thing for values, for keys we need to make sure that they can be indexed, if TEXT can do that then it may be better to use TEXT there as well.

            Show
            salnikov Andy Salnikov added a comment - I think it's the right thing for values, for keys we need to make sure that they can be indexed, if TEXT can do that then it may be better to use TEXT there as well.
            Hide
            jbosch Jim Bosch added a comment -

            TEXT can be indexed in SQLite and PostgreSQL, and has no performance penalty at all in either (VARCHAR with limit and TEXT are the same under the hood in both). Might be different in other DBs, but right now I don't think we care.

            Show
            jbosch Jim Bosch added a comment - TEXT can be indexed in SQLite and PostgreSQL, and has no performance penalty at all in either (VARCHAR with limit and TEXT are the same under the hood in both). Might be different in other DBs, but right now I don't think we care.
            Hide
            tjenness Tim Jenness added a comment -

            So should I be changing all strings to TEXT? All strings > 128 characters?

            Show
            tjenness Tim Jenness added a comment - So should I be changing all strings to TEXT? All strings > 128 characters?
            Hide
            tjenness Tim Jenness added a comment -

            It seems that mysql does not allow a TEXT column to be used as a key in the index.

            https://stackoverflow.com/questions/1827063/mysql-error-key-specification-without-a-key-length

            Show
            tjenness Tim Jenness added a comment - It seems that mysql does not allow a TEXT column to be used as a key in the index. https://stackoverflow.com/questions/1827063/mysql-error-key-specification-without-a-key-length
            Hide
            jbosch Jim Bosch added a comment -

            I would lean towards all strings, but to keep the length field in ddl.FieldSpec in case we want it later, but I don't have strong opinions. Using VARCHAR in some places would let us check that our data (e.g. filter names) are consistent with the limits we declare.

            Definitely do this for regions, though now that we don't have oracle maybe we want to consider real blob types instead of strings again. We use strings because oracle blobs are slow.

            Show
            jbosch Jim Bosch added a comment - I would lean towards all strings, but to keep the length field in ddl.FieldSpec in case we want it later, but I don't have strong opinions. Using VARCHAR in some places would let us check that our data (e.g. filter names) are consistent with the limits we declare. Definitely do this for regions, though now that we don't have oracle maybe we want to consider real blob types instead of strings again. We use strings because oracle blobs are slow.
            Hide
            tjenness Tim Jenness added a comment -

            I'm not entirely sure what you are saying with "all strings". Are you saying that we should always specify a String type and length in the ddl but in the FieldSpec constructor we convert sqlalchemy.String to sqlalchemy.Text and ignore the length? (and we could for example decide that primary keys should stay String). If so, that works for me.

            Show
            tjenness Tim Jenness added a comment - I'm not entirely sure what you are saying with "all strings". Are you saying that we should always specify a String type and length in the ddl but in the FieldSpec constructor we convert sqlalchemy.String to sqlalchemy.Text and ignore the length? (and we could for example decide that primary keys should stay String). If so, that works for me.
            Hide
            jbosch Jim Bosch added a comment -

            I hadn't thought as far as how to implement "all strings are text", but what you just proposed sounds reasonable.

            Show
            jbosch Jim Bosch added a comment - I hadn't thought as far as how to implement "all strings are text", but what you just proposed sounds reasonable.
            Hide
            tjenness Tim Jenness added a comment -

            Jim Bosch very small review. Switches over to TEXT but I've done it such that we can dynamically decide whether a String should be Text or not. I've left it so that short strings are String so that we can see that constraints are being applied properly.

            Show
            tjenness Tim Jenness added a comment - Jim Bosch very small review. Switches over to TEXT but I've done it such that we can dynamically decide whether a String should be Text or not. I've left it so that short strings are String so that we can see that constraints are being applied properly.
            Hide
            jbosch Jim Bosch added a comment -

            Looks good.  I like the solution for how to control this semi-globally.

            Show
            jbosch Jim Bosch added a comment - Looks good.  I like the solution for how to control this semi-globally.
            Hide
            tjenness Tim Jenness added a comment -

            This is a schema change so which number am I meant to be incrementing? Or are we going to try to say that we only really need to change the schema version once per week?

            Show
            tjenness Tim Jenness added a comment - This is a schema change so which number am I meant to be incrementing? Or are we going to try to say that we only really need to change the schema version once per week?
            Hide
            jbosch Jim Bosch added a comment -

            I thought there was a Registry-wide version number we could increment for this kind of change, but now I can't find it.  Maybe Andy Salnikov can.  If not, we should at least think about whether it add one or whether we should just increment the versions of all of the manager implementations in cases like this.

            Show
            jbosch Jim Bosch added a comment - I thought there was a Registry-wide version number we could increment for this kind of change, but now I can't find it.  Maybe Andy Salnikov can.  If not, we should at least think about whether it add one or whether we should just increment the versions of all of the manager implementations in cases like this.
            Hide
            salnikov Andy Salnikov added a comment - - edited

            For the changes in the tables that are controlled by the managers the schema version for corresponding manager needs to be incremented. If there are tables that do not belong to any manager then we need to introduce separate version for those tables (or add a "virtual" manager for those tables). Having one global version in addition to per-manager version is complicated, I'd like to avoid that situation. This particular change is not tied to any specific manager, so it may not be trivial to decide which managers are affected, it may be easier to increment all of them? Or I can look at the schema and give you the list of managers that are affected.

            Show
            salnikov Andy Salnikov added a comment - - edited For the changes in the tables that are controlled by the managers the schema version for corresponding manager needs to be incremented. If there are tables that do not belong to any manager then we need to introduce separate version for those tables (or add a "virtual" manager for those tables). Having one global version in addition to per-manager version is complicated, I'd like to avoid that situation. This particular change is not tied to any specific manager, so it may not be trivial to decide which managers are affected, it may be easier to increment all of them? Or I can look at the schema and give you the list of managers that are affected.
            Hide
            tjenness Tim Jenness added a comment -

            At this point it's probably easier to increment them all and not worry about it too much.

            Show
            tjenness Tim Jenness added a comment - At this point it's probably easier to increment them all and not worry about it too much.

              People

              Assignee:
              tjenness Tim Jenness
              Reporter:
              tjenness Tim Jenness
              Reviewers:
              Jim Bosch
              Watchers:
              Andy Salnikov, Jim Bosch, Simon Krughoff, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.