Fix Version/s: None
Simon Krughoff reported that he was ending up with filenames that were too long for the column definition in the datastore records table.
Currently the file path is 256 characters but we have lots of directories in the default template and the collection appears twice. A collection can be 64 characters so half the path can be taken up solely by the collection.
Consider using TEXT for this field rather than going to VARCHAR(512).
Andy Salnikov should I switch those huge strings in python/lsst/daf/butler/registry/attributes.py to TEXT as well?
I think it's the right thing for values, for keys we need to make sure that they can be indexed, if TEXT can do that then it may be better to use TEXT there as well.
TEXT can be indexed in SQLite and PostgreSQL, and has no performance penalty at all in either (VARCHAR with limit and TEXT are the same under the hood in both). Might be different in other DBs, but right now I don't think we care.
So should I be changing all strings to TEXT? All strings > 128 characters?
It seems that mysql does not allow a TEXT column to be used as a key in the index.
I would lean towards all strings, but to keep the length field in ddl.FieldSpec in case we want it later, but I don't have strong opinions. Using VARCHAR in some places would let us check that our data (e.g. filter names) are consistent with the limits we declare.
Definitely do this for regions, though now that we don't have oracle maybe we want to consider real blob types instead of strings again. We use strings because oracle blobs are slow.
I'm not entirely sure what you are saying with "all strings". Are you saying that we should always specify a String type and length in the ddl but in the FieldSpec constructor we convert sqlalchemy.String to sqlalchemy.Text and ignore the length? (and we could for example decide that primary keys should stay String). If so, that works for me.
I hadn't thought as far as how to implement "all strings are text", but what you just proposed sounds reasonable.
Jim Bosch very small review. Switches over to TEXT but I've done it such that we can dynamically decide whether a String should be Text or not. I've left it so that short strings are String so that we can see that constraints are being applied properly.
This is a schema change so which number am I meant to be incrementing? Or are we going to try to say that we only really need to change the schema version once per week?
I thought there was a Registry-wide version number we could increment for this kind of change, but now I can't find it. Maybe Andy Salnikov can. If not, we should at least think about whether it add one or whether we should just increment the versions of all of the manager implementations in cases like this.
For the changes in the tables that are controlled by the managers the schema version for corresponding manager needs to be incremented. If there are tables that do not belong to any manager then we need to introduce separate version for those tables (or add a "virtual" manager for those tables). Having one global version in addition to per-manager version is complicated, I'd like to avoid that situation. This particular change is not tied to any specific manager, so it may not be trivial to decide which managers are affected, it may be easier to increment all of them? Or I can look at the schema and give you the list of managers that are affected.
At this point it's probably easier to increment them all and not worry about it too much.
TEXT seems to do the right thing on sqlite. Not sure how to test the branch on postgres.