Status: To Do
Fix Version/s: None
Some notes for future discussion (TBD) on handling schema changes in the T&S XML and the EFD.
Schema changes are inevitable, and we want to evolve the schema in a way that doesn't break the connectors we use in the EFD or the queries we run against InfluxDB and Postgres.
We have a mechanism to check whether a schema change is compatible, but it is currently disabled.
We propose to use FORWARD schema compatibility and enable the schema compatibility checks as part of T&S CI to detect incompatible changes early on.
T&S XML schemas are translated to Avro schemas by the SAL Kafka producer. Every time there's a schema change (a release of T&S XML), we update the SAL Kafka producer first, and a new version of the Avro schema for a given topic is uploaded to the Kafka Schema Registry.
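The following minimal Python sketch makes the translation step concrete. It turns one simplified topic definition into an Avro record schema. It is not the SAL Kafka producer's code. The XML layout (`EFDB_Topic`, `item`, `EFDB_Name`, `IDL_Type`, `Count`), the IDL-to-Avro type table, and the `lsst.sal` namespace are all assumptions for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Assumed IDL -> Avro mapping (illustrative; the real producer may differ).
IDL_TO_AVRO = {
    "boolean": "boolean",
    "short": "int",
    "int": "int",
    "long": "int",  # IDL "long" is 32 bits
    "long long": "long",
    "float": "float",
    "double": "double",
    "string": "string",
}


def topic_to_avro(xml_text: str, namespace: str = "lsst.sal") -> dict:
    """Build an Avro record schema from a simplified single-topic XML definition."""
    root = ET.fromstring(xml_text.strip())
    fields = []
    for item in root.iter("item"):
        avro_type = IDL_TO_AVRO[item.findtext("IDL_Type")]
        # Items with Count > 1 become arrays of the base type.
        if int(item.findtext("Count", default="1")) > 1:
            avro_type = {"type": "array", "items": avro_type}
        fields.append({"name": item.findtext("EFDB_Name"), "type": avro_type})
    return {
        "type": "record",
        "name": root.findtext("EFDB_Topic"),
        "namespace": namespace,
        "fields": fields,
    }


# Hypothetical topic definition with one scalar item and one array item.
EXAMPLE = """
<SALTelemetry>
  <EFDB_Topic>ATDome_position</EFDB_Topic>
  <item><EFDB_Name>azimuthPosition</EFDB_Name><IDL_Type>double</IDL_Type><Count>1</Count></item>
  <item><EFDB_Name>status</EFDB_Name><IDL_Type>long</IDL_Type><Count>4</Count></item>
</SALTelemetry>
"""

print(json.dumps(topic_to_avro(EXAMPLE), indent=2))
```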
In the Kafka Schema Registry, when a schema is first registered for a subject (topic), it gets a unique id and version number 1. When the schema is updated (if it passes the compatibility checks), the new schema gets a new id and the next version number for that subject.
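A toy in-memory registry can model the id and version bookkeeping. This is a sketch, not the Schema Registry's implementation. It skips compatibility checks and starts ids at 1 for readability. The subject name is hypothetical and follows the `<topic>-value` convention of the default subject naming strategy. The sketch also shows that registering an identical schema again returns the existing id and version rather than creating a new one.

```python
import json


class ToySchemaRegistry:
    """In-memory model of Schema Registry ids (global) and versions (per subject)."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}  # canonical schema JSON -> global id
        self._versions: dict[str, list[int]] = {}  # subject -> ids; version = index + 1

    def register(self, subject: str, schema: dict) -> tuple[int, int]:
        """Return (id, version), reusing both if the schema is already registered."""
        canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
        if canonical not in self._ids:
            self._ids[canonical] = len(self._ids) + 1
        schema_id = self._ids[canonical]
        versions = self._versions.setdefault(subject, [])
        if schema_id not in versions:
            versions.append(schema_id)
        return schema_id, versions.index(schema_id) + 1


registry = ToySchemaRegistry()
subject = "lsst.sal.ATDome.position-value"  # hypothetical subject name
v1 = {"type": "record", "name": "position",
      "fields": [{"name": "azimuth", "type": "double"}]}
v2 = {**v1, "fields": v1["fields"] + [{"name": "elevation", "type": "double"}]}

print(registry.register(subject, v1))  # (1, 1)
print(registry.register(subject, v2))  # (2, 2)
print(registry.register(subject, v2))  # (2, 2): unchanged schema, no new version
```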
Currently, in the EFD, the Avro schema compatibility checks are disabled.
For the EFD we want to make sure that after changes in the T&S XML schema:
- The connectors continue working, i.e., writing to InfluxDB, Postgres, and Parquet files.
- We don't break any queries that we used to run against InfluxDB or Postgres.
So far, we have mostly been using InfluxDB, which handles schema changes because fields are optional:
- If you delete a field from the schema, InfluxDB returns None for that field in all subsequent points, so existing queries don't break.
- If you add a field, InfluxDB returns None for it in all previous points. Old queries don't use the new field, so they are unaffected, and new queries can use it.
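A toy model of a `SELECT *` over points whose field sets differ shows this None-filling behaviour. It is not InfluxDB's query engine: InfluxDB does not store the missing values, it returns nulls when a point lacks a field. The field names are hypothetical.

```python
def select_all(points: list[dict]) -> tuple[list[str], list[list]]:
    """Toy SELECT *: columns are the union of all fields; missing values become None."""
    columns = sorted({name for point in points for name in point})
    rows = [[point.get(name) for name in columns] for point in points]
    return columns, rows


points = [
    {"azimuth": 1.0, "oldField": 3},      # written before any schema change
    {"azimuth": 2.0},                      # after oldField was deleted
    {"azimuth": 2.5, "elevation": 45.0},   # after elevation was added
]
columns, rows = select_all(points)
print(columns)  # ['azimuth', 'elevation', 'oldField']
for row in rows:
    print(row)
```

A query that selects only `azimuth` is unaffected by either change.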
However, we have noticed that some changes are incompatible with InfluxDB:
- Changing the data type of a field, for example from integer to float, breaks the InfluxDB connector. It can no longer write to the existing field, because InfluxDB rejects values whose type differs from the type already stored for that field.
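A small model reproduces the type-change failure. In line protocol, integer field values carry an `i` suffix (`status=1i`) while floats do not (`status=1.5`). InfluxDB rejects a write whose field type differs from the one already stored for that field in the shard; InfluxDB 1.x reports this as a "field type conflict". `ToyShard` and `FieldTypeConflict` are hypothetical names that only mimic that rule. The sketch omits escaping of measurement and field names.

```python
class FieldTypeConflict(Exception):
    """Raised when a field is written with a different type than before."""


def format_value(value) -> str:
    """Encode one field value in InfluxDB line protocol."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


class ToyShard:
    """Remembers the first type written for each (measurement, field) pair."""

    def __init__(self) -> None:
        self.field_types: dict[tuple[str, str], str] = {}

    def write(self, measurement: str, fields: dict) -> str:
        # Validate every field before recording anything, so a rejected write leaves no trace.
        for name, value in fields.items():
            seen = self.field_types.get((measurement, name))
            if seen is not None and seen != type(value).__name__:
                raise FieldTypeConflict(
                    f"{measurement}.{name}: got {type(value).__name__}, stored as {seen}")
        for name, value in fields.items():
            self.field_types.setdefault((measurement, name), type(value).__name__)
        line = ",".join(f"{name}={format_value(value)}" for name, value in fields.items())
        return f"{measurement} {line}"  # timestamp omitted


shard = ToyShard()
print(shard.write("ATDome_position", {"status": 1}))  # ATDome_position status=1i
try:
    shard.write("ATDome_position", {"status": 1.5})
except FieldTypeConflict as err:
    print("rejected:", err)  # rejected: ATDome_position.status: got float, stored as int
```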
For the Postgres connector, support for schema evolution is limited: the connector can add columns to a table, but it cannot remove them. I believe the same is true for the Parquet connector.
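Diffing two versions of a topic's Avro schema makes the add-only limitation concrete. New fields map to `ALTER TABLE ... ADD COLUMN`, while removed fields and type changes have no automatic path. This is a hypothetical planner, not the connector's logic. It only handles primitive Avro types and uses an assumed Avro-to-Postgres type mapping.

```python
# Assumed Avro -> Postgres column types for primitive fields.
AVRO_TO_SQL = {
    "boolean": "BOOLEAN",
    "int": "INTEGER",
    "long": "BIGINT",
    "float": "REAL",
    "double": "DOUBLE PRECISION",
    "string": "TEXT",
}


def plan_migration(table: str, old: dict, new: dict) -> tuple[list[str], list[str]]:
    """Return (DDL an add-only connector could run, changes it cannot apply)."""
    old_fields = {f["name"]: f["type"] for f in old["fields"]}
    new_fields = {f["name"]: f["type"] for f in new["fields"]}
    statements, unsupported = [], []
    for name, avro_type in new_fields.items():
        if name not in old_fields:
            statements.append(
                f'ALTER TABLE "{table}" ADD COLUMN "{name}" {AVRO_TO_SQL[avro_type]};')
        elif old_fields[name] != avro_type:
            unsupported.append(f"type change: {name} {old_fields[name]} -> {avro_type}")
    for name in sorted(old_fields.keys() - new_fields.keys()):
        unsupported.append(f"removed: {name}")
    return statements, unsupported


old = {"fields": [{"name": "azimuth", "type": "double"}, {"name": "oldField", "type": "int"}]}
new = {"fields": [{"name": "azimuth", "type": "float"}, {"name": "elevation", "type": "double"}]}
statements, unsupported = plan_migration("ATDome_position", old, new)
print(statements)   # ['ALTER TABLE "ATDome_position" ADD COLUMN "elevation" DOUBLE PRECISION;']
print(unsupported)  # ['type change: azimuth double -> float', 'removed: oldField']
```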
The "policy" we agreed on verbally with the T&S folks is never to change the data type of, delete, or rename an existing field, and to add a new field instead.
The above means we should enforce FORWARD schema compatibility, i.e., data produced with the new schema must be readable by consumers that still use the previous schema.
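A simplified FORWARD check can be written directly from the Avro schema-resolution rules. The new schema acts as the writer and the previous schema as the reader. Every field the old schema expects must either exist in the new one with a type the reader can accept, or have a default in the old schema. An acceptable type is the same type or an allowed Avro promotion such as int to long. Fields that exist only in the new schema are ignored by old readers. This sketch covers flat records of primitive types only. It is not the Schema Registry's checker, which also handles unions and nested types.

```python
# Avro schema resolution: writer type -> reader types it may be promoted to.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def readable_as(writer_type, reader_type) -> bool:
    """True if a value written as writer_type can be read as reader_type."""
    if writer_type == reader_type:
        return True
    return (isinstance(writer_type, str) and isinstance(reader_type, str)
            and reader_type in PROMOTIONS.get(writer_type, set()))


def forward_violations(old: dict, new: dict) -> list[str]:
    """Reasons data written with `new` cannot be read with `old` (empty = FORWARD-compatible)."""
    writer = {f["name"]: f for f in new["fields"]}
    problems = []
    for field in old["fields"]:
        name = field["name"]
        if name in writer:
            if not readable_as(writer[name]["type"], field["type"]):
                problems.append(f"{name}: {writer[name]['type']} cannot be read as {field['type']}")
        elif "default" not in field:
            problems.append(f"{name}: removed, and the old schema has no default for it")
    return problems


def record(**fields) -> dict:
    """Build a flat Avro record schema from name=type keyword arguments."""
    return {"type": "record", "name": "position",
            "fields": [{"name": n, "type": t} for n, t in fields.items()]}


v1 = record(azimuth="double", status="int")
print(forward_violations(v1, record(azimuth="double", status="int", elevation="double")))
# []
print(forward_violations(v1, record(azimuth="double", status="float")))
# ['status: float cannot be read as int']
print(forward_violations(v1, record(azimuth="double", state="int")))
# ['status: removed, and the old schema has no default for it']
```

The three calls mirror the agreed policy. Adding a field passes, while the type change and the rename (a delete plus an add) are rejected. One caveat worth discussing: FORWARD allows deleting a field that has a default in the old schema, so the policy's "never delete" rule is slightly stricter than what the check enforces.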
For this to work, we should enable the schema compatibility checks as part of T&S CI to detect incompatible changes early on, i.e., before a T&S XML release.
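In CI, both steps map onto the Confluent Schema Registry REST API. `PUT /config/{subject}` sets a subject's compatibility level. `POST /compatibility/subjects/{subject}/versions/latest` tests a candidate schema against the latest registered version and returns `is_compatible`. The sketch below only builds and sends those requests with the standard library. The registry URL and subject names are hypothetical. It also assumes the currently disabled checks correspond to a compatibility level of `NONE`.

```python
import json
import urllib.request

REGISTRY_URL = "http://localhost:8081"  # hypothetical; point at the EFD registry
CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"


def set_compatibility_request(subject: str, level: str = "FORWARD") -> urllib.request.Request:
    """PUT /config/{subject}: set the compatibility level for one subject."""
    body = json.dumps({"compatibility": level}).encode()
    return urllib.request.Request(f"{REGISTRY_URL}/config/{subject}", data=body,
                                  method="PUT", headers={"Content-Type": CONTENT_TYPE})


def compatibility_request(subject: str, schema: dict) -> urllib.request.Request:
    """POST /compatibility/...: test a candidate schema against the latest version."""
    # The registry expects the Avro schema as a JSON-encoded string inside the body.
    body = json.dumps({"schema": json.dumps(schema)}).encode()
    url = f"{REGISTRY_URL}/compatibility/subjects/{subject}/versions/latest"
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": CONTENT_TYPE})


def incompatible_subjects(candidates: dict[str, dict]) -> list[str]:
    """Send one check per subject (needs a reachable registry); list the failures."""
    failures = []
    for subject, schema in candidates.items():
        with urllib.request.urlopen(compatibility_request(subject, schema)) as resp:
            if not json.load(resp)["is_compatible"]:
                failures.append(subject)
    return failures
```

A CI job would fail the T&S XML build when `incompatible_subjects` returns a non-empty list. A brand-new topic has no registered version yet, so the registry answers with an error instead of a verdict and `urlopen` raises `HTTPError`. A real job would likely need to treat that case as compatible.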