I have attached the results of my research. My conclusion is that we should use Avro because the EFD data ingest code uses it, it has been supported for a long time, and it supports the scalar data types we use (except unsigned ints, which I think we can live without – certainly we have been doing so in the EFD).
The main difficulties we have to overcome are:
- Avro has no built-in support for array length constraints. I propose to append _N to our array-valued field names, where N is the fixed array length. It is a very simple solution and the best one I came up with. See my notes for some other options. If the EFD ingest code is changed to put a _ prefix before the array index, we end up with field names like Test.arrays.boolen0_5_0 (a 5-element array, index 0), which I think is reasonable.
- There is no code generator for Python. But it's easy to make one, and I think we would be better off doing so in any case. Our needs are modest because our XML is very simple (flat, no fancy datatypes, and no optional fields).
- We will also have to write new code generators for C++ and Java, or supplement the ones that are available, in order to handle array length constraints. Again, our XML is simple so I think this will be easy.
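To make the _N proposal concrete, here is a minimal Python sketch of how generated code might recover and enforce the array length from a field name. The schema contents and the names SCHEMA, array_lengths, and check_array_lengths are hypothetical, invented for illustration. The schema is written as a plain dict in Avro's JSON layout, so no Avro library is needed to run it.

```python
import re

# Hypothetical Avro record schema, written as a plain dict.
# The "_5" suffix on the array field encodes its fixed length,
# following the proposed _N naming convention.
SCHEMA = {
    "type": "record",
    "name": "Test_arrays",
    "fields": [
        {"name": "boolean0_5", "type": {"type": "array", "items": "boolean"}},
        {"name": "int0", "type": "int"},
    ],
}

# Matches a trailing _N, where N is one or more digits.
_LENGTH_SUFFIX = re.compile(r"_(\d+)$")


def array_lengths(schema):
    """Return {field name: required length} for array fields named with _N."""
    lengths = {}
    for field in schema["fields"]:
        ftype = field["type"]
        # Only array-valued fields carry a length suffix; scalars are skipped.
        if isinstance(ftype, dict) and ftype.get("type") == "array":
            match = _LENGTH_SUFFIX.search(field["name"])
            if match:
                lengths[field["name"]] = int(match.group(1))
    return lengths


def check_array_lengths(schema, record):
    """Raise ValueError if an array field has the wrong number of elements."""
    for name, length in array_lengths(schema).items():
        actual = len(record[name])
        if actual != length:
            raise ValueError(f"{name} has {actual} elements; expected {length}")


# Usage: a record with a correctly sized array passes silently.
check_array_lengths(SCHEMA, {"boolean0_5": [True] * 5, "int0": 1})
```

A code generator could run the same suffix parsing once, at generation time, and emit fixed-size arrays in C++ and Java instead of checking lengths at run time.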
JSON Schema is another reasonable choice. Unlike Avro, it supports array length constraints. But unlike Avro and protobuf, its support for numeric types is very weak: it has just one kind of int and one kind of float. Between that limitation and the EFD ingest code, I am strongly in favor of using Avro instead.
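For comparison, here is a rough sketch of the same 5-element boolean array declared both ways, written as Python dicts. The field name is hypothetical. minItems and maxItems are the JSON Schema keywords that constrain array length; Avro's array type has no equivalent, so the length has to live somewhere else, such as the field name.

```python
# JSON Schema: the length constraint is part of the schema itself.
json_schema_field = {
    "type": "array",
    "items": {"type": "boolean"},
    "minItems": 5,
    "maxItems": 5,
}

# Avro: the array type carries only the item type, so the length
# is encoded in the (hypothetical) field name via the _N convention.
avro_field = {
    "name": "boolean0_5",
    "type": {"type": "array", "items": "boolean"},
}

# Numeric types available in each schema language.
JSON_SCHEMA_NUMERIC_TYPES = ("integer", "number")
AVRO_NUMERIC_TYPES = ("int", "long", "float", "double")
```

Avro's int and long are 32-bit and 64-bit signed integers, and float and double are 32-bit and 64-bit IEEE 754 values, whereas JSON Schema's integer and number say nothing about size.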