PR #8 updates resolve_schema_definition so that it doesn't issue duplicate names. That's a quick & easy workaround for the immediate issue here. However, there are a couple of things to unpack:
First, this doesn't actually resolve https://github.com/lsst-dm/alert_stream/issues/24 because “bad” schemas (including duplicate names) are serialized with that data. It'll need to be re-written to fix this issue, and I've not done that on this ticket.
Second, what I've done here is not actually a great solution. What I'm doing is modifying the names of the second (and later) occurrences of some records/enums/fixeds. That means the resulting “expanded” schema ends up with things like lsst.alert.cutout and lsst.alert.cutout1 in it, which is valid from a schema definition point of view, but isn't really compliant with the input schema (which doesn't define lsst.alert.cutout1 at all).
Unfortunately, I can't see a convenient way around this. The issue is that fastavro keeps a global cache of all the schema items it has seen defined. That means if you define lsst.alert.cutout and refer to it later, you can just look it up in that cache. That's fine... until you want to load more than one version of the schema at once. At that point, the cache will get confused about whether your lsst.alert.cutout refers to the version N schema or the version M schema, and nothing works properly.
Where does that leave us?
- I do not believe it is possible to fix resolve_schema_defintion to handle this “properly”; it just doesn't seem to be something that fastavro is designed for. (We might ask the fastavro author about this, as he seems to be pretty responsive.)
- Similarly, I'm not sure if the upstream fastavro issue actually helps — because I can't see how it can, without a more fundamental redesign — but I've not been able to fully understand the code yet. It might be worth cloning their branch and trying it.
- If we can show that the “standard” Avro library is not a performance bottleneck for us, then I'd be in favour of using it. Maria did have performance issues in the past, but it might be worth revisiting those. However, I think having both fastavro and standard-Avro in our codebase would be a pretty unfortunate outcome unless we show that's really essential.
- Of course, if we decided that it was never necessary to load multiple versions of the schema in the same process, a lot of these problems vanish...!

PR #8 updates resolve_schema_definition so that it doesn't issue duplicate names. That's a quick & easy workaround for the immediate issue here. However, there are a couple of things to unpack:
First, this doesn't actually resolve https://github.com/lsst-dm/alert_stream/issues/24 because “bad” schemas (including duplicate names) are serialized with that data. It'll need to be re-written to fix this issue, and I've not done that on this ticket.
Second, what I've done here is not actually a great solution. What I'm doing is modifying the names of the second (and later) occurrences of some records/enums/fixeds. That means the resulting “expanded” schema ends up with things like lsst.alert.cutout and lsst.alert.cutout1 in it, which is valid from a schema definition point of view, but isn't really compliant with the input schema (which doesn't define lsst.alert.cutout1 at all).
Unfortunately, I can't see a convenient way around this. The issue is that fastavro keeps a global cache of all the schema items it has seen defined. That means if you define lsst.alert.cutout and refer to it later, you can just look it up in that cache. That's fine... until you want to load more than one version of the schema at once. At that point, the cache will get confused about whether your lsst.alert.cutout refers to the version N schema or the version M schema, and nothing works properly.
Where does that leave us?