Data Management / DM-17550

Update DMTN-093 to describe alert schema versioning


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Alert Production
    • Labels:
      None

      Description

      As developed in DM-17549.

        Attachments

          Issue Links

            Activity

            jsick Jonathan Sick added a comment -

            I like this a lot! This is the path we're taking for the DM-EFD (https://sqr-029.lsst.io for the story so far) and for SQuaRE's microservice message bus. For the EFD and SQuaRE stuff we definitely needed the Schema Registry because there are so many schemas (>1000) and we do anticipate frequent schema migrations. For Alerts, even though it might be overkill, it does seem better than inventing a new system.

            For users, adopting the Confluent Wire Format does add an extra wrinkle, but the good news is that there's pretty ubiquitous driver support across many languages (https://docs.confluent.io/current/clients/index.html). In Python, there's the confluent-kafka package or DM's own https://kafkit.lsst.io for asyncio apps. Consumers would need a way to identify the schema regardless (maybe put some bytes in the Kafka message's key?), so adopting the Confluent Wire Format makes sense.
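            For reference, the wire format header is tiny: one magic byte (0x00), a four-byte big-endian Schema Registry ID, then the Avro body. A minimal sketch of splitting it apart (the helper name is mine):

```python
import struct

def parse_confluent_header(message: bytes) -> tuple[int, bytes]:
    """Split a Confluent Wire Format message into (schema_id, avro_payload).

    Layout: 1 magic byte (0x00), then a 4-byte big-endian Schema Registry
    ID, then the Avro-encoded body.
    """
    if len(message) < 5 or message[0] != 0:
        raise ValueError("not a Confluent Wire Format message")
    schema_id = struct.unpack(">I", message[1:5])[0]
    return schema_id, message[5:]

# Example: a message registered under schema ID 42.
schema_id, payload = parse_confluent_header(b"\x00\x00\x00\x00\x2a" + b"avro-bytes")
# schema_id == 42, payload == b"avro-bytes"
```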

            I like the proposal for a major-minor versioning pattern and for associating a Schema Registry subject with the major version. (We should do that for the DM-EFD too.) As a convention, consider matching the root of the subject name with the fully-qualified name (namespace + name fields) of the schema. We're doing that for the DM-EFD. Another thing you might want to talk about is having staging subjects for new schemas. This could be a naming convention on the subject names, like schemaname-N-dev. That way you can do end-to-end integration testing of new schemas without committing anything to the "production" subjects.
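            As a sketch of that convention (the helper and the lsst.alerts names here are hypothetical, just to illustrate the pattern):

```python
def subject_name(namespace: str, name: str, major: int, staging: bool = False) -> str:
    """Build a Schema Registry subject: fully-qualified schema name plus
    the major version, with an optional -dev suffix for staging subjects.
    """
    subject = f"{namespace}.{name}-{major}"
    return subject + "-dev" if staging else subject

# Production subject for major version 2, and a staging subject for version 3.
production = subject_name("lsst.alerts", "alert", 2)
staging = subject_name("lsst.alerts", "alert", 3, staging=True)
# production == "lsst.alerts.alert-2", staging == "lsst.alerts.alert-3-dev"
```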

            One thing you might want to plan for is creating a proxy server for the Schema Registry. The proxy would be publicly accessible (have its own public DNS and ingress) and would match the Schema Registry HTTP API except for maybe two differences. The proxy would be read-only, or alternatively, it would integrate with LSST Auth to allow for administrative access (like adding new schemas). The proxy could also add its own schema caching layer and rate limiting behavior to help prevent a DDoS attack given the public exposure. Confluent also makes a REST Proxy product, but I think it'd be easier to implement a custom proxy to get the LSST Auth integration. The Schema Registry doesn't have a terribly complex API.
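            To make the shape concrete, here's a minimal sketch of such a read-only proxy using only the Python stdlib; the registry URL, class name, and port are assumptions, not an existing LSST service:

```python
# Read-only Schema Registry proxy sketch. REGISTRY_URL is a hypothetical
# internal address; caching, rate limiting, and LSST Auth are left as stubs.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

REGISTRY_URL = "http://schema-registry.internal:8081"  # hypothetical

class ReadOnlyRegistryProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Forward reads to the internal registry; a schema cache and rate
        # limiter would slot in here before the upstream call.
        with urlopen(REGISTRY_URL + self.path) as upstream:
            body = upstream.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/vnd.schemaregistry.v1+json")
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # Reject writes outright; an LSST Auth check could replace this
        # branch to allow administrative access instead.
        self.send_error(405, "schema registry proxy is read-only")

# To run: HTTPServer(("", 8080), ReadOnlyRegistryProxy).serve_forever()
```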

            Lastly, a note about schema compatibility from a user's perspective. For most Confluent Wire Format-aware clients, the default behavior is just to deserialize the message with the schema associated with that message. To get the extra behavior of dropping new fields and adding defaults for deleted optional fields, the consumer needs to deserialize with the new schema and then project that data onto the schema the consumer application is built for. That is, you'd use an API like https://fastavro.readthedocs.io/en/latest/reader.html#fastavro._read_py.schemaless_reader:

            from fastavro import schemaless_reader

            reader = schemaless_reader(fh, writer_schema, reader_schema=preferred_schema)
            

            writer_schema is the schema identified in the message. preferred_schema is the schema the application is expecting.

            So taking advantage of the schema migration capability would require some documentation for our users. When a user's application is deployed, they'd have to know the ID of the schema they built the app for (their preferred schema). I'm planning on building this behavior into Kafkit's deserializer, but I haven't seen it elsewhere in the Confluent-based libraries.

            swinbank John Swinbank added a comment -

            Jonathan Sick — thank you very much for the useful comments! I'm reassured that you don't see any issues with the basic technological choices, and you provide plenty of food for thought for future development. Much appreciated!

            swinbank John Swinbank added a comment -

            Eric Bellm — I've pushed some changes to https://dmtn-093.lsst.io/v/DM-17550/index.html, primarily softening the wording a bit in the hope of addressing your concerns. To what extent have I succeeded?

            Per comments on GitHub, we should probably chat face-to-face about the scope of this document and where we can most effectively record plans and designs without them being regarded as normative.

            ebellm Eric Bellm added a comment -

            Thanks John Swinbank, I'm happy with these tweaks, and agreed that we should think about where to put interface descriptions for external users of the alert stream.

            swinbank John Swinbank added a comment -

            Thanks both; merged & done.


              People

              Assignee:
              swinbank John Swinbank
              Reporter:
              swinbank John Swinbank
              Reviewers:
              Eric Bellm
              Watchers:
              Eric Bellm, John Swinbank, Jonathan Sick
