  Data Management / DM-24645

Provide a sample consumer application for the alert stream simulator

    Details

    • Story Points:
      1
    • Epic Link:
    • Sprint:
      AP S20-6 (May), AP F20-1 (June)
    • Team:
      Alert Production
    • Urgent?:
      No

      Description

      Provide a consumer application which simply counts the number of alerts it has received and prints the count to stdout every second. This is intended to be scaffolding code for users to work with.
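
      Below is a minimal sketch of what such a counting consumer could look like, assuming the simulator's stream is reachable as an ordinary Kafka topic and using the confluent-kafka Python client. The broker address, topic name, and group id are placeholders, not the simulator's actual configuration.

      import time

      from confluent_kafka import Consumer

      # Placeholder connection settings; adjust to match the simulator's broker and topic.
      consumer = Consumer({
          "bootstrap.servers": "localhost:9092",
          "group.id": "example-counting-consumer",
          "auto.offset.reset": "earliest",
      })
      consumer.subscribe(["alerts"])

      count = 0
      last_report = time.monotonic()
      try:
          while True:
              msg = consumer.poll(timeout=0.1)
              if msg is not None and msg.error() is None:
                  count += 1
              now = time.monotonic()
              if now - last_report >= 1.0:
                  print(f"received {count} alerts")
                  last_report = now
      finally:
          consumer.close()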

            Activity

            ebellm Eric Bellm added a comment -

            Looks good!

            ebellm Eric Bellm added a comment -

            Spencer Nelson and I discussed this by video, as some additional complications had arisen. Much of the _kafka infrastructure currently in the simulator is code that is more relevant to the DM side (topic creation and modification, etc.) than to the client library, and we expect that code should live firmly in Stack land, which makes it challenging to distribute to third parties. On the other hand, we didn't feel especially ready to stand up a full-on Kafka client library.

            We decided to take a bit of an unusual path forward: to distribute two example client libraries. The first is the existing one, using the _kafka code currently in this package. The second is the third-party adc_streaming code (https://github.com/astronomy-commons/adc-streaming). We thought this would provide broker teams a few options for connecting with the simulator while emphasizing that these interfaces are not final.
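
            (A minimal sketch, purely for illustration, of the kind of DM-side topic administration described above. This is not code from the simulator's _kafka module; the broker address, topic name, and partition settings are made-up placeholders, and it calls confluent-kafka's admin client directly.)

            from confluent_kafka.admin import AdminClient, NewTopic

            # Hypothetical broker and topic, for illustration only.
            admin = AdminClient({"bootstrap.servers": "localhost:9092"})
            futures = admin.create_topics(
                [NewTopic("alerts-simulated", num_partitions=4, replication_factor=1)]
            )
            for topic, future in futures.items():
                future.result()  # raises if the topic could not be created
                print(f"created topic {topic}")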

            swnelson Spencer Nelson added a comment -

            Aha, right, you expect the code to be useful to brokers, not necessarily to ap_association or whatever. That's a good argument. I'll make a new distributable package for them.

            ebellm Eric Bellm added a comment -

            Agreed that the timestamps code clearly stays in the simulator.

            The Confluent Wire Format and _kafka code are both pieces that brokers are going to need in order to take the stream output and couple it to their systems: either they will use our code or they will need to create equivalents themselves. As I said above, if we package them up someplace reusable now, that (I think) makes it easier to manage the interface going forward. But if your instincts from e.g. SciMMA are different, I'm happy to hear the counterargument.
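
            For reference, the Confluent wire format framing itself is small: a one-byte magic number (0x00), a four-byte big-endian schema ID, and then the Avro-encoded payload. A rough sketch of unpacking that framing follows; the function name is ours and is not taken from streamsim.serialization.

            import struct

            def split_confluent_wire_format(message: bytes):
                """Split a Confluent-framed message into (schema_id, avro_payload)."""
                magic, schema_id = struct.unpack(">bI", message[:5])
                if magic != 0:
                    raise ValueError(f"unexpected magic byte: {magic}")
                return schema_id, message[5:]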

            swnelson Spencer Nelson added a comment -

            Okay, Avro serialization and the schema are now pulled out of alert-stream-simulator and its streamsim package. It now references those things from lsst.alert.packet, which is marked as a dependency, and which it can get from PyPI.

            The streamsim package still has more in it that we could take out, but this other stuff is less obviously redundant. Specifically:

            • streamsim.serialization is now mostly to do with serializing timestamps (in a way that really only matters internally to alert-stream-simulator) and with serializing to Confluent Wire Format (which only matters when you write to Kafka). The timestamp stuff should definitely stay in alert-stream-simulator; the wire format stuff could conceivably belong in some new lsst.alert.stream.kafka package.
            • streamsim._kafka is a large pile of stuff for interacting with Kafka sanely. That could all go elsewhere, although the API was written with the simulator in mind.

            Personally, I think code reuse can be overrated, and we shouldn't fret too much about throwing this stuff out and doing it over when we do the actual Kafka client used in the production pipeline. That thing will have different needs anyway. So, I lean towards calling this good.

            But I want to raise the question: should we pull more out of alert-stream-simulator and its streamsim package, or are we good?


              People

              • Assignee:
                swnelson Spencer Nelson
              • Reporter:
                swnelson Spencer Nelson
              • Reviewers:
                Eric Bellm
              • Watchers:
                Eric Bellm, John Swinbank, Spencer Nelson, Tim Jenness
              • Votes:
                0

                Dates

                • Created:
                  Updated:
                  Resolved:
