Data Management / DM-24645

Provide a sample consumer application for the alert stream simulator

    Details

    • Story Points:
      1
    • Epic Link:
    • Sprint:
      AP S20-6 (May), AP F20-1 (June)
    • Team:
      Alert Production
    • Urgent?:
      No

      Description

      Provide a consumer application which simply counts the number of alerts it has received and prints the count to stdout every second. This is intended to be scaffolding code for users to work with.
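
      A minimal sketch of what such a consumer could look like, assuming the confluent-kafka Python client (the library choice is still open per DM-24679); the broker address, topic name, and group id below are placeholders, not the simulator's actual defaults:

        import time
        from confluent_kafka import Consumer

        # Placeholder connection settings -- adjust to match the running simulator.
        consumer = Consumer({
            "bootstrap.servers": "localhost:9092",
            "group.id": "example-alert-counter",
            "auto.offset.reset": "earliest",
        })
        consumer.subscribe(["alerts-simulated"])

        count = 0
        last_report = time.monotonic()
        try:
            while True:
                msg = consumer.poll(timeout=0.1)
                if msg is not None and msg.error() is None:
                    count += 1
                now = time.monotonic()
                if now - last_report >= 1.0:
                    # Print the running total to stdout roughly once per second.
                    print(count, flush=True)
                    last_report = now
        finally:
            consumer.close()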

        Attachments

          Issue Links

            Activity

            swnelson Spencer Nelson created issue -
            swnelson Spencer Nelson made changes -
            Field Original Value New Value
            Epic Link DM-24345 [ 433032 ]
            swnelson Spencer Nelson made changes -
            Status To Do [ 10001 ] In Progress [ 3 ]
            swnelson Spencer Nelson made changes -
            Reviewers Spencer Nelson [ swnelson ]
            Status In Progress [ 3 ] In Review [ 10004 ]
            swnelson Spencer Nelson made changes -
            Status In Review [ 10004 ] In Progress [ 3 ]
            swnelson Spencer Nelson made changes -
            Reviewers Spencer Nelson [ swnelson ]
            swnelson Spencer Nelson made changes -
            Reviewers Eric Bellm [ ebellm ]
            Status In Progress [ 3 ] In Review [ 10004 ]
            ebellm Eric Bellm made changes -
            Watchers Eric Bellm, Spencer Nelson [ Eric Bellm, Spencer Nelson ] Eric Bellm, John Swinbank, Spencer Nelson [ Eric Bellm, John Swinbank, Spencer Nelson ]
            ebellm Eric Bellm added a comment -

            Hi Spencer Nelson,

            Since the sample client application is intended to give users code to build from, I'm concerned that we've got a lot of code in streamsim that might reside more naturally elsewhere. In particular, it seems like basically all of serialization.py could live in lsst/alert_packet, and similarly the _KafkaClient abstraction could be in a separate package as well.

            By doing that users could build client applications against a more permanent Rubin interface (acknowledging that we're a looong ways from having that be stable). Leaning harder on alert_packet would also let us avoid hard-coding schema versions as is done currently.

            Might be worth talking about this live so I can understand better where you're coming from. I would welcome John Swinbank's thoughts here also.

            swnelson Spencer Nelson added a comment -

            Eric Bellm, I pretty much agree, but we decided to do it this way so that we don't have to think about providing lsst/alert_packet as an easy-to-install Python package. It doesn't have a setup.py or anything, so it's not clear to me how we'd throw it on PyPI for redistribution outside of the stack.

            If you think we can distribute parts of the stack as PyPI/Conda packages publicly, I'm all for moving the code over. It wouldn't be terribly hard with lsst/alert_packet, I think, since it doesn't have any dependencies on other LSST stack packages.

            swinbank John Swinbank added a comment - edited

            Hi both —

            We discussed this at the DM-CCB today.

            At the moment, we do not have a policy regarding distributing Rubin-written code through PyPI (/Conda/etc). In principle, the DM-CCB is supportive of the notion: I think it is broadly fair to characterize their attitude as being that anything which can go on PyPI (because it doesn't have extensive dependencies on the rest of our stack) should. However, they also worried about the logistics of making this happen: are individual developers empowered to simply make releases of code at their whim, rather than going through the (heavyweight) Pipelines release process? Do packages simply get posted using individual developer accounts? If so, what happens when that developer is unavailable for some reason?

            For now, we suggest:

            • If you both think it would be useful to have alert_packet on PyPI, I'm inclined to agree with you. We can proceed following the template established by astro_metadata_translator. In particular, this includes both setup.py and SConstruct files (a minimal packaging sketch follows this list), and is managed as a “third party” package as far as our release process is concerned (ie, it's owned by the “DM Externals” team on GitHub). Tim Jenness set this up and makes releases to PyPI — I'm sure he'd be willing to advise on what hoops to jump through.
            • The DM-CCB will start drafting some policy text to make clearer what's allowed and how it ought to be managed in future. It may be that we'll need to tweak whatever we come up with now to comply with that at some point, but there's no requirement to stop and wait for policy to be available before making forward progress.
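
            For concreteness, a minimal sketch of the kind of setup.py that template implies, assuming plain setuptools; the distribution name, version, and dependency below are placeholders rather than the real alert_packet metadata:

              # Hypothetical minimal setup.py; names and versions are placeholders.
              from setuptools import setup, find_packages

              setup(
                  name="lsst-alert-packet",       # placeholder distribution name
                  version="0.1.0",                # placeholder version
                  packages=find_packages(),
                  install_requires=["fastavro"],  # illustrative dependency only
                  python_requires=">=3.6",
              )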

            Does that help?

            swnelson Spencer Nelson added a comment - edited

            That really helps, yes. I like that plan a lot. I'll make a ticket to distribute lsst/alert_stream on PyPI and Conda, and then to merge alert-stream-simulator's streamsim into alert_stream. I think we should delay moving the Kafka client in until we sort out whether to use confluent_kafka or aiokafka (DM-24679).

            swnelson Spencer Nelson made changes -
            Link This issue relates to DM-25104 [ DM-25104 ]
            swnelson Spencer Nelson made changes -
            Link This issue relates to DM-25102 [ DM-25102 ]
            swnelson Spencer Nelson made changes -
            Link This issue relates to DM-25103 [ DM-25103 ]
            tjenness Tim Jenness added a comment -

            Note that 20.0 release is imminent so if you don't want a 20.0 tag turning up on the repo (which would seemingly force you to start at v20.0 on pypi or else be very confused) you should do the team tweak ASAP.

            swnelson Spencer Nelson added a comment -

            Tim Jenness, what's the "team tweak"?

            tjenness Tim Jenness added a comment -

            See above comment:

            it's owned by the “DM Externals” team on GitHub

            It may be that John or I need to make that change for you.

            ebellm Eric Bellm added a comment -

            All of the above sounds good to me. Spencer Nelson, I would think that uncertainty about what Kafka library to build on would push us towards getting the client interface behind an abstraction layer sooner -- I guess that depends on whether we think we can get to DM-24679 before circulating the simulator more broadly.

            swinbank John Swinbank added a comment -

            The way we use teams on GitHub is (too briefly) described at https://developer.lsst.io/stack/adding-a-new-package.html#adding-a-new-package-to-the-build.

            I'm not completely sure I understand what permissions are required here (see https://lsstc.slack.com/archives/C2JP8GGVC/p1590617175372300), but I have given the “DM Externals” team “write” access (to match astro_metadata_translator).

            (I note that a_m_t also has the “DMLT” team with “admin” access, but that isn't specified in the Developer Guide so I didn't set that.)

            swinbank John Swinbank made changes -
            Team Alert Production [ 10300 ]
            swinbank John Swinbank made changes -
            Link This issue blocks DM-25226 [ DM-25226 ]
            swinbank John Swinbank made changes -
            Epic Link DM-24345 [ 433032 ] DM-25143 [ 435261 ]
            swnelson Spencer Nelson made changes -
            Link This issue relates to DM-25103 [ DM-25103 ]
            swnelson Spencer Nelson made changes -
            Link This issue is blocked by DM-25199 [ DM-25199 ]
            swnelson Spencer Nelson made changes -
            Link This issue is blocked by DM-25104 [ DM-25104 ]
            swnelson Spencer Nelson made changes -
            Link This issue relates to DM-25104 [ DM-25104 ]
            swnelson Spencer Nelson made changes -
            Link This issue relates to DM-25102 [ DM-25102 ]
            swinbank John Swinbank made changes -
            Sprint AP S20-6 (May) [ 987 ] AP S20-6 (May), AP F20-1 (June) [ 987, 1019 ]
            swnelson Spencer Nelson added a comment -

            Okay, Avro serialization and the schema are now pulled out of alert-stream-simulator and its streamsim package. It now references those things from lsst.alert.packet, which is marked as a dependency, and which it can get from PyPI.

            The streamsim package still has more in it that we could take out, but this other stuff is less obviously redundant. Specifically:

            • streamsim.serialization is now mostly to do with serializing timestamps (in a way that really only matters internally to alert-stream-simulator) and with serializing to Confluent Wire Format (which only matters when you write to Kafka). The timestamp stuff should definitely stay in alert-stream-simulator; the wire format stuff (sketched after this list) could conceivably belong in some new lsst.alert.stream.kafka package.
            • streamsim._kafka is a large pile of stuff for interacting with Kafka sanely. That could all go elsewhere, although the API was written with the simulator in mind.
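
            For reference, a sketch of the Confluent Wire Format framing mentioned above: a single zero magic byte, a 4-byte big-endian schema registry ID, then the Avro-encoded payload. The helper names are illustrative, not the actual streamsim API:

              import struct

              def frame_confluent(schema_id: int, avro_payload: bytes) -> bytes:
                  # Confluent wire format: magic byte 0, 4-byte big-endian schema ID, Avro body.
                  return struct.pack(">bI", 0, schema_id) + avro_payload

              def unframe_confluent(message: bytes) -> tuple:
                  # Inverse operation: returns (schema_id, avro_payload).
                  magic, schema_id = struct.unpack(">bI", message[:5])
                  assert magic == 0, "not Confluent wire format"
                  return schema_id, message[5:]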

            Personally, I think code reuse can be overrated, and we shouldn't fret too much about throwing this stuff out and doing it over when we do the actual Kafka client used in the production pipeline. That thing will have different needs anyway. So, I lean towards calling this good.

            But I want to raise the question - should we pull more out of alert-stream-simulator and its streamsim package, or are we good?

            ebellm Eric Bellm added a comment -

            Agreed that the timestamps code clearly stays in the simulator.

            The Confluent Wire Format and _kafka code are both pieces that brokers are going to need in order to take the stream output and couple it to their systems: either they will use our code or they will need to create equivalents themselves. As I said above, if we package them up someplace reusable now, that (I think) makes it easier to manage the interface going forward. But if your instincts from e.g. SciMMA are different, I'm happy to hear the counterargument.

            swnelson Spencer Nelson added a comment -

            Aha, right, you expect the code to be useful to brokers, not necessarily to ap_association or whatever. That's a good argument. I'll make a new distributable package for them.

            ebellm Eric Bellm added a comment -

            Spencer Nelson and I discussed this by video as some additional complications had arisen. Much of the _kafka infrastructure currently in the simulator is code that is more relevant to the DM side (topic creation and modification, etc.) than to the client library, and we expect that that code should firmly live in Stack land--which makes it challenging to distribute to third parties. On the other hand, we didn't feel especially ready to stand up a full-on Kafka client library.

            We decided to take a bit of an unusual path forward: to distribute two example client libraries. The first is the existing one, using the _kafka code currently in this package. The second is the third-party adc_streaming code (https://github.com/astronomy-commons/adc-streaming). We thought this would give broker teams a few options for connecting with the simulator while emphasizing that these interfaces are not final.

            swnelson Spencer Nelson made changes -
            Link This issue is blocked by DM-25390 [ DM-25390 ]
            ebellm Eric Bellm added a comment -

            Looks good!

            ebellm Eric Bellm made changes -
            Status In Review [ 10004 ] Reviewed [ 10101 ]
            swnelson Spencer Nelson made changes -
            Resolution Done [ 10000 ]
            Status Reviewed [ 10101 ] Done [ 10002 ]

              People

              • Assignee:
                swnelson Spencer Nelson
                Reporter:
                swnelson Spencer Nelson
                Reviewers:
                Eric Bellm
                Watchers:
                Eric Bellm, John Swinbank, Spencer Nelson, Tim Jenness
              • Votes:
                0
              • Watchers:
                4

                Dates

                • Created:
                  Updated:
                  Resolved:
