Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-24403

Investigate Faust windowing feature for data aggregation

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      In this ticket we implement an example to demonstrate the Faust windowing feature that can be used for aggregation of kafka streams.

      The general idea is that you can use a Faust agent to process the stream of events (messages) from a given Kafka topic and add those messages to a Faust table for persistence. The table is configured as a tumbling window. The window configuration parameters are a size and an expiration time. When the window expires a callback function is called to process the messages in that window. That's when the aggregation happens and a new Kafka topic is produced with the aggregated values.

      A docker compose config is provided to start a local Kafka cluster to run the example. The instructions to start the Kafka cluster, the Faust worker and the producer are described in the README. The docker compose also includes the Confluent Control Center which is handy to inspect the topics, their configurations and the consumer lag for the Faust consumers when running the application.

      NOTE: running multiple Faust workers to distribute the application load currently crashes the application, this will be investigated in DM-24473.

        Attachments

          Issue Links

            Activity

            Show
            afausti Angelo Fausti added a comment - See PR https://github.com/lsst-sqre/kafka-aggregator/pull/1
            Hide
            afausti Angelo Fausti added a comment -

            Hey Jonathan Sick I would appreciate very much your review to help to evaluate if we are on the right path in developing the kafka-aggregator based on Faust.

            Show
            afausti Angelo Fausti added a comment - Hey Jonathan Sick I would appreciate very much your review to help to evaluate if we are on the right path in developing the kafka-aggregator based on Faust.
            Hide
            jsick Jonathan Sick added a comment -

            This looks great. I've written up some of my thoughts in that PR and specifically this concluding note: https://github.com/lsst-sqre/kafka-aggregator/pull/1#pullrequestreview-395677689

            I think we should continue down this path. The real test will be metaprogramming the agents and their aggregation functions to based on schema data alone so that we're not explicitly coding up agents for each SAL topic.

            I'm also wondering about how we can align the windows based on the wallclock so that all aggregated streams have the same alignment so that we can do SQL joins conveniently.

            Show
            jsick Jonathan Sick added a comment - This looks great. I've written up some of my thoughts in that PR and specifically this concluding note: https://github.com/lsst-sqre/kafka-aggregator/pull/1#pullrequestreview-395677689 I think we should continue down this path. The real test will be metaprogramming the agents and their aggregation functions to based on schema data alone so that we're not explicitly coding up agents for each SAL topic. I'm also wondering about how we can align the windows based on the wallclock so that all aggregated streams have the same alignment so that we can do SQL joins conveniently.
            Hide
            afausti Angelo Fausti added a comment -

            Thank you Jonathan Sick for the review and for the insightful comments. I've implemented the suggestions in the PR and opened DM-24528 to investigate the window alignment further.

            Show
            afausti Angelo Fausti added a comment - Thank you Jonathan Sick for the review and for the insightful comments. I've implemented the suggestions in the PR and opened DM-24528 to investigate the window alignment further.

              People

              • Assignee:
                afausti Angelo Fausti
                Reporter:
                afausti Angelo Fausti
                Reviewers:
                Jonathan Sick
                Watchers:
                Angelo Fausti, Jonathan Sick
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Summary Panel