I just finished my first read-through of the tech note. Here are some initial thoughts; I reserve the right to have more thoughts over the next day or two.
Overall, I think it's quite good. Eric asked me to take a look to see how things have evolved, and I can't say that I see much to criticize here, or much that has changed.
I think that the commentary on the alert filtering service was especially interesting. That's a new advantage that I hadn't been aware of.
I think the document should describe what plausible rate limits might be. In particular: 270 full alerts/sec would permit a user to retrieve 10,000 full alerts every 37 seconds, just enough to keep up with the stream. However, if full alerts are 100 kB of data, that works out to roughly 216 Mbps per user, which is a hefty bandwidth budget. So there are still tradeoffs to be made in choosing that rate limit.
Another way to think about rate limits is that, if we have a 10 Gbps bandwidth budget and full alert payloads are 100 kB, then we max out at serving 12,500 full alert payloads per second. Our task is to allocate that 12,500/s. Anything under 270/s per user means they can't keep up, though, so we can only fit about 46 users (12,500 / 270) if we want to give everyone 270/s.
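The arithmetic in the two paragraphs above can be sketched in a few lines; all of the figures here (270/s stream rate, 100 kB payloads, a 10 Gbps budget) are the illustrative numbers from this note, not settled values:

```python
# Back-of-envelope rate-limit arithmetic. All inputs are the
# illustrative figures from this note, not decided values.

STREAM_RATE = 10_000 / 37        # alerts/sec in the full stream (~270/s)
ALERT_SIZE_BYTES = 100 * 1_000   # assumed 100 kB per full alert payload
BUDGET_BPS = 10 * 1_000**3       # assumed 10 Gbps total budget, in bits/sec

# Bandwidth one user needs to keep up with the full stream:
per_user_bps = STREAM_RATE * ALERT_SIZE_BYTES * 8
print(f"per-user bandwidth: {per_user_bps / 1e6:.0f} Mbps")   # ~216 Mbps

# Max payloads/sec the budget can serve, and how many
# keep-up users that allows:
max_payloads_per_sec = BUDGET_BPS / (ALERT_SIZE_BYTES * 8)
print(f"max payloads/sec: {max_payloads_per_sec:.0f}")        # 12500
print(f"keep-up users: {max_payloads_per_sec / STREAM_RATE:.0f}")  # ~46
```

Nothing deep here, but it makes it easy to re-run the tradeoff as the payload size or budget assumptions change.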
Okay, separate topic: I feel somewhat daunted by thinking about the identities that actually get rate limited. We could, for example, limit by source IP address, by API keys which we hand out, or by some other identity. These vary quite a bit in complexity: source IP rate limits are dead simple, but they're easy to evade and don't bound the number of unique IPs that request data. API keys would require an entire management service (generation, revocation, etc.), but should be relatively easy to rate limit on; for example, users might include them as an HTTP header, and most HTTP proxies permit rate limits based on HTTP headers.
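To make the API-key option concrete, here's a minimal sketch of per-key limiting with a token bucket, keyed on a request header. The header name "X-API-Key" and the 270/s figure are illustrative assumptions, not decisions, and a real deployment would likely push this into the proxy rather than application code:

```python
import time
from collections import defaultdict

class TokenBucket:
    """Simple token bucket: refills at `rate` tokens/sec, holds at most `burst`."""
    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# One bucket per API key; 270/s is the illustrative "keep up" rate.
buckets = defaultdict(lambda: TokenBucket(rate=270, burst=270))

def check_request(headers: dict) -> bool:
    """Admit or reject a request based on its (hypothetical) X-API-Key header."""
    key = headers.get("X-API-Key")
    if key is None:
        return False  # no key: reject, or fall back to per-IP limiting
    return buckets[key].allow()
```

The nice property of keying on a header is that the counting state lives wherever the header is inspected, so it can sit in a proxy tier without touching the alert-serving code.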
The "implementation" section seems totally fine as is; I'm not sure I'd add much to it beyond, perhaps, some discussion of rate limits.