
System Design Interview Vol.2 - Chapter 6. Ad Click Event Aggregation

The content mainly comes from System Design Interview Vol.2 - Chapter 6: Ad Click Event Aggregation.

Design an ad click event aggregation system for near-real-time data.

TLDR #

  1. data pipeline / streaming: message queue
  2. aggregation service: MapReduce
  3. database: read-heavy, write-heavy, no update or transaction required
  4. issues:
    • duplicate events
    • exactly-once processing
    • fault tolerance
    • hotspot
  5. scaling
    • horizontal scaling in the message queue, database, and aggregation service

Requirement #

Functional Requirement #

  1. Support querying aggregated data: the number of clicks of a certain ad (ad_id) in the last M minutes.
  2. Support querying aggregated data: the top N most clicked ads in the last M minutes.
  3. Support filtering by different attributes in the above two queries.
  4. Dataset volume is at FAANG scale
    • volume: a single database is not an option.

Non-Functional Requirement #

  1. Correctness of the aggregation result is important
    • as the data is used for RTB and ads billing
  2. Properly handle delayed or duplicate events
  3. The system should be resilient to partial failures.
  4. End-to-end latency should be a few minutes, at most.
    • need a real-time streaming system (instead of a batch system)

Estimation #

High Level Design #

Data Model #

Raw Event #

| ad_id | timestamp           | user_id | ip            | country_code |
|-------|---------------------|---------|---------------|--------------|
| ad001 | 2023-06-01 00:00:01 | user1   | 207.148.22.22 | US           |
| ad001 | 2023-06-01 00:00:02 | user2   | 209.153.55.11 | JP           |
| ad002 | 2023-06-01 00:00:02 | user2   | 209.153.55.11 | JP           |

Aggregated Data #

| ad_id | timestamp_minute | count | filter_id |
|-------|------------------|-------|-----------|
| ad001 | 202306010000     | 5     | 0023      |
| ad001 | 202306010000     | 7     | 0012      |
| ad002 | 202306010001     | 7     | 0012      |
| ad002 | 202306010001     | 8     | 0023      |

Data under the same ad_id and timestamp_minute can be further grouped by filter_id.

Query API #

The number of clicks of a certain ad (ad_id) in the last M minutes #

Top N most clicked ads in the last M minutes #
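
Both queries can be answered from the aggregated table. Below is a minimal sketch of what the read API could look like, assuming a REST-style service built with FastAPI; the paths, parameter names, and the (omitted) storage lookups are illustrative rather than the chapter's exact API.

```python
from typing import Optional
from fastapi import FastAPI, Query

app = FastAPI()

@app.get("/v1/ads/{ad_id}/aggregated_count")
def aggregated_count(ad_id: str,
                     from_minute: str = Query(..., description="e.g. 202306010000"),
                     to_minute: str = Query(..., description="e.g. 202306010005"),
                     filter_id: Optional[str] = None):
    """Number of clicks of ad_id between from_minute and to_minute (the last M minutes)."""
    # would sum `count` from the aggregated table keyed by (ad_id, timestamp_minute, filter_id)
    ...

@app.get("/v1/ads/popular_ads")
def popular_ads(count: int = 10, window_minutes: int = 1, filter_id: Optional[str] = None):
    """Top `count` most clicked ads over the last `window_minutes` minutes."""
    # would read a pre-computed top-N list for the window, optionally narrowed by filter_id
    ...
```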

Data Storage #

We need to store both the raw data and the aggregated data.

Comparison between querying raw data vs aggregated data #

|       | query raw data                        | query aggregated data        |
|-------|---------------------------------------|------------------------------|
| usage | by other service                      | this system                  |
| pros  | full data set, support ad-hoc filter  | smaller data set, fast query |
| cons  | huge storage, slow query              | data loss                    |
| where | cold storage                          | database                     |

Database Choice #

  • Analyze: the workload is both read-heavy (queries) and write-heavy (click events), with no update or transaction required.
  • Choices: a horizontally scalable, time-series-friendly store such as Cassandra or InfluxDB [1][2][4].

High Level Design #

Figure 3: High Level Design

Kafka supports high throughput and exactly-once delivery. We discuss exactly-once delivery / atomic commit in Delivery Guarantees.

Aggregation Service #

The number of clicks of a certain ad (ad_id) in the last M minutes #

Figure 8 Aggregate the number of clicks
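
The flow in the figure boils down to a map-reduce style count: map each click event to the key (ad_id, minute) with value 1, then reduce by summing per key. A minimal in-memory sketch, assuming events arrive as dicts with an epoch-second timestamp (the event shape and function names are illustrative):

```python
from collections import defaultdict
from datetime import datetime, timezone

def minute_bucket(epoch_seconds: int) -> str:
    """Map an event timestamp to its minute bucket, e.g. 202306010000."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y%m%d%H%M")

def aggregate(events) -> dict:
    """Map: emit ((ad_id, minute), 1) per click; Reduce: sum the 1s per key."""
    counts = defaultdict(int)
    for e in events:                       # e = {"ad_id": ..., "timestamp": epoch_seconds, ...}
        counts[(e["ad_id"], minute_bucket(e["timestamp"]))] += 1
    return dict(counts)                    # flushed to the aggregation database at window end

clicks = [
    {"ad_id": "ad001", "timestamp": 1685577601},
    {"ad_id": "ad001", "timestamp": 1685577602},
    {"ad_id": "ad002", "timestamp": 1685577602},
]
print(aggregate(clicks))  # {('ad001', '202306010000'): 2, ('ad002', '202306010000'): 1}
```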

Deep Dive #

Aggregation Window and Watermark #

What Is Watermarking? #

When working with real-time streaming data, there will be delays between event time and processing time due to how data is ingested and whether the overall application experiences issues like downtime. Because of these potentially variable delays, the engine that processes this data needs some mechanism to decide when to close the aggregate windows and produce the aggregate result. [5]
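
As an illustration of the idea in [5], the sketch below uses Spark Structured Streaming's withWatermark to run 1-minute aggregation windows while tolerating events up to 15 minutes late; the rate source and the derived ad_id column are stand-ins so the snippet is self-contained, whereas the real source would be the raw click-event topic.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, concat, lit, window

spark = SparkSession.builder.appName("ad-click-aggregation").getOrCreate()

# In the real pipeline `clicks` would be parsed from the raw click-event Kafka topic;
# the built-in rate source plus a derived ad_id column stand in so the sketch runs on its own.
clicks = (spark.readStream.format("rate").option("rowsPerSecond", 10).load()
          .withColumn("ad_id", concat(lit("ad00"), (col("value") % 3).cast("string"))))

agg = (clicks
       .withWatermark("timestamp", "15 minutes")        # tolerate events up to 15 minutes late
       .groupBy(window(col("timestamp"), "1 minute"),   # 1-minute tumbling aggregation window
                col("ad_id"))
       .count())

# update mode emits running counts each trigger; state for a window is dropped
# once the watermark moves past its end.
query = agg.writeStream.outputMode("update").format("console").start()
query.awaitTermination(30)  # run briefly for demonstration
```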

Delivery Guarantees #

In most circumstances, at-least-once processing is good enough if a small percentage of duplicates is acceptable. However, this is not the case for our system: differences of a few percent in the data points could result in discrepancies of millions of dollars.

Therefore, we recommend exactly-once delivery for the system.

Data deduplication #

Two common sources of duplicates:

  1. client resend
  2. aggregation server outage

Figure 17 Duplicate data

If step 6 fails, perhaps due to an Aggregator outage, events 100 to 110 have already been sent downstream, but the new offset 110 is not persisted in the upstream Kafka. In this case, a new Aggregator would consume again from offset 100 even though those events have already been processed, causing duplicate data.

Solution #

Figure 20 Distributed transaction

To achieve exactly-once processing [6], we need to put operations between step 4 to step 6 in one distributed transaction.

The most common technique is two-phase commit (2PC).
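
When the downstream sink is itself a Kafka topic, one concrete way to make steps 4 to 6 atomic is Kafka transactions, the same building block Flink relies on in [6]: the aggregation output and the new consumer offset commit together, so a replayed batch never becomes visible twice. A minimal sketch with confluent-kafka; the topic names, group id, and batch size are assumptions, and the actual aggregation logic is elided.

```python
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "broker:9092",
    "group.id": "aggregation-service",
    "enable.auto.commit": False,         # offsets are committed inside the transaction instead
    "isolation.level": "read_committed",
})
producer = Producer({
    "bootstrap.servers": "broker:9092",
    "transactional.id": "aggregator-1",  # stable id so a restarted node fences its predecessor
})

consumer.subscribe(["ad_click_events"])
producer.init_transactions()

while True:
    msgs = [m for m in consumer.consume(num_messages=500, timeout=1.0) if not m.error()]
    if not msgs:
        continue
    producer.begin_transaction()
    for m in msgs:
        # real code would run the per-minute count aggregation here; we just forward the value
        producer.produce("ad_click_counts", key=m.key(), value=m.value())
    # commit the produced records and the new consumer offsets as one atomic unit (steps 4-6)
    producer.send_offsets_to_transaction(
        consumer.position(consumer.assignment()),
        consumer.consumer_group_metadata())
    producer.commit_transaction()
```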

Scale the system #

Scale the message queue #

  1. partition key: use ad_id as the partition key [7] so that all events of the same ad_id go to the same partition and are handled by the same aggregation node (see the sketch after this list).
  2. number of partitions: if more consumers need to be added, try to do it during off-peak hours to minimize the impact.
  3. Topic physical sharding: we can split the data by geography (topic_north_america, topic_europe, topic_asia, etc.) or by business type (topic_web_ads, topic_mobile_ads, etc.).
    • Pros: increased system throughput, reduced rebalance time.
    • Cons: extra complexity
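
A minimal sketch of point 1, assuming the confluent-kafka client and a raw-event topic named ad_click_events (illustrative): keying by ad_id lets Kafka's default partitioner route every click for a given ad to the same partition.

```python
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "broker:9092"})

def publish_click(event: dict) -> None:
    # the default partitioner hashes the key, so identical ad_ids map to the same partition
    producer.produce("ad_click_events",
                     key=event["ad_id"],
                     value=json.dumps(event))

publish_click({"ad_id": "ad001", "timestamp": 1685577601, "user_id": "user1",
               "ip": "207.148.22.22", "country_code": "US"})
producer.flush()
```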

Scale the aggregation service #

  1. multi-threading: allocate events with different ad_ids to different threads (see the sketch below).
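
A minimal sketch of that allocation, assuming in-process worker threads; the worker count, queue sizes, and event shape are illustrative.

```python
import queue
import threading
from collections import defaultdict

NUM_WORKERS = 4
queues = [queue.Queue(maxsize=10_000) for _ in range(NUM_WORKERS)]

def worker(q: queue.Queue) -> None:
    counts = defaultdict(int)      # (ad_id, minute) -> count, private to this thread
    while True:
        event = q.get()
        counts[(event["ad_id"], event["minute"])] += 1
        q.task_done()

for q in queues:
    threading.Thread(target=worker, args=(q,), daemon=True).start()

def dispatch(event: dict) -> None:
    # the same ad_id always hashes to the same worker, so the counts need no locking
    queues[hash(event["ad_id"]) % NUM_WORKERS].put(event)

dispatch({"ad_id": "ad001", "minute": "202306010000"})
```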

Scale the database #

Cassandra natively supports horizontal scaling.
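
As an illustration, the aggregated data could be modeled with ad_id as the partition key and timestamp_minute as a clustering column, which is what lets Cassandra distribute the table across nodes while keeping "last M minutes of one ad" queries on a single partition; the keyspace, table name, and replication settings below are assumptions, not from the chapter.

```python
from cassandra.cluster import Cluster

session = Cluster(["cassandra-node1"]).connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS ad_agg
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")

# partitioned by ad_id, clustered by minute: range scans over the last M minutes of one ad
session.execute("""
    CREATE TABLE IF NOT EXISTS ad_agg.aggregated_counts (
        ad_id            text,
        timestamp_minute text,
        filter_id        text,
        count            bigint,
        PRIMARY KEY ((ad_id), timestamp_minute, filter_id)
    )
""")
```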

Hotspot issue #

Aggregation Service #

Some ad_ids might receive many more ad click events than others.

Solution: dynamically allocate more nodes to the aggregation service.

Kafka #

The publisher specifies the topic and the partition of a message before publishing. Hence, it’s the publisher’s responsibility to ensure that the partition logic will not result in a hot partition.

Confluent: Monitoring Kafka with JMX

Fault tolerance #

Solution: snapshot the aggregation service to save its current state and the event offset.

Figure 27 Aggregation node failover
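
A minimal sketch of the snapshot idea, assuming a local JSON file as the snapshot store (in practice this would live in durable shared storage such as HDFS or S3): the in-memory counts and the Kafka offset they cover are persisted together, so a replacement node can reload the snapshot and resume consuming from that offset.

```python
import json

SNAPSHOT_PATH = "/var/lib/aggregator/snapshot.json"

def save_snapshot(counts: dict, offset: int) -> None:
    """Persist the in-memory counts and the offset up to which they were computed."""
    state = {"offset": offset,
             "counts": [[ad_id, minute, n] for (ad_id, minute), n in counts.items()]}
    with open(SNAPSHOT_PATH, "w") as f:
        json.dump(state, f)

def restore_snapshot() -> tuple[dict, int]:
    """Reload the last snapshot; the failover node resumes consuming from the returned offset."""
    with open(SNAPSHOT_PATH) as f:
        state = json.load(f)
    counts = {(ad_id, minute): n for ad_id, minute, n in state["counts"]}
    return counts, state["offset"]
```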

Data monitoring and correctness #

Continuous monitoring #

Reconciliation #

Figure 28 Final design

Reference #

  1. InfluxDB Tops Cassandra in Time Series Data & Metrics Benchmark
  2. Cassandra Time Series Data Modeling For Massive Scale
  3. Scaling Time Series Data Storage — Part I
  4. Apache Cassandra™ 3.x - Storage engine
  5. Feature Deep Dive: Watermarking in Apache Spark Structured Streaming
  6. An Overview of End-to-End Exactly-Once Processing in Apache Flink (with Apache Kafka, too!)
  7. Streams and Tables in Apache Kafka: Topics, Partitions, and Storage Fundamentals