Sliding Window Aggregation
Data Pipeline uses a concept called sliding window aggregation to summarize streaming data inside GroupByReader. Sliding windows collect input data while open and emit summary data once closed.
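The open/collect/close lifecycle can be illustrated with a minimal, self-contained sketch. It is not Data Pipeline's API: the Window class, its collect and close methods, and the Summary record (count and sum) are hypothetical names chosen only to show that a window gathers values while open and produces its summary only when it is closed. It requires Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT the Data Pipeline API: shows how a window
// collects records while open and emits a summary only once closed.
class WindowLifecycleSketch {

    /** Hypothetical summary type emitted when a window closes. */
    record Summary(int count, double sum) {}

    /** A single aggregation window holding the values it has collected. */
    static final class Window {
        private final List<Double> values = new ArrayList<>();
        private boolean open = true;

        /** Adds a value; only allowed while the window is open. */
        void collect(double value) {
            if (!open) {
                throw new IllegalStateException("window is closed");
            }
            values.add(value);
        }

        /** Closes the window and computes its summary; nothing is emitted before this. */
        Summary close() {
            open = false;
            double sum = 0;
            for (double v : values) {
                sum += v;
            }
            return new Summary(values.size(), sum);
        }
    }

    public static void main(String[] args) {
        Window window = new Window();
        window.collect(10.0);
        window.collect(5.5);
        window.collect(4.5);
        Summary summary = window.close();
        System.out.println(summary); // Summary[count=3, sum=20.0]
    }
}
```

Computing the summary only in close() mirrors the description above: while a window is open it just accumulates input, and downstream consumers see a result only after it closes.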
Windows are opened (created) and closed based on configurable strategies that rely on record count, time, record content, or a combination of these.
You've already seen the default strategy that uses a single window for the entire dataset. However, you'll have to change strategies if you're dealing with continuous, streaming data or just want to summarize data in batches.
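To make the idea of pluggable strategies concrete, here is a hedged sketch that handles only the record-count case. Again, this is not Data Pipeline's API: OpenStrategy, CloseStrategy, and aggregate are hypothetical names, and each closed window is summarized as a simple integer sum. The same loop reproduces a default-like single window, fixed-size batches, and overlapping sliding windows just by swapping the two strategies. As an assumption of this sketch, any windows still open when the input ends are closed and summarized.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, NOT the Data Pipeline API: count-based strategies
// decide when windows open and close; each closed window emits its sum.
class WindowStrategySketch {

    /** Decides whether a new window opens at the given record index. */
    interface OpenStrategy { boolean shouldOpen(long recordIndex); }

    /** Decides whether a window holding 'size' records should close. */
    interface CloseStrategy { boolean shouldClose(int size); }

    static List<Integer> aggregate(int[] records, OpenStrategy open, CloseStrategy close) {
        List<List<Integer>> openWindows = new ArrayList<>();
        List<Integer> summaries = new ArrayList<>();
        for (int i = 0; i < records.length; i++) {
            if (open.shouldOpen(i)) {
                openWindows.add(new ArrayList<>());
            }
            // Every open window receives the record, so windows may overlap.
            for (List<Integer> window : openWindows) {
                window.add(records[i]);
            }
            // Close (and summarize) windows whose close condition is now met.
            Iterator<List<Integer>> it = openWindows.iterator();
            while (it.hasNext()) {
                List<Integer> window = it.next();
                if (close.shouldClose(window.size())) {
                    summaries.add(sum(window));
                    it.remove();
                }
            }
        }
        // End of input: close any windows that are still open.
        for (List<Integer> window : openWindows) {
            summaries.add(sum(window));
        }
        return summaries;
    }

    private static int sum(List<Integer> window) {
        int total = 0;
        for (int v : window) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] records = {1, 2, 3, 4, 5, 6};
        // Default-like: one window opened at the first record, never closed early.
        System.out.println(aggregate(records, i -> i == 0, size -> false)); // [21]
        // Batches: open every 3rd record, close after 3 records.
        System.out.println(aggregate(records, i -> i % 3 == 0, size -> size == 3)); // [6, 15]
        // Sliding: open at every record, close after 3 records (overlapping).
        System.out.println(aggregate(records, i -> true, size -> size == 3)); // [6, 9, 12, 15, 11, 6]
    }
}
```

Keeping the open and close decisions separate is what allows windows to overlap: a new window can open before an older one has closed. Time- or content-based strategies could be modeled the same way by passing a timestamp or the record itself to the strategy instead of a count.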