Batch/Adjacent Sliding Windows

The first configuration described here allows you to obtain regular summaries from fixed numbers of records. This effectively simulates the output you receive in batch-only environments.

When GroupByReader starts, it performs the following actions:

  1. Opens a single window and keeps it open until it has accepted a set number of records
  2. Closes the window and emits the summary data to the next step in the pipeline
  3. Opens a new window and starts collecting again
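
The loop above can be sketched in plain Java without the library. This is only an illustration of the open, fill, close, reopen cycle, not GroupByReader's actual implementation: the class name BatchWindowSketch, the method batchSums, and the choice to emit a final partial window when the input ends are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: simulates adjacent (non-overlapping) windows over a list of values.
final class BatchWindowSketch {

    /** Returns one sum per window, where each window accepts at most windowSize records. */
    static List<Long> batchSums(List<Long> values, int windowSize) {
        List<Long> summaries = new ArrayList<>();
        boolean windowOpen = false;
        long sum = 0;
        int accepted = 0;

        for (long value : values) {
            // Create strategy: open a window only if one isn't already open.
            if (!windowOpen) {
                windowOpen = true;
                sum = 0;
                accepted = 0;
            }

            sum += value;
            accepted++;

            // Close strategy: close once the window has accepted windowSize records,
            // then emit its summary to the next step.
            if (accepted == windowSize) {
                summaries.add(sum);
                windowOpen = false;
            }
        }

        // Assumption for this sketch: a partially filled window is emitted at end of input.
        if (windowOpen) {
            summaries.add(sum);
        }
        return summaries;
    }

    public static void main(String[] args) {
        // Five records with a window size of two give three summaries.
        System.out.println(batchSums(List.of(1L, 2L, 3L, 4L, 5L), 2)); // prints [3, 7, 5]
    }
}
```

Because a new window opens only after the previous one closes, no record is counted twice and the summaries line up end to end, which is what gives the batch-like output.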

The following example uses a create strategy that opens a new window only if one isn't already open (line 8). The close strategy caps each window at 50 records.

The example also turns on the debug flag to log when windows are opened and closed.

While it's possible for a group operator (like sum or max) to hold onto the actual records while its window is open, none of the built-in operators work that way. This allows windows to be fairly memory-cheap, which is important, as you'll see in the next section.
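
To show why an operator does not need to retain records, here is a minimal sketch of a running summary that keeps only a few primitive fields per window. The class RunningSummarySketch and its methods are hypothetical and are not the library's built-in operators; they only illustrate the constant-memory idea.

```java
// Hypothetical sketch: an aggregate that updates in place and never stores the records it sees.
final class RunningSummarySketch {
    private long count;
    private long sum;
    private long max = Long.MIN_VALUE;

    /** Folds one value into the summary; the value itself is not retained. */
    void accept(long value) {
        count++;
        sum += value;
        if (value > max) {
            max = value;
        }
    }

    long count() { return count; }
    long sum()   { return sum; }
    long max()   { return max; } // Long.MIN_VALUE until at least one value is accepted

    public static void main(String[] args) {
        RunningSummarySketch summary = new RunningSummarySketch();
        for (long value : new long[] {4, 9, 2}) {
            summary.accept(value);
        }
        System.out.println(summary.count() + " records, sum=" + summary.sum() + ", max=" + summary.max());
    }
}
```

State per window stays the same size whether the window accepts 50 records or 50,000, so keeping several windows open at once costs little.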
