Keyed Watermarks: A Partition-aware Watermark Generation in Apache Flink


Big stream processing frameworks such as Apache Flink [1,2] provide rich APIs to build real-time data analytics applications. Theoretically, unbounded streams of data flow into the system and results are calculated. To tackle the unbounded nature, streams are divided into chunks by means of window operators. A window can be seen as a way to take a snapshot of the stream and apply a user-defined logic on its contents.

Window operators are basically concerned with slicing the continuous stream based on time. Moreover, these time-based windows can be further sub-divided by partitioning them based on a (group) of data items in the stream. The common notions of time supported are processing time and event time. In the former case, the data arriving at the source of a stream processing application are timestamped by the wall-clock time of the streaming engine. In this case, the timestamps are in order. In the latter case, event-time processing, the timestamp is assigned by the data generator and application developer has to provide a means to extract its value from the data. In this case, the control of time progress is outside the control of the engine. Moreover, it is common that data might arrive out-of-order [3]. Out of order means that some events can arrive later than expected with respect to its event time. A common approach to let stream processing engines reason about the progress of event time is low  watermarks [4]. A watermark is merely a timestamp. When a watermark is received by an operator. It means that we all data with a timestamp less than received watermark’s value should have been received by the system. Thus, any pending windows whose end timestamp is before the watermark value can be closed and their user-defined logic can be run. The technique of watermarking is supported by Apache Flink.

Currently, watermarks for a specific data source is partition-agnostic. That is, all logical windows, partitioned by a group of elements, have to be processed on the arrival of the next watermark. Considering an IoT applications, we might have several devices that emit data but at different rates. So, it does not make sense to data items from high-rate devices waiting for watermarks that might progress relatively slowly due to low-rate devices. We need to implement a partition-aware watermark processing approach through out all window operators in a Flink job thus, windows using the same partition-key as the watermark can progress independently.


[1] Apache Flink:

[2] Carbone, Paris, et al. "Apache flink: Stream and batch processing in a single engine." Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 36.4 (2015).

[3] Li, Jin, et al. "Out-of-order processing: a new architecture for high-performance stream systems." Proceedings of the VLDB Endowment 1.1 (2008): 274-288.

[4] Akidau, Tyler, et al. "MillWheel: fault-tolerant stream processing at internet scale." Proceedings of the VLDB Endowment 6.11 (2013): 1033-1044.