
As the world becomes more instrumented and connected, hardware (e.g., sensors) and software generate a flood of digital data in the form of continuous streams. Real-time processing of such massive amounts of streaming data is a crucial requirement in several application domains, including financial markets, surveillance systems, manufacturing, smart cities, and scalable monitoring infrastructure. In the last few years, several big stream processing engines have been introduced to tackle this challenge. In this project, we implemented a framework for benchmarking five popular systems in this domain: Apache Storm, Apache Flink, Apache Spark, Kafka Streams, and Hazelcast Jet. The framework is designed to cover the end-to-end benchmarking process and is flexible enough to be extended to benchmark additional systems or to include additional comparison metrics.
-
Elkhan Shahverdi, Ahmed Awad, Sherif Sakr. Big Stream Processing Systems: An Experimental Evaluation. DASC 2019 workshop, held on April 8th, 2019, within the 35th IEEE International Conference on Data Engineering (ICDE 2019).