No pineapple on pizza! Streaming anomaly detection with Apache Kafka® and Apache Flink®

August 2, 2022

A streaming anomaly detection system with Apache Kafka® and Apache Flink®


Abstract

There’s a rule in Italy that states: “pineapple doesn’t belong on pizza”. Yet it’s a common choice around the world and a hot topic of online debate.

We’ll use this funny example to show the power of the best open source streaming duo: Apache Kafka and Apache Flink. We’ll first show how data flows in streaming mode through Kafka topics, and then add Flink on top to detect anomalies (yep, pineapple, I’m looking at you), calculate aggregations, and enrich our pipelines with data from external systems such as a PostgreSQL database.
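To give a feel for those three steps (detect, enrich, aggregate) without a running cluster, here is a minimal plain-Python sketch over an in-memory list. It is not the talk's Flink code: the order fields (`id`, `shop`, `pizzas`, `toppings`), the shop names, and the `SHOP_COUNTRY` dictionary standing in for the PostgreSQL table are all hypothetical assumptions chosen for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical pizza-order events, loosely shaped like records a fake-data
# producer might write to a Kafka topic (field names are assumptions).
ORDERS = [
    {"id": 1, "shop": "Marios Pizza", "pizzas": [{"name": "Margherita", "toppings": ["basil"]}]},
    {"id": 2, "shop": "Luigis Pizza", "pizzas": [{"name": "Hawaii", "toppings": ["ham", "pineapple"]}]},
    {"id": 3, "shop": "Marios Pizza", "pizzas": [{"name": "Diavola", "toppings": ["pineapple"]}]},
]

# Hypothetical stand-in for a reference table that would live in PostgreSQL.
SHOP_COUNTRY = {"Marios Pizza": "Italy", "Luigis Pizza": "USA"}


def detect_anomalies(orders: Iterable[dict]) -> Iterator[dict]:
    """Yield only orders where at least one pizza has pineapple as a topping."""
    for order in orders:
        if any("pineapple" in pizza["toppings"] for pizza in order["pizzas"]):
            yield order


def enrich(orders: Iterable[dict], lookup: dict) -> Iterator[dict]:
    """Attach the shop's country from the reference table (a lookup join)."""
    for order in orders:
        yield {**order, "country": lookup.get(order["shop"], "unknown")}


def count_by_shop(orders: Iterable[dict]) -> Counter:
    """Aggregate the number of anomalous orders per shop."""
    return Counter(order["shop"] for order in orders)


anomalies = list(enrich(detect_anomalies(ORDERS), SHOP_COUNTRY))
print([(o["id"], o["country"]) for o in anomalies])
print(count_by_shop(anomalies))
```

The generators process one record at a time, which mirrors the streaming style, but the final `Counter` is a one-off total over a finite list. In an actual Flink job the aggregation would typically run continuously (for example over time windows) on an unbounded Kafka topic.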

If you want to see a streaming data pipeline for anomaly detection built in 10 minutes, this talk is for you.

Useful Links

GitHub repository containing all the code
Dockerized Fake Data Producer for Apache Kafka®
Apache Flink® SQL documentation
Apache Flink® MATCH_RECOGNIZE documentation
