Understanding some of the key characteristics to consider when evaluating and comparing streaming technologies.

As data architectures have become increasingly mature, streaming is no longer considered a luxury but a technology with a wide range of applications across different industries. Due to technical and resource limitations, batch processing was in fact long the preferred approach for processing data and delivering applications, although with the development of micro-batch and native streaming frameworks in Apache-based distributed systems (e.g. Apache Spark and Apache Flink), high-scale streaming has now become far more accessible (Figure 1).
Some example applications of streaming systems include processing transaction data to identify anomalies, weather data, IoT data from remote locations, geo-location tracking, etc.
There are two key types of stream processing systems: micro-batch and real-time:
- In real-time stream processing, each record is processed as soon as it becomes available. This can therefore result in systems with very low latency, able to make immediate use of the incoming data (e.g. detecting fraudulent transactions in financial systems). A minimal sketch of this record-at-a-time pattern is shown after this list.
- In micro-batch processing systems, data points are instead not processed one by one but in small blocks, which are emitted after specific time intervals or once a maximum storage size is reached. This type of approach therefore favors high throughput over low latency. Micro-batch systems can be particularly useful when interested in performing complex operations such as aggregates (e.g. min, max, mean), joins, etc. on the fly before outputting the results to a storage system. Micro-batch processing can therefore be considered a good compromise between pure streaming and batch when performing, for instance, hourly reporting tasks (e.g. mean weather temperature). A sketch of this buffered pattern follows the real-time example below.
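To make the distinction concrete, here is a minimal, self-contained Python sketch of the record-at-a-time pattern. The `transaction_stream` generator and the fraud threshold are hypothetical stand-ins for a real message source (e.g. a Kafka consumer) and a real detection rule; the point is simply that each record is handled the instant it is produced.

```python
import itertools
import random
import time
from typing import Iterator

def transaction_stream() -> Iterator[dict]:
    """Hypothetical source yielding one transaction at a time
    (a stand-in for e.g. a message-queue consumer)."""
    while True:
        yield {"amount": random.uniform(1.0, 5000.0), "ts": time.time()}

def process_record(txn: dict) -> None:
    """Handle each record the moment it arrives (low latency)."""
    if txn["amount"] > 4000:  # illustrative threshold, not a real fraud rule
        print(f"ALERT: possibly fraudulent amount {txn['amount']:.2f}")

# Process the first 100 records one by one, as soon as they are produced.
for txn in itertools.islice(transaction_stream(), 100):
    process_record(txn)
```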
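By contrast, the following sketch illustrates the micro-batch pattern under the same caveats: records are buffered into small blocks, flushed once the block reaches a maximum size or a time interval elapses, and an aggregate (here min, max, mean) is computed over each whole block before it is written out. `temperature_stream`, the block size, and the wait time are all hypothetical; a production system would typically delegate this kind of windowing to a framework such as Spark Structured Streaming.

```python
import itertools
import random
import time
from typing import Iterator, List

def temperature_stream() -> Iterator[float]:
    """Hypothetical sensor source yielding one reading at a time."""
    while True:
        yield random.uniform(-5.0, 30.0)

def micro_batch(source: Iterator[float],
                max_size: int = 50,
                max_wait_s: float = 1.0) -> Iterator[List[float]]:
    """Buffer records into small blocks, flushing on size or elapsed time."""
    batch: List[float] = []
    deadline = time.monotonic() + max_wait_s
    for reading in source:
        batch.append(reading)
        if len(batch) >= max_size or time.monotonic() >= deadline:
            yield batch
            batch = []
            deadline = time.monotonic() + max_wait_s
    if batch:  # flush any trailing partial batch if the source ends
        yield batch

# Aggregate five blocks on the fly before "outputting" them (printed here).
for block in itertools.islice(micro_batch(temperature_stream()), 5):
    print(f"n={len(block)} min={min(block):.1f} "
          f"max={max(block):.1f} mean={sum(block) / len(block):.1f}")
```

The dual flush condition (size or time) is what trades latency for throughput: larger blocks and longer waits amortize per-block overhead, at the cost of results arriving later.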