New Java Data Platforms 2026
last commit 10 months ago alluxio/alluxio 7K +8
added 8 months ago
Alluxio Open Source (formerly known as Tachyon) is a distributed caching platform for large-scale data.
last commit 19 hours ago apache/seatunnel 9K +26
added 9 months ago
A high-performance, distributed data integration tool, capable of synchronizing vast amounts of data daily.
last commit 8 months ago alibaba/datax 17K +14
added 11 months ago
DataX is an open source data integration framework maintained by Alibaba that abstracts the synchronization of data between different data sources.
last commit 4 days ago elastic/logstash 14K +14
added 11 months ago
Logstash is a server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite "stash."
last commit 5 hours ago apache/pulsar 15K +28
added 11 months ago
Pulsar is a distributed pub-sub messaging platform with a very flexible messaging model and an intuitive client API.
last commit 1 week ago apache/rocketmq 22K +15
added 12 months ago
Apache RocketMQ is a cloud native messaging and streaming platform, making it simple to build event-driven applications.
last commit 1 day ago apache/spark 42K +68
added 1 year ago
Apache Spark - A unified analytics engine for large-scale data processing.
last commit 4 days ago apache/systemds 1K +1
added 1 year ago
An open source ML system for the end-to-end data science lifecycle.
last commit 1 day ago apache/streampipes 715 +1
added 1 year ago
A self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams.
last commit 3 days ago apache/nifi 5K +13
added 1 year ago
Apache NiFi is an easy-to-use, powerful, and reliable system to process and distribute data.
last commit 1 day ago apache/flink 25K +22
added 1 year ago
A stream processing framework with powerful stream- and batch-processing capabilities.
last commit 3 days ago infinispan/infinispan 1K
added 1 year ago
An open source data grid platform and highly scalable NoSQL cloud data store.
last commit 3 days ago apache/ignite 5K +7
added 1 year ago
Apache Ignite is a distributed database for high-performance computing with in-memory speed.
last commit 1 day ago apache/kafka 32K +90
added 1 year ago
Distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
last commit 3 days ago hazelcast/hazelcast 6K -6
added 1 year ago