Overview
Apache Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Java, Scala, Python, and R, and an optimized engine that supports general execution graphs.
Key Features
- Speed: In-memory processing that can run some workloads up to 100x faster than Hadoop MapReduce
- Unified Engine: SQL, streaming, ML, and graph processing in one engine
- Multi-Language: APIs for Python, Scala, Java, R, and SQL
- Ecosystem: Built-in libraries including Spark SQL, MLlib, GraphX, and Structured Streaming