  1. Apache Spark™ - Unified Engine for large-scale data analytics

    Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.

  2. Quick Start - Spark 4.1.0 Documentation

    Quick Start. Contents: Interactive Analysis with the Spark Shell; Basics; More on Dataset Operations; Caching; Self-Contained Applications; Where to Go from Here. This tutorial provides a quick introduction to using …

  3. PySpark Overview — PySpark 4.1.0 documentation - Apache Spark

    Dec 11, 2025 · PySpark Overview. Date: Dec 11, 2025. Version: 4.1.0. Useful links: Live Notebook | GitHub | Issues | Examples | Community | Stack Overflow | Dev Mailing List | User Mailing List …

  4. Structured Streaming Programming Guide - Spark 4.1.0 Documentation

    Structured Streaming Programming Guide. API using Datasets and DataFrames: Since Spark 2.0, DataFrames and Datasets can represent static, bounded data, as well as streaming, unbounded data.

  5. Performance Tuning - Spark 4.1.0 Documentation

    Performance Tuning. Spark offers many techniques for tuning the performance of DataFrame or SQL workloads. Those techniques, broadly speaking, include caching data, altering how datasets are …

  6. Spark 4.0.0 released - Apache Spark

    Spark 4.0.0 released. We are happy to announce the availability of Spark 4.0.0! Visit the release notes to read about the new features, or download the release today.

  7. Spark Release 4.0.0 - Apache Spark

    Spark Release 4.0.0. Apache Spark 4.0.0 marks a significant milestone as the inaugural release in the 4.x series, embodying the collective effort of the vibrant open-source community. This release is a …

  8. Building Spark - Spark 4.0.0 Documentation

    Building Apache Spark. Apache Maven: The Maven-based build is the build of reference for Apache Spark. Building Spark using Maven requires Maven 3.9.9 and Java 17/21. Spark requires Scala 2.13; …

  9. Application Development with Spark Connect - Spark 4.1.0 …

    Application Development with Spark Connect. Spark Connect Overview: In Apache Spark 3.4, Spark Connect introduced a decoupled client-server architecture that allows remote connectivity to Spark …

  10. Structured Streaming Programming Guide - Spark 4.1.0 Documentation

    Since Spark 2.1, we have support for watermarking which allows the user to specify the threshold of late data, and allows the engine to accordingly clean up old state.