Table of Contents:
Part 1. Gentle overview of big data and Spark
  • What is Apache Spark?
  • A gentle introduction to Spark
  • A tour of Spark's toolset
Part 2. Structured APIs: DataFrames, SQL, and Datasets
  • Structured API overview
  • Basic structured operations
  • Working with different types of data
  • Aggregations
  • Joins
  • Data sources
  • Spark SQL
  • Datasets
Part 3. Low-level APIs
  • Resilient distributed datasets (RDDs)
  • Advanced RDDs
  • Distributed shared variables
Part 4. Production applications
  • How Spark runs on a cluster
  • Developing Spark applications
  • Deploying Spark
  • Monitoring and debugging
  • Performance tuning
Part 5. Streaming
  • Stream processing fundamentals
  • Structured streaming basics
  • Event-time and stateful processing
  • Structured streaming in production
Part 6. Advanced analytics and machine learning
  • Advanced analytics and machine learning overview
  • Preprocessing and feature engineering
  • Classification
  • Regression
  • Recommendation
  • Unsupervised learning
  • Graph analytics
  • Deep learning
Part 7. Ecosystem
  • Language specifics: Python (PySpark) and R (SparkR and sparklyr)
  • Ecosystem and community