Spark Optimization Course

24 Hours
Overview

This Spark Optimization training course covers advanced Spark topics for tuning applications.

The Spark Optimization course begins with a review of Spark, including its architecture, terminology, and use with Hadoop. From there, students learn about the Spark execution environment and YARN, how to work with the right data format, and how to deal with Spark partitions. The course ends by exploring Spark physical execution, the Spark Core API, caching and checkpointing, joins, and Spark SQL optimization.

The Spark Optimization course is offered in the Python and Scala programming languages.

Objectives

After this Spark Optimization course, you will be able to:

  1. Integrate aspects of Spark on YARN
  2. Work with binary data formats
  3. Identify the internals of Spark
  4. Optimize Spark Core and Spark SQL code
  5. Discuss best practices for writing Spark Core and Spark SQL code
Materials
Portuguese/English + Exercises + Hands-on Lab
Course Content

Spark Overview 

  1. Logical Architecture
  2.  Physical Architecture of Spark
  3.  Common Concepts and Terms in Spark
  4.  Ways to build applications on Spark
  5.  Spark with Hadoop

Understanding Spark Execution Environment – YARN 

  1. About YARN
  2.  Why YARN
  3.  Architecture of YARN
  4.  YARN UI and Commands
  5.  Internals of YARN
  6. Running a Spark application on YARN (hands-on)
  7.  Troubleshooting and Debugging Spark applications on YARN
  8.  Optimizing Application Performance

Working with Right Data Format 

  1. Why Data Formats are important for optimization
  2.  Key Data Formats
  3.  Comparisons – which one to choose when?
  4.  Working with Avro
  5.  Working with Parquet
  6.  Working with ORC

Dealing with Spark Partitions 

  1. How Spark determines number of Partitions
  2. Things to keep in mind when determining the number of partitions
  3. Small Partitions Problem
  4. Diagnosing & Handling Post-Filtering Issues (Skew)
  5.  Repartition vs Coalesce
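
As a rough illustration of the "Repartition vs Coalesce" topic, the following is a toy model in plain Python, not the Spark API. The functions `coalesce` and `repartition` here are hypothetical stand-ins that operate on lists of lists; they assume that coalescing merges whole existing partitions without a shuffle, while repartitioning redistributes every record. Spark's real behaviour (locality-aware merging, shuffle internals) is more involved.

```python
from typing import List


def coalesce(partitions: List[list], n: int) -> List[list]:
    """Toy coalesce: merge whole neighbouring partitions into n groups.

    Records are never split away from their original partition, so no
    shuffle is modelled, but existing size imbalance is preserved.
    Assumes n <= len(partitions).
    """
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions: List[list], n: int) -> List[list]:
    """Toy repartition: deal every record round-robin into n partitions.

    This models a full shuffle: every record may move, and the result
    is evenly sized.
    """
    out = [[] for _ in range(n)]
    position = 0
    for part in partitions:
        for record in part:
            out[position % n].append(record)
            position += 1
    return out


# One large partition and three small ones, e.g. after a selective filter.
skewed = [[1, 2, 3, 4, 5, 6, 7, 8], [9], [10], [11, 12]]

coalesced = coalesce(skewed, 2)
balanced = repartition(skewed, 2)

print([len(p) for p in coalesced])
print([len(p) for p in balanced])
```

In this model the coalesced result stays unbalanced because the large partition is kept intact, while the repartitioned result is even at the cost of moving every record.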

Spark Physical Execution 

  1. Spark Core Plan
  2.  Modes of Execution
  3.  YARN Client vs YARN Cluster
  4.  Standalone Mode
  5.  Physical Execution on Cluster
  6.  Narrow vs Wide Dependency
  7.  Spark UI
  8.  Executor Memory Architecture
  9.  Key Properties

Effective Development Using Spark Core API 

  1. Use of groupByKey and reduceByKey
  2.  Using the right datatype in RDD
  3.  How to ensure memory is utilized effectively?
  4.  Performing Data Validation in an optimal manner
  5.  Use of mapPartitions
  6.  Partitioning Strategies
  7.  Hash Partitioner
  8.  Use of Range Partitioner
  9.  Writing and plugging custom partitioner
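
To make the groupByKey versus reduceByKey topic concrete, here is a minimal plain-Python sketch, not Spark code. It assumes the usual explanation that a reduceByKey-style aggregation combines values inside each partition before the shuffle, whereas a groupByKey-style aggregation sends every pair across the shuffle. The function names and the sample data are hypothetical.

```python
from collections import defaultdict


def sum_group_by_key_style(partitions):
    """Every (key, value) pair crosses the simulated shuffle unchanged."""
    shuffled = [pair for part in partitions for pair in part]
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


def sum_reduce_by_key_style(partitions):
    """Combine inside each partition first, then shuffle one pair per key."""
    shuffled = []
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value  # map-side combine
        shuffled.extend(local.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


# Two input partitions of (word, 1) pairs.
partitions = [
    [("a", 1), ("a", 1), ("b", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]

grouped_totals, grouped_shuffled = sum_group_by_key_style(partitions)
reduced_totals, reduced_shuffled = sum_reduce_by_key_style(partitions)

print(grouped_totals, grouped_shuffled)
print(reduced_totals, reduced_shuffled)
```

Both approaches produce the same totals; the difference in this model is only the number of pairs that cross the simulated shuffle, which grows with the number of repeated keys per partition.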

Caching and Checkpointing 

  1. When to Cache?
  2.  How Caching helps?
  3.  Caching Strategies
  4. How Spark plans change when caching is on
  5.  Caching on Spark UI
  6.  Role of Alluxio
  7.  Checkpointing
  8.  How Caching is different from Checkpointing

Joins 

  1. Why optimizing joins is important
  2.  Types of Joins
  3. Quick Recap of MapReduce Map-Side Joins
  4.  Broadcasting
  5.  Bucketing
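
The broadcasting topic above can be sketched in plain Python as a broadcast hash join. This is an illustrative model only, not the Spark API: `broadcast_hash_join` is a hypothetical helper, and it assumes an inner join where the small table has unique keys and fits in memory.

```python
def broadcast_hash_join(large_partitions, small_table):
    """Inner-join each partition of the large side against a small lookup.

    The small table is turned into a dict once and conceptually copied
    ("broadcast") to every partition, so the large side is joined in
    place and never has to be shuffled by key.
    """
    lookup = dict(small_table)  # assumes unique keys on the small side
    joined = []
    for part in large_partitions:
        joined.append(
            [(key, value, lookup[key]) for key, value in part if key in lookup]
        )
    return joined


# Large side: (country_code, amount) records spread over two partitions.
orders = [
    [("BR", 10), ("US", 20)],
    [("BR", 5), ("XX", 1)],
]
# Small side: (country_code, country_name).
countries = [("BR", "Brazil"), ("US", "United States")]

result = broadcast_hash_join(orders, countries)
print(result)
```

The output keeps the partitioning of the large side, and the record with the unmatched key `"XX"` is dropped, as expected for an inner join.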

Spark SQL Optimization 

  1. DataFrames vs Datasets
  2.  About Tungsten
  3.  Data Partitioning
  4.  Query Optimizer: Catalyst Optimizer
  5.  Debugging Spark Queries
  6.  Explain Plan
  7.  Partitioning & Bucketing in Spark SQL
  8.  Best Practices for writing Spark SQL code
  9.  Spark SQL with Binary Data formats