Course: Build Data Pipelines with Lakeflow Spark Declarative Pipelines

24 hours
Overview

This course trains professionals to design, develop, deploy, and monitor modern data pipelines using Lakeflow Declarative Pipelines on the Databricks platform. Participants will learn to build scalable, reliable pipelines for batch and streaming processing, applying concepts of Lakehouse architecture, Delta Lake, data quality, data lineage, and workflow automation.

The training covers everything from Lakeflow fundamentals to advanced data engineering scenarios, including continuous ingestion, declarative transformations, operational monitoring, and the implementation of Bronze, Silver, and Gold architectures.

Objective

After completing the Build Data Pipelines with Lakeflow Spark Declarative Pipelines course, you will be able to:

  • Understand the architecture of Lakeflow Declarative Pipelines
  • Create declarative pipelines for data processing
  • Implement batch and streaming ingestion
  • Use Auto Loader for automated ingestion
  • Build Bronze, Silver, and Gold architectures
  • Apply data quality rules
  • Monitor and optimize pipelines
  • Implement data lineage and observability
  • Automate data engineering processes
  • Apply governance and performance best practices
Target Audience
  • Data Engineers
  • Analytics Engineers
  • Data Architects
  • Data Platform Engineers
  • ETL Developers
  • BI Developers
  • Professionals who work with the Databricks Lakehouse Platform
Prerequisites
  • Basic knowledge of SQL
  • Basic knowledge of Python
  • Data warehousing concepts
  • Knowledge of Apache Spark
  • Familiarity with Delta Lake
  • Basic experience with Databricks
Materials
English/Portuguese + Exercises + Hands-on Lab
Course Outline

Module 1: Introduction to Lakeflow Declarative Pipelines

  1. Overview of Lakeflow
  2. Evolution from Delta Live Tables
  3. Declarative Pipeline Concepts
  4. Lakehouse Architecture
  5. Pipeline Components
  6. Pipeline Lifecycle
  7. Lakeflow Use Cases

Module 2: Databricks Lakehouse Fundamentals

  1. Lakehouse Architecture Review
  2. Delta Lake Fundamentals
  3. Unity Catalog Overview
  4. Data Governance Concepts
  5. Data Lineage Introduction
  6. Medallion Architecture
  7. Best Practices

Module 3: Creating Your First Pipeline

  1. Pipeline Creation
  2. Defining Datasets
  3. Pipeline Configuration
  4. Development Workflow
  5. Execution Model
  6. Dependency Resolution
  7. Pipeline Deployment

Module 4: Declarative Transformations

  1. SQL-Based Transformations
  2. Python-Based Transformations
  3. Dataset Definitions
  4. Materialized Views
  5. Streaming Tables
  6. Incremental Processing
  7. Transformation Patterns

Module 5: Data Ingestion with Auto Loader

  1. Auto Loader Fundamentals
  2. File Discovery Mechanisms
  3. Schema Inference
  4. Schema Evolution
  5. Incremental Ingestion
  6. Cloud Storage Integration
  7. Monitoring Ingestion Processes

Module 6: Streaming Data Pipelines

  1. Structured Streaming Review
  2. Streaming Tables
  3. Real-Time Data Processing
  4. Watermarking Concepts
  5. Checkpoint Management
  6. Streaming Optimization
  7. Operational Monitoring

Module 7: Data Quality Management

  1. Expectations Framework
  2. Data Validation Rules
  3. Quality Constraints
  4. Data Quality Metrics
  5. Handling Invalid Records
  6. Data Quality Monitoring
  7. Quality Best Practices

Module 8: Medallion Architecture Implementation

  1. Bronze Layer Design
  2. Silver Layer Design
  3. Gold Layer Design
  4. Incremental Data Processing
  5. Data Enrichment
  6. Business Aggregations
  7. End-to-End Data Flow

Module 9: Pipeline Monitoring and Observability

  1. Pipeline Event Logs
  2. Monitoring Dashboards
  3. Execution Metrics
  4. Troubleshooting Pipelines
  5. Alerting Mechanisms
  6. Lineage Visualization
  7. Operational Best Practices

Module 10: Pipeline Optimization

  1. Performance Tuning
  2. Resource Management
  3. Cluster Optimization
  4. Storage Optimization
  5. Query Performance
  6. Cost Optimization
  7. Scalability Considerations

Module 11: Governance and Security

  1. Unity Catalog Integration
  2. Access Control
  3. Data Permissions
  4. Auditing
  5. Compliance Requirements
  6. Data Sharing
  7. Security Best Practices

Module 12: Production Deployment

  1. CI/CD Concepts
  2. Source Control Integration
  3. Deployment Strategies
  4. Environment Promotion
  5. Operational Runbooks
  6. Maintenance Procedures
  7. Production Best Practices

Hands-on Labs

Lab 1: Creating Your First Lakeflow Pipeline

  1. Configure Workspace
  2. Create Pipeline
  3. Define Source Data
  4. Execute Pipeline
  5. Analyze Results

Lab 2: Building Bronze Layer

  1. Configure Auto Loader
  2. Create Bronze Tables
  3. Implement Incremental Loads
  4. Validate Data Ingestion

Lab 3: Building Silver Layer

  1. Data Cleansing
  2. Schema Standardization
  3. Data Enrichment
  4. Data Quality Validation

Lab 4: Building Gold Layer

  1. Business Aggregations
  2. KPI Calculations
  3. Analytical Data Models
  4. Reporting Datasets

Lab 5: Implementing Data Quality Rules

  1. Create Expectations
  2. Validate Incoming Data
  3. Handle Failed Records
  4. Monitor Quality Metrics

Lab 6: Streaming Pipeline Implementation

  1. Configure Streaming Sources
  2. Create Streaming Tables
  3. Process Real-Time Events
  4. Monitor Stream Health

Lab 7: Monitoring and Troubleshooting

  1. Analyze Pipeline Events
  2. Investigate Failures
  3. Review Lineage
  4. Resolve Performance Issues

Lab 8: End-to-End Medallion Project

  1. Ingest Raw Data
  2. Create Bronze Layer
  3. Create Silver Layer
  4. Create Gold Layer
  5. Implement Data Quality Controls
  6. Configure Monitoring
  7. Optimize Pipeline Performance
  8. Publish Production-Ready Pipeline

Final Project

Complete development of a Lakehouse-based data platform using Lakeflow Declarative Pipelines, covering automated ingestion, incremental processing, Medallion architecture, data governance, operational monitoring, and optimization for a production environment.
