Overview
This course prepares professionals to design, develop, deploy, and monitor modern data pipelines using Lakeflow Declarative Pipelines on the Databricks platform. Participants will learn to build scalable, reliable pipelines for batch and streaming processing, applying Lakehouse architecture, Delta Lake, Data Quality, Data Lineage, and workflow automation concepts.
The training ranges from Lakeflow fundamentals to advanced data engineering scenarios, including continuous ingestion, declarative transformations, operational monitoring, and the implementation of Bronze, Silver, and Gold layer architectures.
Course Outline
Module 1: Introduction to Lakeflow Declarative Pipelines
- Overview of Lakeflow
- Evolution from Delta Live Tables
- Declarative Pipeline Concepts
- Lakehouse Architecture
- Pipeline Components
- Pipeline Lifecycle
- Lakeflow Use Cases
Module 2: Databricks Lakehouse Fundamentals
- Lakehouse Architecture Review
- Delta Lake Fundamentals
- Unity Catalog Overview
- Data Governance Concepts
- Data Lineage Introduction
- Medallion Architecture
- Lakehouse Best Practices
Module 3: Creating Your First Pipeline
- Pipeline Creation
- Defining Datasets
- Pipeline Configuration
- Development Workflow
- Execution Model
- Dependency Resolution
- Pipeline Deployment
Module 4: Declarative Transformations
- SQL-Based Transformations
- Python-Based Transformations
- Dataset Definitions
- Materialized Views
- Streaming Tables
- Incremental Processing
- Transformation Patterns
Module 5: Data Ingestion with Auto Loader
- Auto Loader Fundamentals
- File Discovery Mechanisms
- Schema Inference
- Schema Evolution
- Incremental Ingestion
- Cloud Storage Integration
- Monitoring Ingestion Processes
Module 6: Streaming Data Pipelines
- Structured Streaming Review
- Streaming Tables
- Real-Time Data Processing
- Watermarking Concepts
- Checkpoint Management
- Streaming Optimization
- Operational Monitoring
Module 7: Data Quality Management
- Expectations Framework
- Data Validation Rules
- Quality Constraints
- Data Quality Metrics
- Handling Invalid Records
- Data Quality Monitoring
- Quality Best Practices
Module 8: Medallion Architecture Implementation
- Bronze Layer Design
- Silver Layer Design
- Gold Layer Design
- Incremental Data Processing
- Data Enrichment
- Business Aggregations
- End-to-End Data Flow
Module 9: Pipeline Monitoring and Observability
- Pipeline Event Logs
- Monitoring Dashboards
- Execution Metrics
- Troubleshooting Pipelines
- Alerting Mechanisms
- Lineage Visualization
- Operational Best Practices
Module 10: Pipeline Optimization
- Performance Tuning
- Resource Management
- Cluster Optimization
- Storage Optimization
- Query Performance
- Cost Optimization
- Scalability Considerations
Module 11: Governance and Security
- Unity Catalog Integration
- Access Control
- Data Permissions
- Auditing
- Compliance Requirements
- Data Sharing
- Security Best Practices
Module 12: Production Deployment
- CI/CD Concepts
- Source Control Integration
- Deployment Strategies
- Environment Promotion
- Operational Runbooks
- Maintenance Procedures
- Production Best Practices
Hands-On Labs
Lab 1: Creating Your First Lakeflow Pipeline
- Configure Workspace
- Create Pipeline
- Define Source Data
- Execute Pipeline
- Analyze Results
Lab 2: Building Bronze Layer
- Configure Auto Loader
- Create Bronze Tables
- Implement Incremental Loads
- Validate Data Ingestion
Lab 3: Building Silver Layer
- Data Cleansing
- Schema Standardization
- Data Enrichment
- Data Quality Validation
Lab 4: Building Gold Layer
- Business Aggregations
- KPI Calculations
- Analytical Data Models
- Reporting Datasets
Lab 5: Implementing Data Quality Rules
- Create Expectations
- Validate Incoming Data
- Handle Failed Records
- Monitor Quality Metrics
Lab 6: Streaming Pipeline Implementation
- Configure Streaming Sources
- Create Streaming Tables
- Process Real-Time Events
- Monitor Stream Health
Lab 7: Monitoring and Troubleshooting
- Analyze Pipeline Events
- Investigate Failures
- Review Lineage
- Resolve Performance Issues
Lab 8: End-to-End Medallion Project
- Ingest Raw Data
- Create Bronze Layer
- Create Silver Layer
- Create Gold Layer
- Implement Data Quality Controls
- Configure Monitoring
- Optimize Pipeline Performance
- Publish Production-Ready Pipeline
Final Project
End-to-end development of a Lakehouse-based data platform using Lakeflow Declarative Pipelines, covering automated ingestion, incremental processing, Medallion architecture, data governance, operational monitoring, and optimization for production environments.