Introduction to Data Engineering
Course Duration: 12 weeks
Prerequisites: None
Week 1: Introduction to GCP and Cloud Fundamentals
Practical Topics:
- Overview of GCP and its role in data engineering.
- Setting up GCP: Creating projects, managing billing, and using the Console.
- Introduction to Identity and Access Management (IAM).
Lab Activities:
- Create a GCP project and configure IAM roles.
- Set up billing and explore the Console interface.
Week 2: AWS and Azure Comparisons with GCP
Practical Topics:
- Core services comparison:
  - Storage: Cloud Storage (GCP) vs. S3 (AWS) vs. Blob Storage (Azure).
  - Data Warehousing: BigQuery vs. Redshift vs. Synapse Analytics.
  - Compute: Cloud Functions vs. AWS Lambda vs. Azure Functions.
  - Data Streaming: Pub/Sub vs. Kinesis vs. Event Hubs.
- Use case discussions: When to choose GCP, AWS, or Azure for data engineering.
Lab Activities:
- Hands-on comparison: Upload and retrieve data from Cloud Storage, S3, and Blob Storage.
- Explore BigQuery, Redshift, and Synapse interfaces.
Week 3: Data Storage on Google Cloud
Practical Topics:
- Using Google Cloud Storage: Buckets, objects, and lifecycle management.
- Understanding file formats: CSV, JSON, Avro, Parquet, and ORC.
- Managing permissions and access policies.
Lab Activities:
- Set up a Cloud Storage bucket and manage file uploads.
- Configure lifecycle policies for storage optimization.
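Example sketch for the lifecycle lab: a minimal illustration, assuming a policy that moves objects to the Nearline storage class after 30 days and deletes them after 365. The thresholds and the helper name `build_lifecycle_config` are illustrative assumptions, not part of the course material. The output follows the rule/action/condition layout used by Cloud Storage lifecycle configuration files.

```python
import json


def build_lifecycle_config(nearline_after_days: int = 30,
                           delete_after_days: int = 365) -> dict:
    """Return a lifecycle configuration as a plain dict (hypothetical helper)."""
    return {
        "rule": [
            {
                # Move objects to a cheaper storage class once they age.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": nearline_after_days},
            },
            {
                # Delete objects once they pass the retention window.
                "action": {"type": "Delete"},
                "condition": {"age": delete_after_days},
            },
        ]
    }


if __name__ == "__main__":
    # Print JSON that could be saved to a file and applied to a bucket.
    print(json.dumps(build_lifecycle_config(), indent=2))
```

Keeping the policy as data makes it easy to review the thresholds before applying them to a real bucket.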
Week 4: BigQuery Fundamentals
Practical Topics:
- Introduction to BigQuery as a serverless data warehouse.
- Creating datasets, tables, and partitions.
- Writing basic and advanced SQL queries.
Lab Activities:
- Load datasets into BigQuery and perform SQL queries.
- Experiment with partitioning and clustering for performance optimization.
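Example sketch for the partitioning and clustering lab: a small helper that builds a BigQuery `CREATE TABLE` statement with daily partitioning on a timestamp column and optional clustering. The helper name `partitioned_table_ddl`, the table `shop.events`, and its columns are assumptions made for illustration; the generated SQL would still need to be run in BigQuery.

```python
from typing import Dict, List


def partitioned_table_ddl(table: str, columns: Dict[str, str],
                          partition_column: str,
                          cluster_columns: List[str]) -> str:
    """Build a CREATE TABLE statement with daily partitioning and clustering."""
    if partition_column not in columns:
        raise ValueError(f"unknown partition column: {partition_column}")
    if len(cluster_columns) > 4:
        # BigQuery limits a table to four clustering columns.
        raise ValueError("at most four clustering columns are allowed")

    column_sql = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns.items())
    lines = [
        f"CREATE TABLE `{table}` (\n  {column_sql}\n)",
        # Partition by the calendar date of the timestamp column.
        f"PARTITION BY DATE({partition_column})",
    ]
    if cluster_columns:
        lines.append(f"CLUSTER BY {', '.join(cluster_columns)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(partitioned_table_ddl(
        "shop.events",
        {"event_ts": "TIMESTAMP", "user_id": "STRING", "amount": "NUMERIC"},
        partition_column="event_ts",
        cluster_columns=["user_id"],
    ))
```

Queries that filter on the partition column scan only the matching partitions, which is the effect the lab sets out to measure.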
Week 5: Data Ingestion and Transformation on GCP
Practical Topics:
- Batch ingestion using Cloud Storage and Python.
- Real-time ingestion with Pub/Sub.
- ETL/ELT workflows using BigQuery SQL and Cloud Functions.
Lab Activities:
- Implement a real-time data pipeline with Pub/Sub and BigQuery.
- Transform raw data into analytics-ready tables using Python and BigQuery.
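Example sketch for the real-time pipeline lab: Pub/Sub delivers the message body base64-encoded in a `data` field, so a function that receives the event has to decode it before writing a row. The sketch below simulates that step locally with the standard library only. The field names `device_id` and `temperature_c`, the `source` attribute, and the function name `pubsub_event_to_row` are hypothetical; writing the row to BigQuery is left out.

```python
import base64
import json


def pubsub_event_to_row(event: dict) -> dict:
    """Decode a Pub/Sub-style event and shape it into a flat row."""
    # The message body arrives base64-encoded in the "data" field.
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    # Attributes may be missing or empty, so fall back to an empty dict.
    attributes = event.get("attributes") or {}
    return {
        "device_id": payload["device_id"],
        "temperature_c": float(payload["temperature_c"]),
        "source": attributes.get("source", "unknown"),
    }


# Simulate a message locally instead of publishing to a real topic.
raw = json.dumps({"device_id": "sensor-1", "temperature_c": "21.5"})
sample_event = {
    "data": base64.b64encode(raw.encode("utf-8")).decode("ascii"),
    "attributes": {"source": "lab"},
}

if __name__ == "__main__":
    print(pubsub_event_to_row(sample_event))
```

Separating the decoding step from the BigQuery write keeps the transformation testable without cloud credentials.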
Week 6: Automating Workflows with Cloud Scheduler and Cloud Functions
Practical Topics:
- Introduction to Cloud Scheduler for task automation.
- Using Cloud Functions to trigger workflows.
- Orchestrating ETL pipelines with Scheduler and Functions.
Lab Activities:
- Schedule a periodic data load from Cloud Storage to BigQuery.
- Use Cloud Functions to automate transformations and notifications.
Week 7: Securing Data Pipelines with Secret Manager and IAM
Practical Topics:
- Managing sensitive credentials with Secret Manager.
- Implementing IAM best practices for role-based access control.
- Securing pipelines using service accounts and policies.
Lab Activities:
- Store and retrieve secrets in a Cloud Function.
- Set up granular access controls for a BigQuery dataset.
Week 8: Advanced BigQuery and Optimization Techniques
Practical Topics:
- BigQuery best practices: Query optimization and cost control.
- Advanced SQL: window functions and user-defined functions (UDFs).
- Using materialized views and caching for better performance.
Lab Activities:
- Optimize queries for a large dataset in BigQuery.
- Implement advanced transformations using SQL UDFs.
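Example sketch for window functions: a running total per customer using `SUM(...) OVER (PARTITION BY ... ORDER BY ...)`. To keep it runnable without a GCP project, the sketch uses Python's built-in `sqlite3` module as a local stand-in for BigQuery (this assumes the bundled SQLite is version 3.25 or newer, which added window functions). The `orders` table and its rows are made up for illustration; the same `OVER` clause is also valid in BigQuery SQL.

```python
import sqlite3

# In-memory database used as a local stand-in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", "2024-01-01", 10.0),
        ("alice", "2024-01-02", 5.0),
        ("bob", "2024-01-01", 7.0),
        ("alice", "2024-01-03", 2.5),
    ],
)

# Running total of spend per customer, ordered by day.
query = """
SELECT customer, day, amount,
       SUM(amount) OVER (PARTITION BY customer ORDER BY day) AS running_total
FROM orders
ORDER BY customer, day
"""
rows = conn.execute(query).fetchall()

if __name__ == "__main__":
    for row in rows:
        print(row)
```

Unlike `GROUP BY`, the window function keeps every input row and adds the aggregate alongside it.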
Week 9: Monitoring and Debugging on GCP
Practical Topics:
- Using Cloud Monitoring and Cloud Logging for data pipelines.
- Setting up alerts for pipeline failures.
- Debugging workflows and optimizing performance.
Lab Activities:
- Monitor a Cloud Function pipeline and analyze logs.
- Set up an alerting mechanism for BigQuery job failures.
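Example sketch for the logging lab: writing one JSON object per line to standard output, with `severity` and `message` fields, is a common way to produce structured log entries from a Cloud Function. The helper name `log_structured` and the extra field `job_id` are assumptions for illustration; how the entries appear in Cloud Logging should be confirmed in the lab itself.

```python
import json
import sys


def log_structured(severity: str, message: str, **fields) -> str:
    """Write one JSON log line to stdout and return it (hypothetical helper)."""
    entry = {"severity": severity, "message": message, **fields}
    line = json.dumps(entry)
    # One JSON object per line keeps each entry separately parseable.
    print(line, file=sys.stdout, flush=True)
    return line


if __name__ == "__main__":
    log_structured("INFO", "load started", job_id="job-123")
    log_structured("ERROR", "load failed", job_id="job-123", rows_loaded=0)
```

Consistent field names such as `severity` make it simpler to filter logs and to define alert conditions on failures.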
Week 10: Interactive Reporting with Google Data Studio
Practical Topics:
- Overview of Google Data Studio (since renamed Looker Studio).
- Connecting Data Studio to BigQuery for reporting.
- Designing interactive dashboards with filters, charts, and KPIs.
Lab Activities:
- Create a Google Data Studio dashboard for a sample e-commerce dataset.
- Design custom charts and integrate real-time data feeds.
Week 11: Capstone Project – End-to-End Data Pipeline on GCP
Practical Topics:
- Designing a data pipeline using GCP tools: Cloud Storage, BigQuery, Pub/Sub, Cloud Functions, and Data Studio.
- Implementing security, automation, and optimization.
Lab Activities:
- Build a pipeline from ingestion to visualization for a real-world use case (e.g., IoT analytics, financial reporting).
Week 12: Project Presentations and Career Guidance
Practical Topics:
- Capstone project presentations and feedback.
- Discussing career paths in data engineering and GCP certifications.
- Q&A session and roadmap for advanced learning.
Assessment
Participation in Labs and Quizzes (10%)
Weekly Practical Assignments (40%)
Capstone Project (50%)