Introduction to Data Engineering

Course Duration: 12 weeks

Prerequisites: None

Week 1: Introduction to GCP and Cloud Fundamentals

Practical Topics:

  • Overview of GCP and its role in data engineering.
  • Setting up GCP: Creating projects, managing billing, and using the Console.
  • Introduction to Identity and Access Management (IAM).

Lab Activities:

  • Create a GCP project and configure IAM roles.
  • Set up billing and explore the Console interface.
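
An IAM policy is essentially a list of bindings, each tying one role to a set of members. The sketch below edits such a policy in plain Python, assuming a dict in the shape that `gcloud projects get-iam-policy --format=json` prints. The `add_binding` helper and the example.com addresses are hypothetical and only for illustration; it does not call any GCP API.

```python
import copy

def add_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of `policy` in which `member` holds `role`."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding["role"] == role:
            # Role already present: add the member only if it is missing.
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    # Role not present yet: create a new binding for it.
    bindings.append({"role": role, "members": [member]})
    return updated

# Hypothetical starting policy with a single viewer.
policy = {"bindings": [{"role": "roles/viewer", "members": ["user:alice@example.com"]}]}
policy = add_binding(policy, "roles/bigquery.dataViewer", "user:bob@example.com")
print(policy)
```

Working on a copy keeps the original policy untouched, which makes it easy to compare before and after when reviewing an access change.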

Week 2: AWS and Azure Comparisons with GCP

Practical Topics:

  • Core services comparison:
    • Storage: Cloud Storage (GCP) vs. S3 (AWS) vs. Blob Storage (Azure).
    • Data Warehousing: BigQuery vs. Redshift vs. Synapse Analytics.
    • Compute: Cloud Functions vs. AWS Lambda vs. Azure Functions.
    • Data Streaming: Pub/Sub vs. Kinesis vs. Event Hubs.
  • Use case discussions: When to choose GCP, AWS, or Azure for data engineering.

Lab Activities:

  • Hands-on comparison: Upload and retrieve data from Cloud Storage, S3, and Blob Storage.
  • Explore BigQuery, Redshift, and Synapse interfaces.

Week 3: Data Storage on Google Cloud

Practical Topics:

  • Using Google Cloud Storage: Buckets, objects, and lifecycle management.
  • Understanding file formats: CSV, JSON, Avro, Parquet, and ORC.
  • Managing permissions and access policies.

Lab Activities:

  • Set up a Cloud Storage bucket and manage file uploads.
  • Configure lifecycle policies for storage optimization.
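
A lifecycle policy is a JSON document of rules, each pairing an action with a condition. The sketch below builds one that moves objects to Nearline after 30 days and deletes them after 365. Both thresholds and the `lifecycle_rule` helper are assumptions chosen for illustration, and the outer `lifecycle` wrapper follows the shape used in Google's published configuration examples, so check the current documentation before applying it.

```python
import json

def lifecycle_rule(action: dict, age_days: int) -> dict:
    """Build one rule that fires once an object is `age_days` old."""
    return {"action": action, "condition": {"age": age_days}}

config = {
    "lifecycle": {
        "rule": [
            # Move to a cheaper storage class after 30 days (assumed threshold).
            lifecycle_rule({"type": "SetStorageClass", "storageClass": "NEARLINE"}, 30),
            # Delete after one year (assumed threshold).
            lifecycle_rule({"type": "Delete"}, 365),
        ]
    }
}

lifecycle_json = json.dumps(config, indent=2)
print(lifecycle_json)
```

Saved to a file, a configuration like this can typically be applied with `gcloud storage buckets update gs://BUCKET --lifecycle-file=FILE`, where BUCKET and FILE are placeholders.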

Week 4: BigQuery Fundamentals

Practical Topics:

  • Introduction to BigQuery as a serverless data warehouse.
  • Creating datasets, tables, and partitions.
  • Writing basic and advanced SQL queries.

Lab Activities:

  • Load datasets into BigQuery and perform SQL queries.
  • Experiment with partitioning and clustering for performance optimization.
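
The benefit of partitioning can be illustrated without BigQuery at all. The toy model below is plain Python, not BigQuery code: it stores rows in one bucket per date and shows that a query filtered on the partitioning column only has to scan the matching bucket. The table contents and function names are made up for this illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order rows.
rows = [
    {"order_date": date(2024, 1, 1), "amount": 10.0},
    {"order_date": date(2024, 1, 1), "amount": 5.0},
    {"order_date": date(2024, 1, 2), "amount": 7.5},
    {"order_date": date(2024, 1, 3), "amount": 2.5},
]

# "Partition" the table: one bucket of rows per order_date.
partitions = defaultdict(list)
for row in rows:
    partitions[row["order_date"]].append(row)

def total_for_day(day):
    """Partitioned lookup: scan only the bucket for `day`."""
    scanned = partitions.get(day, [])
    return sum(r["amount"] for r in scanned), len(scanned)

def total_for_day_unpartitioned(day):
    """Unpartitioned lookup: every row is scanned and filtered."""
    total = sum(r["amount"] for r in rows if r["order_date"] == day)
    return total, len(rows)

# Each call returns (total, rows_scanned).
print(total_for_day(date(2024, 1, 1)))
print(total_for_day_unpartitioned(date(2024, 1, 1)))
```

Both functions return the same total, but the partitioned version touches fewer rows. In BigQuery the analogous saving shows up as fewer bytes processed when a query filters on the partitioning column.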

Week 5: Data Ingestion and Transformation on GCP

Practical Topics:

  • Batch ingestion using Cloud Storage and Python.
  • Real-time ingestion with Pub/Sub.
  • ETL/ELT workflows using BigQuery SQL and Cloud Functions.

Lab Activities:

  • Implement a real-time data pipeline with Pub/Sub and BigQuery.
  • Transform raw data into analytics-ready tables using Python and BigQuery.
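
In a push-based Pub/Sub pipeline, each delivery arrives as a JSON envelope whose `message.data` field is base64-encoded. The sketch below decodes one envelope into a flat row ready for a BigQuery insert. The payload fields (`device_id`, `temperature_c`), the `message_to_row` helper, and the subscription name are hypothetical and stand in for whatever schema the lab uses.

```python
import base64
import json

def message_to_row(envelope: dict) -> dict:
    """Decode a Pub/Sub push envelope into a flat row."""
    message = envelope["message"]
    # The publisher's bytes are base64-encoded inside the envelope.
    payload = json.loads(base64.b64decode(message["data"]).decode("utf-8"))
    return {
        "message_id": message.get("messageId"),
        "device_id": payload["device_id"],
        "temperature_c": float(payload["temperature_c"]),
    }

# Build a sample envelope the way a push subscription would deliver it.
raw = json.dumps({"device_id": "sensor-7", "temperature_c": 21.5}).encode("utf-8")
sample = {
    "message": {
        "messageId": "1",
        "data": base64.b64encode(raw).decode("ascii"),
    },
    "subscription": "projects/my-project/subscriptions/my-sub",
}

row = message_to_row(sample)
print(row)
```

Keeping the decoding step in a pure function like this makes it testable locally before it is wired into a Cloud Function.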

Week 6: Automating Workflows with Cloud Scheduler and Cloud Functions

Practical Topics:

  • Introduction to Cloud Scheduler for task automation.
  • Using Cloud Functions to trigger workflows.
  • Orchestrating ETL pipelines with Scheduler and Functions.

Lab Activities:

  • Schedule a periodic data load from Cloud Storage to BigQuery.
  • Use Cloud Functions to automate transformations and notifications.

Week 7: Securing Data Pipelines with Secret Manager and IAM

Practical Topics:

  • Managing sensitive credentials with Secret Manager.
  • Implementing IAM best practices for role-based access control.
  • Securing pipelines using service accounts and policies.

Lab Activities:

  • Store and retrieve secrets in a Cloud Function.
  • Set up granular access controls for a BigQuery dataset.
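
Reading a secret comes down to requesting a specific version by its resource name, `projects/PROJECT/secrets/SECRET/versions/VERSION`. The sketch below takes the client as a parameter so it can run locally against a stand-in. In a real Cloud Function the client would be a `SecretManagerServiceClient` from the `google-cloud-secret-manager` package. `FakeClient`, the project ID, and the secret ID are hypothetical.

```python
from types import SimpleNamespace

def secret_version_name(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Build the resource name of one secret version."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"

def read_secret(client, project_id: str, secret_id: str, version: str = "latest") -> str:
    """Fetch a secret version and return its payload as text."""
    name = secret_version_name(project_id, secret_id, version)
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("utf-8")

class FakeClient:
    """Local stand-in that mimics the response shape of the real client."""

    def access_secret_version(self, request):
        return SimpleNamespace(payload=SimpleNamespace(data=b"s3cr3t"))

password = read_secret(FakeClient(), "my-project", "db-password")
print(secret_version_name("my-project", "db-password"))
```

Passing the client in, rather than creating it inside the function, keeps credentials out of the code path under test and avoids printing the secret value itself.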

Week 8: Advanced BigQuery and Optimization Techniques

Practical Topics:

  • BigQuery best practices: Query optimization and cost control.
  • Advanced SQL: window functions and user-defined functions (UDFs).
  • Using materialized views and caching for better performance.

Lab Activities:

  • Optimize queries for a large dataset in BigQuery.
  • Implement advanced transformations using SQL UDFs.
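
Window functions can be practiced locally before running them against a large BigQuery table. The sketch below uses Python's built-in `sqlite3` module as a stand-in (it assumes the bundled SQLite is version 3.25 or newer, which added window functions) to compute a per-customer running total. The `orders` table and its values are made up; the `SUM(...) OVER (PARTITION BY ... ORDER BY ...)` clause itself is standard SQL that BigQuery also supports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, order_day INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("a", 1, 10.0), ("a", 2, 5.0), ("b", 1, 7.0), ("a", 3, 2.0), ("b", 2, 3.0)],
)

# Running total per customer, ordered by day.
query = """
SELECT customer, order_day, amount,
       SUM(amount) OVER (PARTITION BY customer ORDER BY order_day) AS running_total
FROM orders
ORDER BY customer, order_day
"""

results = conn.execute(query).fetchall()
for result in results:
    print(result)
```

Unlike a GROUP BY, the window function keeps every input row and adds the aggregate alongside it, which is what makes running totals and rankings possible in a single pass.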

Week 9: Monitoring and Debugging on GCP

Practical Topics:

  • Using Cloud Monitoring and Cloud Logging for data pipelines.
  • Setting up alerts for pipeline failures.
  • Debugging workflows and optimizing performance.

Lab Activities:

  • Monitor a Cloud Function pipeline and analyze logs.
  • Set up an alerting mechanism for BigQuery job failures.
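
Alerts are easier to build when pipeline logs are structured. On Cloud Functions, a JSON object written to standard output on a single line is ingested by Cloud Logging as a structured entry, with `severity` and `message` treated as special fields. The sketch below formats such a line; the `log_entry` helper and the extra fields (`job_id`, `table`) are hypothetical.

```python
import json
import sys

def log_entry(severity: str, message: str, **fields) -> str:
    """Format one structured log line as single-line JSON."""
    entry = {"severity": severity, "message": message, **fields}
    return json.dumps(entry)

line = log_entry(
    "ERROR",
    "BigQuery load failed",
    job_id="job_123",       # hypothetical job identifier
    table="sales.orders",   # hypothetical destination table
)
print(line, file=sys.stdout)
```

With entries shaped like this, a log-based alert can filter on severity and on the custom fields rather than matching free text.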

Week 10: Interactive Reporting with Looker Studio (formerly Google Data Studio)

Practical Topics:

  • Overview of Google Data Studio.
  • Connecting Data Studio to BigQuery for reporting.
  • Designing interactive dashboards with filters, charts, and KPIs.

Lab Activities:

  • Create a Google Data Studio dashboard for a sample e-commerce dataset.
  • Design custom charts and integrate real-time data feeds.

Week 11: Capstone Project – End-to-End Data Pipeline on GCP

Practical Topics:

  • Designing a data pipeline using GCP tools: Cloud Storage, BigQuery, Pub/Sub, Cloud Functions, and Data Studio.
  • Implementing security, automation, and optimization.

Lab Activities:

  • Build a pipeline from ingestion to visualization for a real-world use case (e.g., IoT analytics, financial reporting).

Week 12: Project Presentations and Career Guidance

Practical Topics:

  • Capstone project presentations and feedback.
  • Discussing career paths in data engineering and GCP certifications.
  • Q&A session and roadmap for advanced learning.

Assessment

Participation in Labs and Quizzes (10%)

Weekly Practical Assignments (40%)

Capstone Project (50%)

