| 1 | Introduction to Data Engineering (Data Engineering Basics) |
| 2 | Understanding Data & Databases (Relational vs Non-Relational) |
| 3 | Programming Basics (Python, Java, or Scala) |
| 4 | Data Structures & Algorithms Basics |
| 5 | SQL Basics (Data Querying) |
| 6 | Advanced SQL (Joins, Window Functions, CTEs, Indexing) |
| 7 | NoSQL Databases (MongoDB, Cassandra, Redis) |
| 8 | Data Modeling & Schema Design (Star Schema, Snowflake Schema, Normalization) |
| 9 | ETL Concepts (Extract, Transform, Load) |
| 10 | ETL Tools (Apache NiFi, Talend, Informatica) |
| 11 | Data Pipelines (Batch & Real-time Processing) |
| 12 | Big Data Fundamentals (Hadoop Ecosystem Overview) |
| 13 | HDFS (Hadoop Distributed File System) |
| 14 | MapReduce Basics |
| 15 | Apache Spark Basics (RDDs, DataFrames, Spark SQL) |
| 16 | Spark Advanced (Streaming, MLlib, Structured Streaming) |
| 17 | Data Warehousing Concepts (OLAP vs OLTP) |
| 18 | Data Warehouses (Redshift, Snowflake, BigQuery) |
| 19 | Columnar Storage & Partitioning |
| 20 | Data Lakes & Lakehouses (S3, Delta Lake, Databricks) |
| 21 | Data Governance & Metadata Management |
| 22 | Data Quality & Validation |
| 23 | Streaming Data & Messaging Systems (Kafka, RabbitMQ, Kinesis) |
| 24 | Real-time Data Processing Concepts |
| 25 | Apache Flink / Spark Streaming |
| 26 | Workflow Orchestration Tools (Apache Airflow, Luigi, Prefect) |
| 27 | Scheduling & Monitoring Pipelines |
| 28 | Cloud Platforms (AWS, GCP, Azure) |
| 29 | AWS Data Services (S3, Redshift, Glue, EMR, Kinesis) |
| 30 | GCP Data Services (BigQuery, Dataflow, Pub/Sub) |
| 31 | Azure Data Services (Azure Data Lake, Synapse Analytics, Event Hubs) |
| 32 | Containerization & Orchestration (Docker, Kubernetes) |
| 33 | Infrastructure as Code (Terraform, CloudFormation) |
| 34 | APIs & Data Integration (REST APIs, GraphQL, gRPC) |
| 35 | Data Security & Compliance (Encryption, GDPR, HIPAA, CCPA) |
| 36 | Monitoring & Logging (Prometheus, Grafana, ELK Stack) |
| 37 | Performance Tuning (Query Optimization, Partitioning, Caching) |
| 38 | Version Control (Git, GitHub/GitLab) |
| 39 | Collaboration & Best Practices in Data Engineering |
| 40 | Continuous Learning (Advanced Spark, Cloud-Native Data Engineering, ML Pipelines, DataOps) |