Data Engineering

# Topic Completed (%)
1 Introduction to Data Engineering (Data Engineering Basics)
2 Understanding Data & Databases (Relational vs Non-Relational)
3 Programming Basics (Python, Java, or Scala)
4 Data Structures & Algorithms Basics
5 SQL Basics (Data Querying)
6 Advanced SQL (Joins, Window Functions, CTEs, Indexing)
7 NoSQL Databases (MongoDB, Cassandra, Redis)
8 Data Modeling & Schema Design (Star Schema, Snowflake Schema, Normalization)
9 ETL Concepts (Extract, Transform, Load)
10 ETL Tools (Apache NiFi, Talend, Informatica)
11 Data Pipelines (Batch & Real-time Processing)
12 Big Data Fundamentals (Hadoop Ecosystem Overview)
13 HDFS (Hadoop Distributed File System)
14 MapReduce Basics
15 Apache Spark Basics (RDDs, DataFrames, Spark SQL)
16 Spark Advanced (Spark Streaming, Structured Streaming, MLlib)
17 Data Warehousing Concepts (OLAP vs OLTP)
18 Data Warehouses (Redshift, Snowflake, BigQuery)
19 Columnar Storage & Partitioning
20 Data Lakes & Lakehouses (S3, Delta Lake, Databricks)
21 Data Governance & Metadata Management
22 Data Quality & Validation
23 Streaming Data & Messaging Systems (Kafka, RabbitMQ, Kinesis)
24 Real-time Data Processing Concepts
25 Apache Flink / Spark Streaming
26 Workflow Orchestration Tools (Apache Airflow, Luigi, Prefect)
27 Scheduling & Monitoring Pipelines
28 Cloud Platforms (AWS, GCP, Azure)
29 AWS Data Services (S3, Redshift, Glue, EMR, Kinesis)
30 GCP Data Services (BigQuery, Dataflow, Pub/Sub)
31 Azure Data Services (Azure Data Lake, Synapse Analytics, Event Hubs)
32 Containerization & Orchestration (Docker, Kubernetes)
33 Infrastructure as Code (Terraform, CloudFormation)
34 APIs & Data Integration (REST APIs, GraphQL, gRPC)
35 Data Security & Compliance (Encryption, GDPR, HIPAA, CCPA)
36 Monitoring & Logging (Prometheus, Grafana, ELK Stack)
37 Performance Tuning (Query Optimization, Partitioning, Caching)
38 Version Control (Git, GitHub/GitLab)
39 Collaboration & Best Practices in Data Engineering
40 Continuous Learning (Advanced Spark, Cloud-Native Data Engineering, ML Pipelines, DataOps)