Data Science Curriculum

12-Week Course Curriculum

Here’s a 3-month Data Science curriculum that progresses from basics to advanced, with weekly topics, monthly tests, assignments, and projects, aligned to current industry standards (Python and SQL centric).

Assumptions:
4–5 hours per day, 5-day weeks → 12 weeks ≈ 3 months.
Platform:
Python and SQL, using Jupyter/Colab notebooks.
Tools:
pandas, NumPy, Matplotlib, Seaborn, scikit-learn, SQLite, PySpark, Streamlit/Dash, Flask/FastAPI, GitHub/GitLab.
Outcome:
Ready for entry-level data roles (analyst, scientist), with a portfolio of three projects.

Overall Assessment Plan
Weekly assignments:
2–3 hands-on exercises (e.g., clean a messy CSV, write SQL queries, build a classifier).
Monthly tests:
1-hour MCQ + practical (Monthly Test 1 in Week 4, Monthly Test 2 in Week 8, Final Test in Week 12).
Monthly projects:
End-to-end data science deliverables (EDA & Insights Notebook, Predictive Modeling Project, Data Science Capstone).

Month 1 – Foundations: Python, SQL, Statistics & EDA (Weeks 1–4)

Weeks & Topics	Daily Task (4-5 hrs)	Assignments	Milestones
Week 1: Intro to data science & tools	What is data science, roles (analyst, scientist, engineer), data pipeline (collect → clean → analyze → model → deploy), Python & SQL setup, Jupyter/Colab.	1) Install Python, pandas, NumPy, Matplotlib, Jupyter; 2) Write a “Hello Data” notebook that loads a CSV and prints basic info.	Environment setup + first notebook.
Week 2: Python for data	Variables, functions, lists/dicts, list comprehensions, lambda, file I/O, working with CSV/JSON, pandas Series/DataFrame basics.	1) Clean a messy CSV (handle missing values, types); 2) Compute basic stats (mean, median, std) manually and with pandas.	Cleaned dataset + stats notebook.
Week 3: SQL basics for data	Relational databases, SELECT, WHERE, GROUP BY, JOINs, subqueries, aggregations, basic window functions.	1) Install SQLite or use an online DB; 2) Write 10+ queries on a sample DB (sales, students, orders).	SQL query set + results.
Week 4: Statistics & EDA I	Descriptive stats (mean, median, variance, distribution), correlation, histograms, boxplots, outlier detection, Monthly Test 1 (Python + SQL + stats/EDA).	1) Perform EDA on a real dataset (e.g., Titanic, housing, sales); 2) Create 5–7 visualizations with Matplotlib/Seaborn.	Project 1: “EDA & Insights Notebook” – A comprehensive notebook: data loading → cleaning → EDA → visualizations → key insights + a short written report (1–2 pages).
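
As an illustration of the Week 3 assignment, here is a minimal sketch of a JOIN with GROUP BY and a subquery, run through Python's built-in `sqlite3` module. The `customers` and `orders` tables and their rows are hypothetical sample data, not a dataset prescribed by the course.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical sample schema and rows (assumption, for illustration only).
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha', 'Pune'), (2, 'Ravi', 'Delhi'), (3, 'Meera', 'Pune');
INSERT INTO orders VALUES (1, 1, 250.0), (2, 1, 100.0), (3, 2, 75.5), (4, 3, 300.0);
""")

# JOIN + GROUP BY: total spend per customer, highest first.
totals = cur.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY total DESC
""").fetchall()

# Subquery in WHERE: orders larger than the average order amount.
above_avg = cur.execute("""
    SELECT id, amount
    FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY id
""").fetchall()

print(totals)
print(above_avg)
conn.close()
```

The same pattern extends to the other Week 3 topics: swap in a different query string and inspect the rows returned by `fetchall()`.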

Month 2 – Machine Learning Fundamentals & Modeling (Weeks 5–8)

Weeks & Topics	Daily Task (4-5 hrs)	Assignments	Milestones
Week 5: ML basics & supervised learning	ML types (supervised/unsupervised), train/test split, overfitting/underfitting, bias-variance tradeoff, linear regression, logistic regression.	1) Implement linear/logistic regression from scratch (concept) and with scikit-learn; 2) Compare models on a small dataset.	Regression/classification notebook.
Week 6: Classification & evaluation	kNN, decision trees, random forests, SVM (concept), metrics: accuracy, precision, recall, F1, ROC-AUC, confusion matrix.	1) Build a classifier on an imbalanced dataset; 2) Tune the decision threshold to optimize precision/recall.	Classification report + plots.
Week 7: Unsupervised learning & feature engineering	Clustering (k-means, hierarchical), dimensionality reduction (PCA), feature scaling, encoding, handling missing data, feature selection.	1) Cluster customers into segments; 2) Apply PCA and visualize reduced dimensions.	Clustering + PCA notebook.
Week 8: Model tuning & pipelines	Hyperparameter tuning (grid search/randomized search), cross-validation, scikit-learn pipelines, model persistence (pickle/joblib), Monthly Test 2 (ML algorithms + evaluation).	1) Tune a random forest / XGBoost model; 2) Build a pipeline (preprocess → model → evaluate).	Project 2: “Predictive Modeling Project” – A complete supervised ML project (e.g., churn prediction, house price prediction, fraud detection) with: data cleaning, EDA, model building, evaluation, hyperparameter tuning, and a short business-focused report.
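
The Week 6 assignment on threshold tuning can be sketched in plain Python, without any ML library, to show how precision, recall, and F1 follow from the confusion matrix. The labels and scores below are hypothetical; in the actual assignment the scores would come from a trained classifier.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, fn, tn), predicting class 1 when score >= threshold."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        pred = 1 if score >= threshold else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1, returning 0.0 where undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical imbalanced data: 3 positives out of 10 (assumption).
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.60, 0.40, 0.70, 0.90]

# Sweep candidate thresholds and keep the one with the highest F1.
best = None
for threshold in (0.3, 0.4, 0.5, 0.6, 0.7):
    tp, fp, fn, tn = confusion_counts(y_true, scores, threshold)
    precision, recall, f1 = precision_recall_f1(tp, fp, fn)
    print(f"t={threshold:.1f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
    if best is None or f1 > best[1]:
        best = (threshold, f1)

print("best threshold by F1:", best[0])
```

Selecting by F1 is only one possible choice; a fraud or churn use case might instead fix a minimum recall and pick the threshold with the best precision under that constraint.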

Month 3 – Advanced Analytics, Big Data Basics & Capstone (Weeks 9–12)

Weeks & Topics	Daily Task (4-5 hrs)	Assignments	Milestones
Week 9: Time series & forecasting	Time series components (trend, seasonality), moving average, ARIMA (concept), simple forecasting with scikit-learn/Prophet.	1) Load a time series dataset (sales, stock, weather); 2) Build a simple forecasting model and plot the forecast.	Time series notebook + forecast plot.
Week 10: Intro to big data & Spark (concept)	Big data concepts (volume, velocity, variety), Hadoop/Spark basics, PySpark DataFrame API (concept + small demos), the idea of distributed computing.	1) Run a small PySpark job on a sample dataset; 2) Compare performance with pandas on larger data (if possible).	PySpark demo + notes.
Week 11: Data storytelling & deployment	Communicating insights, dashboards (Tableau/Power BI or Plotly Dash/Streamlit), basic model deployment (Flask/FastAPI), CI/CD concept, ethics in data science.	1) Build a simple dashboard (Streamlit/Dash) showing model predictions; 2) Create a Flask/FastAPI endpoint for a trained model (local).	Dashboard + API demo.
Week 12: Capstone & portfolio	End-to-end data science product: problem definition → data → EDA → modeling → deployment/dashboard → presentation; Final Test (MCQ + practical capstone review).	1) Polish capstone notebook + dashboard + report; 2) Prepare a 5–7 slide presentation + GitHub repo.	Project 3: “Data Science Capstone” – A complete end-to-end project such as “Sales forecasting dashboard”, “Customer churn prediction system with dashboard”, or “Fraud detection system with API”. Includes data cleaning, EDA, ML model, evaluation, dashboard or API, Git repo, README, and a 2–3 page report.
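
For the Week 9 topic of moving averages, the following is a minimal sketch of a trailing moving average and a naive forecast built on it, in plain Python. The `monthly_sales` values and both function names are hypothetical; a real assignment would load a dataset and likely use pandas or a forecasting library instead.

```python
def moving_average(series, window):
    """Trailing moving average: one value per full window of observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i - window:i]) / window
        for i in range(window, len(series) + 1)
    ]


def forecast_next(series, window, steps):
    """Naive forecast: each new value is the mean of the last `window` values,
    with earlier forecasts fed back in as if they were observations."""
    history = list(series)
    forecast = []
    for _ in range(steps):
        next_value = sum(history[-window:]) / window
        forecast.append(next_value)
        history.append(next_value)
    return forecast


# Hypothetical monthly sales figures (assumption, for illustration only).
monthly_sales = [100, 120, 110, 130, 150, 140]

smoothed = moving_average(monthly_sales, window=3)
forecast = forecast_next(monthly_sales, window=3, steps=2)

print("smoothed:", smoothed)
print("forecast:", forecast)
```

This baseline ignores trend and seasonality, which is why it makes a useful reference point when evaluating the more capable models covered in the same week.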