Three technical projects that shaped my journey as a data engineer and analyst — by Anupam P Menon
I designed and deployed a fully serverless, end-to-end data pipeline on AWS to ingest real-time electricity production data from Denmark's national grid operator, Energinet, at 5-minute intervals — processing it through a multi-layer data lake and surfacing insights in Power BI.
What I Built — Step by Step
Designed the full pipeline architecture, selecting each AWS service based on cost and scalability; the entire project cost $4.40 in total.
Wrote two Python Lambda functions using boto3: one to call the Energinet API every 5 minutes and store raw JSON to S3, and an Orchestrator to trigger daily Glue ETL jobs.
Configured an EventBridge cron rule, cron(00 15 * * ? *), for daily scheduling at 15:00 UTC, and built a 3-layer S3 data lake (Raw → Processed → Wrangled) using AWS Glue ETL jobs.
Set up Glue Crawler + Data Catalog to auto-register schema, enabling SQL queries via Amazon Athena, then connected Power BI via ODBC for live dashboards.
Implemented 3 CloudWatch alarms (Ingestion, Orchestrator, GlueJob) with SNS email alerts — I personally received the alarm emails, proving the monitoring works end-to-end.
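The two Lambda roles described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the bucket name, key layout, and Glue job names are hypothetical placeholders, and the call to the Energinet API is left out because the endpoint is not stated here. The clients are passed in as arguments, so the same functions work with boto3.client("s3") and boto3.client("glue") inside Lambda or with stand-ins during local testing.

```python
import json
from datetime import datetime, timezone

# Hypothetical names: placeholders for illustration, not taken from the project.
RAW_BUCKET = "example-energy-raw"
RAW_PREFIX = "raw/electricity"

# Daily schedule in EventBridge cron syntax
# (fields: minute, hour, day-of-month, month, day-of-week, year).
DAILY_SCHEDULE = "cron(0 15 * * ? *)"


def build_raw_key(ts, prefix=RAW_PREFIX):
    """Build a date-partitioned S3 key for one 5-minute snapshot."""
    return (
        f"{prefix}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"{ts:%H%M%S}.json"
    )


def store_raw(s3_client, payload, ts=None, bucket=RAW_BUCKET):
    """Write the unmodified API response to the raw layer and return its key."""
    ts = ts or datetime.now(timezone.utc)
    key = build_raw_key(ts)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(payload).encode("utf-8"),
        ContentType="application/json",
    )
    return key


def start_daily_etl(glue_client, job_names):
    """Orchestrator step: start each Glue job and collect the run IDs."""
    run_ids = {}
    for name in job_names:
        response = glue_client.start_job_run(JobName=name)
        run_ids[name] = response["JobRunId"]
    return run_ids
```

Partitioning the raw keys by date is one common choice that keeps later Glue and Athena scans narrow; the real pipeline may lay out its keys differently.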
AWS Cost Breakdown by Service (estimated; total: $4.40)

Full AWS Architecture Diagram — designed by Anupam

Alteryx modeling workflow — built by Anupam
Model Recall Comparison (Attrition Class)
Decision Tree recall (54.6%) nearly doubles Logistic Regression (29.7%)
I built a complete machine learning pipeline to predict employee attrition using a 5,000-record HR dataset with 27 features — from raw data profiling through EDA, feature engineering, model training, and business-driven model selection.
What I Did — Step by Step
Loaded and profiled a 5,000-record HR dataset with 27 features in Alteryx — identified and imputed 12 missing MonthlyIncome values (0.24%) to ensure clean training data.
Wrote Python EDA scripts using matplotlib and seaborn to visualise the attrition distribution, income-versus-attrition boxplots, and age-distribution KDE plots, which showed that younger employees (22–30) had the highest attrition risk.
Applied one-hot encoding for 5 categorical variables (Gender, MaritalStatus, Department, JobRole, EducationLevel) and normalised numeric features before modelling.
Used Alteryx's Data Sampling tool to split 70/30 (3,500 training / 1,500 test) and built 3 models: Logistic Regression, Decision Tree, and Boosted Model.
Selected Decision Tree as the final model, not for its accuracy (all three models scored about 77.8%) but because its attrition recall of 54.6% was nearly double Logistic Regression's 29.7%. Recall measures how many actual leavers the model catches, which matters most in HR risk detection.
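The model-selection reasoning above rests on the fact that two models can have the same accuracy and very different recall. The sketch below shows this with a toy example. The labels are invented for illustration (10 employees, 3 of whom leave) and are not the project's data; the models themselves were built in Alteryx, so this only mirrors the metric arithmetic.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1
        elif actual == positive:
            fn += 1
        elif predicted == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def recall(y_true, y_pred):
    """Share of actual positives (leavers) that the model flags."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy labels, invented for illustration: 1 = left the company, 0 = stayed.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
cautious_model = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # flags few leavers
eager_model = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]     # flags more, with one false alarm

for name, preds in [("cautious", cautious_model), ("eager", eager_model)]:
    print(name, "accuracy:", accuracy(y_true, preds),
          "recall:", round(recall(y_true, preds), 3))
```

Both toy models are right on 8 of 10 employees, yet the second catches two of the three leavers while the first catches only one. That is the same pattern as the Decision Tree versus Logistic Regression comparison.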
Attrition Rate by Department (%)
Sales had the highest attrition rate at 20.6%
In a cross-disciplinary collaboration between Data Science and Pharmaceutics students, I analysed Maxeo tablet manufacturing data across 32 production batches to identify the root cause of British Pharmacopoeia compliance failures — and communicated findings through a structured data story.
What I Did — Step by Step
Collaborated in a cross-disciplinary team to analyse Maxeo tablet data across 32 production batches, targeting British Pharmacopoeia (BP) compliance for Uniformity of Mass.
Structured the entire analysis as a 5-act data story (Introduction → Rising Action → Climax → Falling Action → Conclusion) to make complex QC findings accessible to non-technical QA personnel.
Built a 4-panel Tableau dashboard showing Avg Weight, Avg Height, Temperature trend, and Humidity trend across all 32 batches, which revealed high weight fluctuation (62.1–64.8 mg, mean 63.5 mg).
Created scatter plots to test whether temperature (24.5°C→25.8°C) and humidity (62.2%→59.7%) caused failures — and ruled out environmental factors as the primary cause.
Concluded that process-level variability (compression force inconsistency, die fill variation) was the root cause, and recommended tighter compression parameter control and enhanced monitoring.
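The environmental check above was done visually with scatter plots. A numeric companion to that kind of check is a Pearson correlation coefficient between an environmental reading and the batch weight. The sketch below is an assumption about how one might quantify it, not the team's method, and the sample values are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented example values (not the Maxeo data): temperature drifts upward
# while the average batch weight moves up and down independently of it.
temperature_c = [24.5, 24.8, 25.1, 25.4, 25.8]
avg_weight_mg = [63.9, 62.4, 64.6, 62.3, 64.1]

r = pearson_r(temperature_c, avg_weight_mg)
print(f"temperature vs weight: r = {r:.2f}")
```

A coefficient close to zero is consistent with ruling out that variable as the main driver, although it only measures linear association and works best alongside the scatter plot rather than in place of it.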
Simulated Avg Weight by Batch (mg) — 32 Batches
High fluctuation indicates tablet press struggling to maintain uniform fill

4-panel Tableau dashboard — 32-batch analysis by Anupam's team
Each project reflects a different dimension of how I approach data — from infrastructure to modelling to communication. Together, they represent how I think about building solutions that are end-to-end, business-aware, and collaborative.