OCBC Ignite Programme · 2026

Building with Data

Three technical projects that shaped my journey as a data engineer and analyst — by Anupam P Menon

01 · AWS Serverless Pipeline · Cloud Engineering
02 · Employee Attrition ML · Machine Learning
03 · Pharma Quality Analytics · Data Storytelling
01 · Cloud Engineering · AWS · Python

AWS Serverless Energy Data Pipeline

I designed and deployed a fully serverless, end-to-end data pipeline on AWS to ingest real-time electricity production data from Denmark's national grid operator, Energinet, at 5-minute intervals — processing it through a multi-layer data lake and surfacing insights in Power BI.

What I Built — Step by Step

1. Designed the full pipeline architecture, selecting each AWS service based on cost and scalability — the entire project ran for $4.40 total.

2. Wrote two Python Lambda functions using boto3: an ingestion function that calls the Energinet API every 5 minutes and stores the raw JSON in S3, and an Orchestrator function that triggers the daily Glue ETL jobs (ingestion sketched below).
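
A minimal sketch of the ingestion Lambda, assuming a hypothetical bucket name and dataset endpoint on the public Energi Data Service API — not the exact production code:

```python
# Ingestion Lambda sketch -- bucket name and dataset endpoint are assumptions.
import json
import urllib.request
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "energy-pipeline-raw"  # hypothetical bucket name


def lambda_handler(event, context):
    # Pull the latest 5-minute production records from Energinet
    url = "https://api.energidataservice.dk/dataset/ElectricityProdex5MinRealtime?limit=100"
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = resp.read()

    # Partition the raw layer by date so Glue can crawl it cleanly
    now = datetime.now(timezone.utc)
    key = f"raw/{now:%Y/%m/%d}/energinet_{now:%H%M%S}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=payload, ContentType="application/json")

    return {"statusCode": 200, "body": json.dumps({"stored": key})}
```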

3. Configured an EventBridge cron rule (00 15 * * ? *) for daily scheduling, and built a 3-layer S3 data lake (Raw → Processed → Wrangled) using AWS Glue ETL jobs.
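
Wiring that schedule up takes a couple of boto3 calls; a sketch with placeholder rule name and target ARN (the cron expression fires once a day at 15:00 UTC):

```python
# EventBridge schedule sketch -- rule name and target ARN are placeholders.
import boto3

events = boto3.client("events")

# cron(00 15 * * ? *): minute 00, hour 15 UTC, every day of every month
events.put_rule(
    Name="daily-glue-orchestration",
    ScheduleExpression="cron(00 15 * * ? *)",
    State="ENABLED",
)

events.put_targets(
    Rule="daily-glue-orchestration",
    Targets=[{
        "Id": "orchestrator-lambda",
        "Arn": "arn:aws:lambda:eu-west-1:123456789012:function:orchestrator",  # placeholder
    }],
)
```

EventBridge also needs a resource policy allowing it to invoke the Lambda (via lambda add-permission), omitted here for brevity.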

4. Set up a Glue Crawler and the Glue Data Catalog to auto-register the schema, enabling SQL queries via Amazon Athena, then connected Power BI over ODBC for live dashboards.
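
Once the catalog is populated, the wrangled layer can be queried from code as well as from the Athena console; a sketch with illustrative database, table, and column names:

```python
# Athena query sketch -- database, table, and column names are illustrative.
import boto3

athena = boto3.client("athena")

query = """
    SELECT date_trunc('hour', minutes5utc) AS hour,
           avg(total_production_mw)        AS avg_production_mw
    FROM wrangled_production
    GROUP BY 1
    ORDER BY 1 DESC
    LIMIT 24
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "energy_lake"},  # Glue Data Catalog database
    ResultConfiguration={"OutputLocation": "s3://energy-pipeline-athena-results/"},
)
```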

5. Implemented 3 CloudWatch alarms (Ingestion, Orchestrator, GlueJob) with SNS email alerts — I received the alert emails myself, confirming the monitoring works end-to-end.
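
One of the three alarms, sketched with boto3 (the function name and SNS topic ARN are placeholders):

```python
# CloudWatch alarm sketch -- function name and SNS topic ARN are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="IngestionLambdaErrors",
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Dimensions=[{"Name": "FunctionName", "Value": "energinet-ingestion"}],
    Statistic="Sum",
    Period=300,                 # one 5-minute ingestion window
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:eu-west-1:123456789012:pipeline-alerts"],
)
```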

AWS Cost Breakdown (Total: $4.40)

[Bar chart: per-service cost for Lambda, S3, Glue, Athena, CloudWatch, and Other, on a $0–$2.20 axis]

Estimated cost breakdown by AWS service

Full AWS Architecture Diagram — designed by Anupam

$4.40 · Total Cost
10+ · AWS Services
5 min · Data Interval
02 · Machine Learning · Python · Alteryx
Alteryx modeling workflow — built by Anupam

Model Recall Comparison (Attrition Class)

[Bar chart: Recall % and Accuracy % for Logistic Regression, Decision Tree, and Boosted Model, on a 0–80% axis]

Decision Tree recall (54.6%) nearly doubles Logistic Regression's (29.7%)

Predicting Employee Attrition with ML

I built a complete machine learning pipeline to predict employee attrition using a 5,000-record HR dataset with 27 features — from raw data profiling through EDA, feature engineering, model training, and business-driven model selection.

What I Did — Step by Step

1. Loaded and profiled a 5,000-record HR dataset with 27 features in Alteryx — identified and imputed 12 missing MonthlyIncome values (0.24%) to ensure clean training data.
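
The imputation was done in Alteryx; a rough pandas equivalent, with the file name and median strategy assumed for illustration:

```python
# Pandas equivalent of the Alteryx imputation step -- file name is hypothetical.
import pandas as pd

df = pd.read_csv("hr_attrition.csv")

# 12 of 5,000 MonthlyIncome values (0.24%) were missing; fill with the median
df["MonthlyIncome"] = df["MonthlyIncome"].fillna(df["MonthlyIncome"].median())
```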

2. Wrote Python EDA scripts using matplotlib and seaborn to visualise the attrition distribution, income-vs-attrition boxplots, and age-distribution KDE plots — revealing that younger employees (22–30) had the highest attrition risk.
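
A condensed sketch of those EDA scripts (column names such as Attrition, MonthlyIncome, and Age are assumptions about the dataset):

```python
# EDA sketch -- file and column names are assumed.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv("hr_attrition.csv")  # hypothetical file name

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Class balance of the target
sns.countplot(data=df, x="Attrition", ax=axes[0])

# Do leavers cluster at lower salaries?
sns.boxplot(data=df, x="Attrition", y="MonthlyIncome", ax=axes[1])

# Age density by class -- surfaces the 22-30 high-risk band
sns.kdeplot(data=df, x="Age", hue="Attrition", common_norm=False, ax=axes[2])

plt.tight_layout()
plt.show()
```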

3. Applied one-hot encoding to 5 categorical variables (Gender, MaritalStatus, Department, JobRole, EducationLevel) and normalised the numeric features before modelling.
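
The same two steps in pandas/scikit-learn, mirroring the Alteryx tools (the numeric column list is an illustrative subset):

```python
# Encoding and normalisation sketch -- numeric column list is illustrative.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("hr_attrition.csv")  # hypothetical file name

categorical = ["Gender", "MaritalStatus", "Department", "JobRole", "EducationLevel"]
df = pd.get_dummies(df, columns=categorical, drop_first=True)

numeric = ["Age", "MonthlyIncome", "YearsAtCompany"]  # illustrative subset
df[numeric] = MinMaxScaler().fit_transform(df[numeric])
```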

4. Used Alteryx's Data Sampling tool to split the data 70/30 (3,500 training / 1,500 test) and built 3 models: Logistic Regression, Decision Tree, and Boosted Model.
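
A Python analogue of that Alteryx workflow, sketched with scikit-learn (the boosted model is approximated with GradientBoostingClassifier, and Attrition is assumed to be encoded 0/1):

```python
# Sketch of the 70/30 split and three-model comparison -- not the Alteryx original.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("hr_attrition_encoded.csv")  # hypothetical pre-processed file
X, y = df.drop(columns=["Attrition"]), df["Attrition"]  # assumes 0/1 target

# 70/30 split: 3,500 training rows, 1,500 test rows
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5),
    "Boosted Model": GradientBoostingClassifier(),
}

for name, model in models.items():
    preds = model.fit(X_train, y_train).predict(X_test)
    # Recall on the attrition class is what drove the final model choice
    print(f"{name}: accuracy={accuracy_score(y_test, preds):.3f}, "
          f"recall={recall_score(y_test, preds, pos_label=1):.3f}")
```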

5. Selected the Decision Tree as the final model — not because of accuracy (all three scored ~77.8%), but because its attrition recall of 54.6% was nearly double Logistic Regression's 29.7%, and recall is what matters most in HR risk detection.

Attrition Rate by Department (%)

[Bar chart: attrition rate for Sales, HR, R&D, IT, Operations, and Finance, on a 0–24% axis]

Sales had the highest attrition rate at 20.6%

03 · Data Storytelling · Tableau · Interdisciplinary

Pharmaceutical Tablet Quality Analytics

In a cross-disciplinary collaboration between Data Science and Pharmaceutics students, I analysed Maxeo tablet manufacturing data across 32 production batches to identify the root cause of British Pharmacopoeia compliance failures — and communicated findings through a structured data story.

What I Did — Step by Step

1. Collaborated in a cross-disciplinary team to analyse Maxeo tablet data across 32 production batches, targeting British Pharmacopoeia (BP) compliance for Uniformity of Mass.

2. Structured the entire analysis as a 5-act data story (Introduction → Rising Action → Climax → Falling Action → Conclusion) to make complex QC findings accessible to non-technical QA personnel.

3. Built a 4-panel Tableau dashboard showing Avg Weight, Avg Height, Temperature trend, and Humidity trend across all 32 batches — revealing high weight fluctuation (62.1–64.8 mg, mean 63.5 mg).

4. Created scatter plots to test whether drift in temperature (24.5 °C → 25.8 °C) and humidity (62.2% → 59.7%) caused the failures — and ruled out environmental factors as the primary cause.
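
The original check was done with Tableau scatter plots; the same test reproduced in Python for illustration (file and column names are assumptions):

```python
# Environmental-factor check sketch -- file and column names are assumptions.
import pandas as pd

batches = pd.read_csv("maxeo_batches.csv")  # hypothetical 32-row batch summary

# Near-zero correlations here supported ruling out temperature and humidity
print(batches[["AvgWeight", "Temperature", "Humidity"]].corr())
```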

5. Concluded that process-level variability (compression force inconsistency, die fill variation) was the root cause, and recommended tighter compression parameter control and enhanced monitoring.

Simulated Avg Weight by Batch (mg) — 32 Batches

[Line chart: simulated average tablet weight per batch, roughly 61–66 mg across batches 1–32]

High fluctuation indicates the tablet press was struggling to maintain a uniform fill

4-panel Tableau dashboard — 32-batch analysis by Anupam's team

32 · Batches Analysed
62.1–64.8 mg · Weight Range
+1.3 °C · Temp Drift
Process · Root Cause

Three Projects, Three Strengths

Each project reflects a different dimension of how I approach data — from infrastructure to modelling to communication. Together, they represent how I think about building solutions that are end-to-end, business-aware, and collaborative.

AWS Pipeline · End-to-End Thinking: From API ingestion to Power BI — I own the full stack
Attrition ML · Business Mindset: Chose recall over accuracy because HR needs to catch leavers
Pharma Analytics · Collaborative Communication: Turned complex QC data into a 5-act story for QA teams