Credit Card Fraud Detection
ML pipeline for fraud detection with WOE feature engineering and SMOTE on 0.17% fraud rate dataset
Personal
Problem
Problem
Credit card fraud detection is a textbook class imbalance problem — only 0.172% of transactions are fraudulent. Standard classifiers trained on raw features learn to predict "not fraud" for everything and achieve 99.8% accuracy while being completely useless. The project builds a pipeline that handles the imbalance structurally, engineers meaningful features via Weight of Evidence (WOE), and produces a model that is actually calibrated for the rare-event detection task.
Approach
Approach
Weight of Evidence (WOE) encoding converts raw features into monotonically scaled, model-interpretable bins. SMOTE (Synthetic Minority Oversampling Technique) generates synthetic minority samples to rebalance the training set without discarding majority samples. Multiple classifiers are trained and evaluated on precision/recall/AUC-ROC rather than accuracy — the only metrics that matter for imbalanced fraud detection.
Architecture
Architecture
Credit Card Fraud Detection — system diagram
Key Technical Decisions
Key Technical Decisions
WOE encoding over raw features
Raw numeric features have no monotonic relationship with fraud probability. WOE encoding bins each feature and replaces it with ln(Events/Non-Events) — a transformation that is monotonically related to the log-odds of fraud, which is exactly what logistic regression needs. It also naturally handles missing values and outliers.
SMOTE over class weighting
Class weighting adjusts the loss function but doesn't change what the model sees. SMOTE generates synthetic minority samples in feature space, giving the model more decision boundary information in the fraud region. For a 0.172% fraud rate, the difference in minority boundary coverage is significant.
Results
Results
- ✓Structured pipeline handling 0.172% fraud rate with WOE + SMOTE
- ✓Evaluation on precision, recall, and AUC-ROC — not misleading accuracy
- ✓WOE encoding provides interpretable, monotonically calibrated feature representations
Tech Stack
Tech Stack
Links
Links