The F1-score is the harmonic mean of precision and recall, and gives a more balanced picture:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

Where:
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
Example: https://www.kaggle.com/code/mayuringle8890/fraud-detection-notebook/
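To make the definitions concrete, here is a tiny worked computation (the TP/FP/FN counts are made up for illustration, not from the notebook above):

```python
# Hypothetical confusion-matrix counts, purely for illustration
tp, fp, fn = 80, 20, 40

precision = tp / (tp + fp)   # 80 / 100 = 0.80
recall = tp / (tp + fn)      # 80 / 120 ~ 0.67
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.80 recall=0.67 f1=0.73
```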
📌 In fraud detection:
- We want high recall → catch as many frauds as possible (reduce false negatives)
- We also want high precision → avoid too many false alarms (reduce false positives)
A high F1-score means your model is doing well on both fronts.
You may even consider using an Fβ-score to prioritize either recall or precision depending on your business need (see the sketch below):
- F2-score if catching more fraud matters more than avoiding false positives.
- F0.5-score if false positives are costlier than missing some fraud.
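A minimal sketch of the difference using scikit-learn's fbeta_score (the toy labels are illustrative, not real transactions):

```python
from sklearn.metrics import f1_score, fbeta_score

# Toy labels: 2 TP, 3 FP, 2 FN -> precision 0.4, recall 0.5
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 1, 0, 0]

print(f1_score(y_true, y_pred))              # ~0.444, balances both
print(fbeta_score(y_true, y_pred, beta=2))   # ~0.476, leans toward recall
print(fbeta_score(y_true, y_pred, beta=0.5)) # ~0.417, leans toward precision
```

Because recall is higher than precision here, F2 scores the model above F1, and F0.5 below it.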
If 99.9% of transactions are legitimate and the model always predicts "Not Fraud", then:
- Accuracy = 99.9%
- True Positives (fraud detected) = 0
- False Negatives (fraud missed) = all actual frauds
⚠️ So the model looks good on paper (99.9% accurate) but is completely useless in practice: it catches no fraud.
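This trap is easy to reproduce with a synthetic label vector:

```python
from sklearn.metrics import accuracy_score, recall_score

# 999 legitimate transactions, 1 fraud; the "model" always says Not Fraud
y_true = [0] * 999 + [1]
y_pred = [0] * 1000

print(accuracy_score(y_true, y_pred))  # 0.999 -- looks impressive
print(recall_score(y_true, y_pred))    # 0.0   -- zero frauds caught
```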
| Metric | Good for imbalanced data? | What it tells you |
|---|---|---|
| Accuracy | ❌ No | Can be misleading if classes are imbalanced |
| Precision | ✅ Yes | How many predicted frauds were actually frauds |
| Recall | ✅ Yes | How many actual frauds you successfully detected |
| F1-score | ✅✅ Best choice | Balances precision and recall |
For fraud detection, which is a highly imbalanced binary classification problem, you need a model that:
- Handles class imbalance well.
- Can capture complex patterns.
- Can be tuned for precision-recall trade-offs.
Here are some recommended models, categorized by complexity:
| Model | Notes |
|---|---|
| Logistic Regression | Simple, interpretable, good baseline. Add class weights or use SMOTE. |
| Decision Tree | Captures non-linear patterns, but can overfit. |
Use with (see the sketch below):
- class_weight='balanced'
- feature scaling (for logistic regression)
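A minimal baseline sketch, assuming X_train / y_train are your prepared features and labels:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Scaling matters for logistic regression; class_weight='balanced'
# reweights the rare fraud class inversely to its frequency.
baseline = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
# baseline.fit(X_train, y_train)  # X_train / y_train: your own data
```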
| Model | Why it's good for fraud detection |
|---|---|
| Random Forest | Robust, handles imbalance with class weights. |
| XGBoost | Handles imbalance via scale_pos_weight, high performance. |
| LightGBM | Fast, efficient, supports the is_unbalance=True flag. |
| CatBoost | Works well with categorical features and imbalance. |
✅ These are often top performers in Kaggle competitions and real-world systems.
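A hedged sketch of the scale_pos_weight idea with XGBoost, using synthetic data in place of real transactions:

```python
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

# Synthetic imbalanced data stands in for real transactions (illustrative only)
X, y = make_classification(n_samples=10_000, weights=[0.99], random_state=42)

# legit / fraud ratio, as suggested for scale_pos_weight in the table above
ratio = (y == 0).sum() / (y == 1).sum()

model = XGBClassifier(
    n_estimators=300,
    scale_pos_weight=ratio,  # upweights the minority (fraud) class
    eval_metric="aucpr",     # PR-AUC as the training metric
)
model.fit(X, y)
```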
If you have very few fraud samples, try:
| Model | Notes |
|---|---|
| Isolation Forest | Unsupervised, good for detecting rare patterns |
| One-Class SVM | Works when you only have "normal" data to learn from |
| Autoencoders (Deep Learning) | Learn normal patterns, flag large reconstruction errors as frauds |
📌 Use these when you don't have labels for frauds, or when fraud labels are very sparse.
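A minimal Isolation Forest sketch on synthetic 2-D data (real fraud features would be higher-dimensional):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(1_000, 2))  # dense "legitimate" cluster
rare = rng.uniform(-6, 6, size=(10, 2))     # scattered rare points
X = np.vstack([normal, rare])

# contamination = expected anomaly fraction; no fraud labels needed
iso = IsolationForest(contamination=0.01, random_state=0).fit(X)
labels = iso.predict(X)                     # -1 = anomaly, 1 = normal
print((labels == -1).sum(), "points flagged as anomalous")
```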
| Model | Notes |
|---|---|
| Graph Neural Networks | If fraud involves networks (users, devices, accounts) |
| Hybrid Models (Ensemble + Deep Learning) | Combine decision trees and autoencoders |
Whichever model you choose, also consider:
- Resampling: SMOTE, ADASYN, or undersampling the majority class.
- Evaluation: use F1-score and Precision-Recall AUC, not accuracy.
- Threshold tuning: tune the classification threshold to optimize F1 or minimize business cost (see the sketch below).
- Explainability: use SHAP or LIME for model interpretability, which is especially important in finance.
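A sketch of threshold tuning with precision_recall_curve, on synthetic data and an untuned logistic regression (both stand-ins for your real model and validation set):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, weights=[0.97], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
probs = clf.predict_proba(X_val)[:, 1]

# Sweep all candidate thresholds and pick the one maximizing F1
precision, recall, thresholds = precision_recall_curve(y_val, probs)
f1 = 2 * precision * recall / (precision + recall + 1e-12)  # guard 0/0
best = f1[:-1].argmax()  # the final PR point has no threshold
print(f"best threshold={thresholds[best]:.3f}  F1={f1[best]:.3f}")
```

Picking the threshold on a held-out validation set, as here, avoids overfitting the cutoff to the training data.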
# Try this pipeline:
- Preprocessing: Scale features + encode categoricals
- Use: LightGBM or XGBoost
- Set: scale_pos_weight = (legit / fraud) ratio
- Evaluate: precision, recall, F1, PR-AUC (see the sketch below)
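A sketch of this pipeline with LightGBM on synthetic data; the parameters are illustrative, not tuned:

```python
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import average_precision_score, classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, weights=[0.995], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

ratio = (y_tr == 0).sum() / (y_tr == 1).sum()        # legit / fraud ratio
model = LGBMClassifier(scale_pos_weight=ratio).fit(X_tr, y_tr)

probs = model.predict_proba(X_te)[:, 1]
print("PR-AUC:", average_precision_score(y_te, probs))
print(classification_report(y_te, model.predict(X_te), digits=3))
```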
In a real fraud detection system, you might also:
- Use Precision-Recall curves
- Optimize based on business cost (e.g., the cost of a false positive vs. a false negative)
- Use a confusion matrix to interpret model performance (see the sketch below)
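Continuing the LightGBM sketch above, you can read the confusion matrix directly and weight errors by an illustrative business cost (the per-error costs below are made up):

```python
from sklearn.metrics import confusion_matrix

# `model`, X_te, y_te come from the pipeline sketch above
tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
print(f"TP={tp}  FP={fp}  FN={fn}  TN={tn}")

# Illustrative business costs: a missed fraud (FN) hurts far more
# than a false alarm (FP); tune the threshold to minimize this.
cost = 1 * fp + 50 * fn
print("total cost:", cost)
```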