# WAF Attack Detection: Machine Learning Classifier
A machine learning-based Web Application Firewall (WAF) that classifies HTTP payloads into attack categories such as **SQLi, XSS, SSTI, LFI, RCE/Shell, NoSQL Injection, and CRLF Injection**, with a custom web UI for testing payloads in real time.
## Overview
This project uses a character-level TF-IDF vectorizer combined with a Logistic Regression classifier to detect malicious payloads and classify them by attack type. The model is trained on a merged dataset combining multiple attack-type sources, cleaned and deduplicated for consistency.
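A minimal sketch of such a pipeline is shown below. The toy payloads, the n-gram range `(1, 3)`, and `max_iter=1000` are assumptions chosen for illustration; the actual settings in `train.py` may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy payloads for illustration only; the real model is trained on waf.csv.
payloads = [
    "' OR 1=1 --",
    "1 UNION SELECT username, password FROM users",
    "<script>alert(1)</script>",
    "<img src=x onerror=alert(1)>",
    "../../../../etc/passwd",
    "..\\..\\windows\\win.ini",
]
labels = ["sqli", "sqli", "xss", "xss", "lfi", "lfi"]

# analyzer="char" makes TF-IDF operate on character n-grams rather than words.
# The n-gram range and max_iter are assumed values, not the project's settings.
model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
model.fit(payloads, labels)

# Classify an unseen payload and inspect the per-class probabilities.
sample = "<script>alert(document.cookie)</script>"
predicted = model.predict([sample])[0]
probabilities = dict(zip(model.classes_, model.predict_proba([sample])[0]))
print(predicted, probabilities)
```

Character n-grams suit this task because attack payloads are dominated by punctuation and short token fragments (`<`, `../`, `'--`) that word-level tokenization would discard or split unpredictably.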
## Features
- Multi-class classification of common web attack payloads
- Custom-built web interface (UI) for submitting and testing payloads
- Merged and cleaned dataset pipeline (EDA + preprocessing scripts included)
- Trained with scikit-learn (`TfidfVectorizer` + `LogisticRegression`)
- Evaluation via accuracy, precision, recall, F1-score, and confusion matrix
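
The evaluation metrics listed above can be computed with `sklearn.metrics`. The sketch below uses made-up true and predicted labels rather than real model output, so the numbers it prints say nothing about this project's actual performance.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Made-up labels for illustration; in practice these would come from a
# held-out test split and the trained model's predictions.
class_names = ["lfi", "sqli", "xss"]
y_true = ["sqli", "sqli", "xss", "xss", "lfi", "lfi"]
y_pred = ["sqli", "xss", "xss", "xss", "lfi", "lfi"]

accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes, in class_names order.
matrix = confusion_matrix(y_true, y_pred, labels=class_names)

# Per-class precision, recall, and F1-score as a text table.
report = classification_report(y_true, y_pred, labels=class_names, zero_division=0)

print(f"accuracy: {accuracy:.3f}")
print(matrix)
print(report)
```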
## Attribution
This project is based on and adapted from an existing WAF payload-classification approach (model architecture: character n-gram TF-IDF + Logistic Regression pipeline). We modified the training pipeline, retrained the model on our own merged/cleaned dataset, and built a completely custom UI from scratch.
If you are the original author of the base notebook/approach this project was adapted from and would like explicit credit/linking here, please open an issue or contact us; we're happy to add a direct reference.
## Tech Stack
- Python, pandas, scikit-learn
- matplotlib / seaborn (EDA & visualization)
## Project Structure
```
├── waf.csv        # merged, cleaned dataset
├── EDA.py         # merges individual attack datasets into waf.csv
├── edit.py        # label normalization / cleanup utilities
├── show_data.py   # dataset distribution inspection
├── train.py       # model training script
└── ui/            # custom web interface
```
## How It Works
1. Individual attack-type datasets (SQLi, XSS, SSTI, LFI, Shell, NoSQL, CRLF) are merged and cleaned (`EDA.py`).
2. Labels are normalized (`edit.py`).
3. A TF-IDF + Logistic Regression pipeline is trained on the merged dataset (`train.py`).
4. The trained model is saved (`waf_model.sav`) and served through the custom UI for real-time payload classification.
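
The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the code in `EDA.py`, `edit.py`, or `train.py`: the column names `payload` and `label`, the `normalize_label` helper, the in-memory stand-in datasets, and the use of `pickle` for serialization are all hypothetical (the format of `waf_model.sav` is not specified here).

```python
import pickle

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Step 1: per-attack-type sources (tiny in-memory stand-ins for the real files).
# Column names "payload" and "label" are assumptions.
sqli = pd.DataFrame({
    "payload": ["' OR 1=1 --", "' OR 1=1 --", "1; DROP TABLE users"],
    "label": ["SQLi", "sqli ", "SQLI"],
})
xss = pd.DataFrame({
    "payload": ["<script>alert(1)</script>", "<svg onload=alert(1)>"],
    "label": ["XSS", "xss"],
})
merged = pd.concat([sqli, xss], ignore_index=True)


# Step 2: normalize labels so case/whitespace variants collapse to one class.
def normalize_label(label: str) -> str:
    """Hypothetical normalization rule: trim whitespace and lowercase."""
    return label.strip().lower()


merged["label"] = merged["label"].map(normalize_label)
merged = (
    merged.dropna(subset=["payload"])
    .drop_duplicates(subset=["payload", "label"])
    .reset_index(drop=True)
)

# Step 3: train a character-level TF-IDF + Logistic Regression pipeline.
model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
model.fit(merged["payload"], merged["label"])

# Step 4: serialize and restore the fitted pipeline. Writing the bytes to
# waf_model.sav would persist it; pickle is an assumed format.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

probe = ["<script>alert(2)</script>", "' OR 'a'='a"]
print(list(restored.predict(probe)))
```

Because the vectorizer and classifier are saved together as one pipeline, the UI only needs to load a single object and call `predict` on raw payload strings. Note that unpickling executes arbitrary code, so a serialized model should only be loaded from a trusted source.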
## Disclaimer
This project is intended for **educational and defensive security research purposes only**. Do not use it to attack systems you do not have explicit permission to test.
## License
This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.