# WAF Attack Detection: Machine Learning Classifier

A machine learning-based Web Application Firewall (WAF) that classifies HTTP payloads into attack categories such as **SQLi, XSS, SSTI, LFI, RCE/Shell, NoSQL Injection, and CRLF Injection**, with a custom web UI for testing payloads in real time.

## Overview

This project uses a character-level TF-IDF vectorizer combined with a Logistic Regression classifier to detect malicious payloads and classify them by attack type. The model is trained on a merged dataset combining multiple attack-type sources, cleaned and deduplicated for consistency.
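
Here is a minimal sketch of the pipeline described above, built with scikit-learn's `Pipeline`. The README does not state the project's hyperparameters, so the n-gram range, `sublinear_tf`, and `max_iter` values are assumptions. `build_waf_pipeline` is a hypothetical helper name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def build_waf_pipeline() -> Pipeline:
    """Character-level TF-IDF features feeding a multi-class Logistic Regression."""
    vectorizer = TfidfVectorizer(
        analyzer="char",      # features are character n-grams, not words
        ngram_range=(1, 3),   # assumed range; not stated in this README
        lowercase=True,       # "<SCRIPT>" and "<script>" map to the same n-grams
        sublinear_tf=True,    # assumed: dampen long, repetitive payloads
    )
    # With the default lbfgs solver, scikit-learn fits a multinomial model
    # when there are more than two classes.
    classifier = LogisticRegression(max_iter=1000)
    return Pipeline([("tfidf", vectorizer), ("clf", classifier)])


if __name__ == "__main__":
    # Hypothetical toy payloads for illustration only; not taken from waf.csv.
    payloads = [
        "' OR 1=1 --", "admin' --", "1 UNION SELECT username, password FROM users",
        "<script>alert(1)</script>", "<img src=x onerror=alert(1)>", "<svg onload=alert(1)>",
        "../../etc/passwd", "..%2f..%2fetc%2fpasswd", "/var/www/../../etc/shadow",
    ]
    labels = ["sqli"] * 3 + ["xss"] * 3 + ["lfi"] * 3
    model = build_waf_pipeline()
    model.fit(payloads, labels)
    print(model.predict(["<script>document.cookie</script>"]))
```

Character n-grams pick up fragments such as `' OR`, `<scr`, or `../` no matter how the payload is tokenized or obfuscated with odd spacing. This is why a character-level analyzer suits raw HTTP payloads better than a word-level one.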

## Features

- Multi-class classification of common web attack payloads
- Custom-built web interface (UI) for submitting and testing payloads
- Merged and cleaned dataset pipeline (EDA + preprocessing scripts included)
- Trained with scikit-learn (`TfidfVectorizer` + `LogisticRegression`)
- Evaluation via accuracy, precision, recall, F1-score, and confusion matrix
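
One way to produce the metrics listed above is to combine a stratified hold-out split with scikit-learn's `metrics` functions. This sketch is not the project's `train.py`. The split ratio, random seed, and model settings are all assumed, and `evaluate_waf_model` is a hypothetical name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def evaluate_waf_model(payloads, labels, test_size=0.2, seed=42):
    """Hold out a stratified test split, fit a char TF-IDF + LR model, and score it."""
    X_train, X_test, y_train, y_test = train_test_split(
        payloads, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    # Assumed configuration; the project's train.py may use different settings.
    model = make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
        LogisticRegression(max_iter=1000),
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    class_names = sorted(set(labels))
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        # Per-class precision, recall, F1-score and support as a text table.
        "report": classification_report(
            y_test, y_pred, labels=class_names, zero_division=0
        ),
        # Rows are true classes, columns are predicted classes, in class_names order.
        "confusion": confusion_matrix(y_test, y_pred, labels=class_names),
        "labels": class_names,
    }
```

Stratifying the split keeps rarer classes (for example CRLF or NoSQL injection) represented in the test set. This matters when per-class recall, rather than overall accuracy, is the number you care about.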

## Attribution

This project is based on and adapted from an existing WAF payload-classification approach (model architecture: character n-gram TF-IDF + Logistic Regression pipeline). We modified the training pipeline, retrained the model on our own merged/cleaned dataset, and built a completely custom UI from scratch.

If you are the original author of the base notebook/approach this project was adapted from and would like explicit credit/linking here, please open an issue or contact us; we're happy to add a direct reference.

## Tech Stack

- Python, pandas, scikit-learn
- matplotlib / seaborn (EDA & visualization)

## Project Structure

```
├── waf.csv                # merged, cleaned dataset
├── EDA.py                 # merges individual attack datasets into waf.csv
├── edit.py                # label normalization / cleanup utilities
├── show_data.py           # dataset distribution inspection
├── train.py               # model training script
└── ui/                    # custom web interface
```
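
The following pandas sketch shows roughly how `EDA.py` and `edit.py` might merge and clean the data. Several details are assumptions for illustration, not the project's documented behavior: the `payload` column name, the alias table, and the rule that drops any payload appearing under conflicting labels. `merge_attack_datasets` and `normalize_label` are hypothetical names.

```python
from __future__ import annotations

import pandas as pd

# Hypothetical alias table; the actual mapping used by edit.py is not documented.
LABEL_ALIASES = {
    "sql injection": "sqli",
    "cross-site scripting": "xss",
    "path traversal": "lfi",
    "shell": "rce",
    "command injection": "rce",
    "nosql injection": "nosqli",
    "crlf injection": "crlf",
}


def normalize_label(raw: str) -> str:
    """Lowercase and trim a label, then map known aliases to one canonical name."""
    key = raw.strip().lower()
    return LABEL_ALIASES.get(key, key)


def merge_attack_datasets(
    sources: dict[str, pd.DataFrame], payload_col: str = "payload"
) -> pd.DataFrame:
    """Stack per-attack DataFrames into one (payload, label) table and clean it."""
    frames = []
    for raw_label, df in sources.items():
        payloads = df[payload_col].dropna().astype(str).str.strip()
        frames.append(pd.DataFrame({"payload": payloads, "label": normalize_label(raw_label)}))
    merged = pd.concat(frames, ignore_index=True)

    # Drop empty strings and exact (payload, label) duplicates.
    merged = merged[merged["payload"] != ""]
    merged = merged.drop_duplicates(subset=["payload", "label"])

    # Assumed policy: a payload listed under more than one label is ambiguous, so drop it.
    labels_per_payload = merged.groupby("payload")["label"].transform("nunique")
    return merged[labels_per_payload == 1].reset_index(drop=True)
```

The merged frame could then be written out with `merged.to_csv("waf.csv", index=False)`. Calling `merged["label"].value_counts()` gives the kind of class-distribution view that `show_data.py` is described as providing.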

## How It Works

1. Individual attack-type datasets (SQLi, XSS, SSTI, LFI, Shell, NoSQL, CRLF) are merged and cleaned (`EDA.py`).
2. Labels are normalized (`edit.py`).
3. A TF-IDF + Logistic Regression pipeline is trained on the merged dataset (`train.py`).
4. The trained model is saved (`waf_model.sav`) and served through the custom UI for real-time payload classification.
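
The README says the model is saved as `waf_model.sav` but does not say how. This sketch assumes `joblib`, which scikit-learn's model-persistence guide suggests for fitted estimators (`pickle` would work the same way). The helpers `save_model`, `load_model`, and `classify` are hypothetical, since the UI's backend code is not shown here.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

MODEL_PATH = "waf_model.sav"  # file name from the README; joblib format is an assumption


def save_model(model, path: str = MODEL_PATH) -> None:
    """Persist the whole fitted pipeline (vectorizer + classifier) to one file."""
    joblib.dump(model, path)


def load_model(path: str = MODEL_PATH):
    """Reload the fitted pipeline, e.g. once at UI/server startup."""
    return joblib.load(path)


def classify(model, payload: str) -> dict:
    """Return the most probable attack label and its probability for one payload."""
    probs = model.predict_proba([payload])[0]
    best = int(probs.argmax())
    return {"label": str(model.classes_[best]), "confidence": float(probs[best])}


if __name__ == "__main__":
    # Hypothetical toy training data, just to have something to save and reload.
    X = ["' OR 1=1 --", "admin' --", "<script>alert(1)</script>",
         "<svg onload=alert(1)>", "../../etc/passwd", "..%2f..%2fetc%2fpasswd"]
    y = ["sqli", "sqli", "xss", "xss", "lfi", "lfi"]
    pipeline = make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
        LogisticRegression(max_iter=1000),
    )
    pipeline.fit(X, y)

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, MODEL_PATH)
        save_model(pipeline, path)
        print(classify(load_model(path), "<script>document.cookie</script>"))
```

Saving the entire pipeline, rather than only the classifier, ensures the UI vectorizes incoming payloads with exactly the vocabulary and IDF weights learned during training. Joblib/pickle files can execute code when loaded, so only load model files you produced yourself.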

## Disclaimer

This project is intended for **educational and defensive security research purposes only**. Do not use it to attack systems you do not have explicit permission to test.

## License

This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.