Advanced PDF Analysis & Disarm Tool

Overview

This tool scans, analyzes, and optionally “disarms” PDF files. It provides:

- PDF Structure Analysis: Detects keywords, calculates entropy, and flags potentially malicious indicators such as embedded JavaScript and launch actions.
- Concurrency: Uses Python’s ThreadPoolExecutor to process multiple files in parallel.
- Disarm Mode: Generates a <filename>.disarmed.pdf copy in which dangerous elements such as /JS, /JavaScript, and /Launch are stripped or obfuscated.
- Directory Recursion: Collects files from one or more directories, optionally recursing into subdirectories.
- Plugin Architecture: Supports loading custom plugins for scoring or additional checks.
- Selection Expressions: Filters results (e.g., show only PDFs whose suspicious keyword counts meet a condition).
- Multiple Output Formats:
  - Human-readable console output
  - CSV format (one line per file)
  - JSON export function (PDFiD2JSON) for programmatic use

Use it at your own risk.
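The keyword counting and entropy calculation described under PDF Structure Analysis can be illustrated with a minimal sketch. It assumes a hypothetical KEYWORDS list and scans raw file bytes with regular expressions; pdfscan.py's own parser, keyword list, and entropy handling (for example, separating stream and non-stream data) may differ.

```python
import math
import re
from collections import Counter

# Hypothetical subset of keywords; the tool's real keyword list may differ.
KEYWORDS = ["/JS", "/JavaScript", "/OpenAction", "/AA", "/Launch", "/EmbeddedFile"]


def count_keywords(data: bytes) -> dict:
    """Count occurrences of each PDF name keyword in raw file bytes."""
    counts = {}
    for kw in KEYWORDS:
        # A PDF name ends at whitespace or a delimiter, so require that the
        # next byte is not another name character (avoids /JS matching /JSX).
        pattern = re.escape(kw.encode()) + rb"(?![A-Za-z0-9#])"
        counts[kw] = len(re.findall(pattern, data))
    return counts


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())
```

Entropy ranges from 0 to 8 bits per byte; values near 8 are typical of compressed or encrypted data, so a high value on its own is not proof of malice.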

Installation

Requirements:
- Python 3.7+ (recommended)
- Optional library: pyzipper for AES-encrypted ZIP support
- Standard library modules: argparse, concurrent.futures, urllib.request, etc. (bundled with Python; no separate installation needed)
Clone or download the repository:
```
git clone https://github.com/exfil0/PDFdisarm.git
cd PDFdisarm
```
(Alternatively, obtain pdfscan.py through your preferred distribution method.)
Make it executable (Linux/macOS):
```
chmod +x pdfscan.py
```
(Optional) Install pyzipper:
```
pip install pyzipper
```
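Because pyzipper is optional, a script can fall back to the standard zipfile module when it is missing. The sketch assumes a hypothetical read_first_member helper and the password "infected", a common convention for malware-sample archives; which password pdfscan.py expects, if any, is an assumption.

```python
import zipfile

try:
    import pyzipper  # optional: adds AES-encrypted ZIP support
    ZipClass = pyzipper.AESZipFile
except ImportError:
    ZipClass = zipfile.ZipFile  # standard library fallback: no AES support


def read_first_member(zip_path: str, password: bytes = b"infected") -> bytes:
    """Hypothetical helper: return the bytes of the first member of a ZIP.

    The default password is an assumed convention, not taken from pdfscan.py.
    """
    with ZipClass(zip_path) as zf:
        zf.setpassword(password)  # only used if the member is encrypted
        return zf.read(zf.namelist()[0])
```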

Usage

Basic Command

```
./pdfscan.py <file1.pdf> <file2.pdf> ...
```

Analyzes each file and prints detailed results to the console.

Wildcards and Directory Recursion

```
./pdfscan.py /path/to/pdfs -r
```

Recursively scans all files under /path/to/pdfs.
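One way to expand the positional arguments into a file list is sketched here. gather_files is a hypothetical helper, and the assumption that a directory given without -r contributes only its top-level files is inferred from the -r description rather than confirmed.

```python
import glob
import os
from typing import Iterable, List


def gather_files(args: Iterable[str], recurse: bool = False) -> List[str]:
    """Hypothetical helper: expand wildcards and directories into file paths."""
    files = []
    for arg in args:
        # Expand wildcards here as well, since some shells (e.g. cmd.exe) do not.
        matches = glob.glob(arg) or [arg]
        for path in matches:
            if os.path.isdir(path):
                if recurse:
                    # Walk the whole tree and collect every regular file.
                    for root, _dirs, names in os.walk(path):
                        files.extend(os.path.join(root, n) for n in names)
                else:
                    # Assumed behavior without -r: top-level files only.
                    files.extend(
                        os.path.join(path, n)
                        for n in os.listdir(path)
                        if os.path.isfile(os.path.join(path, n))
                    )
            else:
                files.append(path)
    return sorted(files)
```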

Disarm Mode

```
./pdfscan.py malicious.pdf --disarm
```

Creates malicious.disarmed.pdf with potentially malicious elements neutralized.
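A disarm pass can neutralize keywords without shifting any byte offsets. This sketch assumes a PDFiD-style approach that swaps the letter case of dangerous names (PDF names are case-sensitive, so /jAVAsCRIPT is no longer recognized as /JavaScript). The keyword list and the helpers disarm_bytes and disarm_file are hypothetical, and pdfscan.py may strip or obfuscate elements differently.

```python
import os
import re

# Assumed list of dangerous name keywords; the tool's own list may differ.
DANGEROUS = [b"JavaScript", b"JS", b"AA", b"OpenAction", b"Launch",
             b"RichMedia", b"JBIG2Decode"]

# Match "/Keyword" only when the name ends there (next byte is not a name character).
PATTERN = re.compile(rb"/(" + b"|".join(DANGEROUS) + rb")(?![A-Za-z0-9#])")


def disarm_bytes(data: bytes) -> bytes:
    """Hypothetical helper: swap the case of dangerous keywords.

    Example: /JavaScript -> /jAVAsCRIPT. Output length equals input length.
    """
    return PATTERN.sub(lambda m: b"/" + m.group(1).swapcase(), data)


def disarm_file(path: str) -> str:
    """Hypothetical helper: write <name>.disarmed.pdf and return its path."""
    base, _ext = os.path.splitext(path)
    out_path = base + ".disarmed.pdf"
    with open(path, "rb") as src:
        data = src.read()
    with open(out_path, "wb") as dst:
        dst.write(disarm_bytes(data))
    return out_path
```

Because each replacement has the same length as the original, the cross-reference table's byte offsets stay valid. A byte-level pass like this cannot see keywords inside compressed object streams or names written with #xx hex escapes (for example /J#61vaScript), so a disarmed copy still deserves caution.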

CSV Output

```
./pdfscan.py /path/to/pdfs -r --csv -o results.csv
```

Writes all scan results to a single CSV file, one row per file.
If -o is omitted, the CSV is written to stdout.
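The file-or-stdout behavior can be sketched with the standard csv module. write_csv is a hypothetical helper, and the column names in any rows passed to it are assumptions; the real CSV layout of pdfscan.py is not documented here.

```python
import csv
import sys
from typing import Dict, List, Optional


def _write(stream, fieldnames: List[str], rows: List[Dict[str, object]]) -> None:
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


def write_csv(rows: List[Dict[str, object]], output: Optional[str] = None) -> None:
    """Hypothetical helper: one row per scanned file, to `output` or stdout."""
    if not rows:
        return
    fieldnames = list(rows[0].keys())  # header taken from the first row
    if output:
        # newline="" lets the csv module control line endings itself.
        with open(output, "w", newline="") as f:
            _write(f, fieldnames, rows)
    else:
        _write(sys.stdout, fieldnames, rows)
```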

Selecting Files by Condition

```
./pdfscan.py *.pdf --select="pdf.js.count > 0"
```

Only shows results for files where the JavaScript (/JS) count is greater than zero.
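Selection expressions such as pdf.js.count > 0 can be evaluated by exposing each keyword count as an attribute. This sketch assumes a hypothetical namespace layout (keyword lowercased, without its leading slash) built by the hypothetical helpers make_namespace and matches_selection; it is not the tool's actual implementation.

```python
from types import SimpleNamespace


def make_namespace(counts: dict) -> SimpleNamespace:
    """Hypothetical helper: expose counts as pdf.<keyword>.count."""
    attrs = {}
    for keyword, count in counts.items():
        name = keyword.lstrip("/").lower()  # "/JS" -> "js"
        attrs[name] = SimpleNamespace(count=count)
    return SimpleNamespace(**attrs)


def matches_selection(expression: str, counts: dict) -> bool:
    """Hypothetical helper: evaluate a --select style expression for one file.

    Emptying __builtins__ is NOT a sandbox: eval() still runs Python code,
    so only evaluate expressions you trust.
    """
    namespace = {"pdf": make_namespace(counts)}
    return bool(eval(expression, {"__builtins__": {}}, namespace))
```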

Plugin Usage

```
./pdfscan.py suspicious.pdf --plugins=MyPlugin.py --csv
```

Loads a custom plugin (MyPlugin.py) that can provide additional scoring or checks.
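Plugin files named in --plugins can be loaded at run time with importlib. This is one possible mechanism, sketched under that assumption; load_plugins is a hypothetical helper, and pdfscan.py may load plugins differently.

```python
import importlib.util
import os
from types import ModuleType
from typing import List


def load_plugins(spec: str) -> List[ModuleType]:
    """Hypothetical helper: load each .py file in a comma-separated list."""
    modules = []
    for path in filter(None, (p.strip() for p in spec.split(","))):
        name = os.path.splitext(os.path.basename(path))[0]
        module_spec = importlib.util.spec_from_file_location(name, path)
        if module_spec is None or module_spec.loader is None:
            raise ImportError("cannot load plugin: " + path)
        module = importlib.util.module_from_spec(module_spec)
        # Executes the plugin's top-level code with the caller's privileges.
        module_spec.loader.exec_module(module)
        modules.append(module)
    return modules
```

Loading a plugin runs its code, so only load plugin files you trust.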

Threading

```
./pdfscan.py /path/to/pdfs --threads 8
```

Uses 8 worker threads to speed up scanning across many files.
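The --threads option maps naturally onto concurrent.futures.ThreadPoolExecutor. In this sketch, scan_all and its scan_one callback are hypothetical names, and recording per-file errors instead of aborting the batch is an assumption about pdfscan.py's behavior.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def scan_all(paths: List[str], scan_one: Callable[[str], dict],
             threads: int = 4) -> Dict[str, dict]:
    """Hypothetical helper: run scan_one over every path with worker threads."""
    results = {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {pool.submit(scan_one, p): p for p in paths}
        for future in as_completed(futures):  # yields futures as they finish
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad file should not stop the batch
                results[path] = {"error": str(exc)}
    return results
```

Threads help most when scanning is I/O-bound; pure-Python parsing is CPU-bound and limited by the global interpreter lock, so the speedup from more threads depends on the workload.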

Command-Line Options

- files (positional): One or more file paths, directory paths, or wildcard patterns.
- -r, --recursedir: Recurse into subdirectories when a directory is provided.
- -o, --output: Output file path (CSV only).
- --all: Show all recognized PDF keywords (even non-standard ones).
- --extra: Collect extra data such as dates and entropy.
- --force: Force scanning even if the PDF header is missing.
- --disarm: Write a disarmed copy of each PDF as <filename>.disarmed.pdf.
- --select: Python expression to filter results, e.g. pdf.js.count > 0.
- --nozero: Suppress zero counts in console output.
- --threads: Number of parallel worker threads (default: 4).
- --scan: Legacy option, similar to scanning a directory.
- --plugins: Comma-separated list of plugin .py files to load.
- --pluginoptions: Options string passed to the loaded plugins.
- --csv: Output results as CSV (to the file given with -o, else to stdout).
- --minimumscore: Only show files or plugin results whose score meets or exceeds this number.
- --verbose: Print detailed tracebacks on errors.
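For reference, the options listed above could be declared with argparse as in this sketch. The option names and the default of 4 threads come from the list; build_parser is a hypothetical name, and the argument types, the other defaults, and treating --scan as a simple flag are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    p = argparse.ArgumentParser(description="Advanced PDF Analysis & Disarm Tool")
    p.add_argument("files", nargs="+", help="files, directories, or wildcard patterns")
    p.add_argument("-r", "--recursedir", action="store_true", help="recurse into subdirectories")
    p.add_argument("-o", "--output", help="output file (CSV only)")
    p.add_argument("--all", action="store_true", help="show all recognized PDF keywords")
    p.add_argument("--extra", action="store_true", help="collect dates and entropy")
    p.add_argument("--force", action="store_true", help="scan even without a PDF header")
    p.add_argument("--disarm", action="store_true", help="write <filename>.disarmed.pdf")
    p.add_argument("--select", default="", help="Python expression to filter results")
    p.add_argument("--nozero", action="store_true", help="suppress zero counts")
    p.add_argument("--threads", type=int, default=4, help="worker threads (default 4)")
    p.add_argument("--scan", action="store_true", help="legacy directory scan (assumed flag)")
    p.add_argument("--plugins", default="", help="comma-separated plugin .py files")
    p.add_argument("--pluginoptions", default="", help="string passed to plugins")
    p.add_argument("--csv", action="store_true", help="CSV output")
    p.add_argument("--minimumscore", type=float, default=0.0, help="minimum score to report")
    p.add_argument("--verbose", action="store_true", help="print tracebacks on errors")
    return p
```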

Example Workflows

Single File Quick Scan
```
./pdfscan.py mydocument.pdf
```
Displays a detailed report (keywords, potentially malicious actions) in the console.

Multiple PDFs, CSV Output
```
./pdfscan.py /opt/pdfs/*.pdf --csv -o results.csv
```
Writes all results to results.csv, which can be imported into Excel or another spreadsheet tool.

Full Directory Disarm
```
./pdfscan.py /opt/malware-pdfs -r --disarm
```
Recursively scans the directory and writes a *.disarmed.pdf copy of each PDF found.

Plugin Notes

Plugin classes must subclass cPluginParent.
The script automatically discovers plugin classes in the loaded files.
Each plugin typically implements a Score() method that returns a numeric score, which --minimumscore can use to filter results.
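Class discovery and scoring can be sketched with the inspect module. The sketch assumes a hypothetical minimal cPluginParent whose constructor receives the file's keyword counts and the --pluginoptions string; the real base class and its constructor signature in pdfscan.py may differ, and discover_plugins and run_plugins are hypothetical helpers.

```python
import inspect
from types import ModuleType
from typing import List, Tuple


class cPluginParent:
    """Hypothetical minimal stand-in for the tool's plugin base class."""
    name = "unnamed"

    def __init__(self, counts: dict, options: str = ""):
        self.counts = counts      # keyword -> count for the scanned file
        self.options = options    # raw --pluginoptions string

    def Score(self) -> float:
        return 0.0


def discover_plugins(module: ModuleType) -> List[type]:
    """Return every cPluginParent subclass found in a loaded plugin module."""
    return [obj for _name, obj in inspect.getmembers(module, inspect.isclass)
            if issubclass(obj, cPluginParent) and obj is not cPluginParent]


def run_plugins(classes: List[type], counts: dict, options: str = "",
                minimum_score: float = 0.0) -> List[Tuple[str, float]]:
    """Score one file with each plugin; keep scores at or above the minimum."""
    results = []
    for cls in classes:
        score = cls(counts, options).Score()
        if score >= minimum_score:
            results.append((cls.name, score))
    return results
```

run_plugins applies --minimumscore-style filtering: a plugin result is kept only if its score meets or exceeds the minimum.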

Disclaimer

Authored and maintained by Exfil0.
No warranties are provided. Use at your own risk.

Feel free to adapt and redistribute with attribution to Exfil0.