Find and eliminate corrupt image files with visual detection. In memory of Jeff Young.


🔫 2PAC: The Picture Analyzer & Corruption killer

2PAC Coding

All Eyez On Your Images: A lightning-fast tool to find and whack corrupt image files from your photo collection.

"I ain't a killer but don't push me. Corrupt images got their days numbered."

Created by Richard Young

View official logo and usage guidelines

🚀 What's New in v1.5.0

👁️ Visual Corruption Detection
New command-line option --check-visual analyzes image content to detect visually corrupt files with large gray/black areas
🔍 Adjustable Detection Strictness
New --visual-strictness {low,medium,high} option lets you control how aggressive the visual corruption detection should be
🧠 Smart Detection Algorithm
Intelligently distinguishes between corruption and legitimate solid-colored areas like white backgrounds
🔧 Combined Detection Modes
Can be used with --ignore-eof to find only visually corrupt files while ignoring technical EOF issues

Skip to the detailed Visual Content Analysis section

🚀 What's New in v1.4.0

🎚️ Adjustable Validation Sensitivity
New command-line option --sensitivity {low,medium,high} lets you control the strictness of image validation to match your needs
🧩 Smart EOF Handling
New --ignore-eof option allows keeping files that are technically corrupt (missing proper end markers) but still viewable in most applications
📐 Enhanced Format Structure Validation
Deep JPEG and PNG structure analysis finds corruption that basic validation misses
⚡ Performance Optimizations
Smarter validation path selection based on sensitivity level improves scanning speed

Skip to the detailed New Validation System section

✨ Features

  • Supports Multiple Image Formats:
    • 📸 JPEG (.jpg, .jpeg, .jfif, etc.)
    • 🎨 PNG (.png)
    • 📄 TIFF (.tiff, .tif)
    • 🎭 GIF (.gif)
    • 🖼️ BMP (.bmp)
    • 🌐 WebP (.webp)
    • 📱 HEIC (.heic)
  • High Performance: Parallel processing to handle thousands of images efficiently
  • Advanced Validation Technology:
    • 🧐 Checks both image headers and data to identify corruption
    • 👁️ NEW: Visual corruption detection to find files with gray/black areas
    • 🎚️ Adjustable sensitivity levels to balance speed vs thoroughness:
      • Low: Basic header checks for quick scans (fastest)
      • Medium: Standard validation for most use cases (default)
      • High: Deep structure analysis to catch subtle corruption (most thorough)
    • 🔍 NEW: Visual strictness levels to control how aggressively to detect visible corruption:
      • Low: Only the most obvious visual corruption (minimal false positives)
      • Medium: Balanced visual detection (default)
      • High: Catches more subtle visual corruption (may have false positives)
    • 📐 Format-specific structure validation:
      • JPEG: Verifies marker sequence, EOI presence, segment structure
      • PNG: Validates chunks, CRC checksums, IDAT compression integrity
    • 🧩 Smart EOF handling with --ignore-eof option for files that are technically corrupt (missing proper end markers) but still viewable in most applications
  • Multiple Operation Modes:
    • 🔍 Dry Run - Preview corrupt files with no changes (default)
    • 🗑️ Delete - Permanently remove corrupt files
    • 📦 Move - Relocate corrupt files to a separate directory
    • 🔧 Repair - Attempt to fix corrupted images
    • ⏸️ Resume - Continue interrupted scans from where they left off
  • Security Tools:
    • 🕵️ RAT Finder - Detects hidden data (steganography) in images
    • 🔍 Multiple steganography detection methods including LSB, ELA and histogram analysis
    • 📊 Visual reporting for easy analysis of suspicious images
  • Beautiful Interface:
    • 🌈 Colorful Output - Color-coded progress bars and logs
    • 📊 Visual Progress - Real-time progress tracking with ETA
    • 📈 Rich Reporting - Space savings and processing metrics
  • Flexible Configuration:
    • Control recursion depth
    • Adjust worker count
    • Filter by image format
    • Save reports for later review
    • Preserve directory structure when moving files

📋 Requirements

  • Python 3.6+
  • Required packages:
    • Pillow - Python Imaging Library
    • tqdm - Progress bar
    • humanize - Human-readable metrics
    • colorama - Cross-platform colored terminal output
    • numpy - Numerical operations (required for RAT Finder)
    • scipy - Scientific computing (required for RAT Finder)
    • matplotlib - Data visualization (required for RAT Finder)

🚀 Installation

# Clone the repository
git clone https://github.com/ricyoung/2pac.git
cd 2pac

# Install dependencies
pip install -r requirements.txt

# Make executable (Unix/macOS)
chmod +x find_bad_images.py

"They got money for wars, but can't feed the poor." - But we got tools for your images.

🧰 Usage

Basic (Safe) Mode

./find_bad_images.py /path/to/images

This performs a dry run, showing which files would be deleted without making changes.

Quick Exit

./find_bad_images.py q

Quickly exit the program. This works for both find_bad_images.py and rat_finder.py.

Delete Mode

./find_bad_images.py /path/to/images --delete

⚠️ Warning: This permanently deletes corrupt image files!

Move Mode

./find_bad_images.py /path/to/images --move-to /path/to/corrupt_folder

Safely relocates corrupt files to a separate directory for review instead of deleting them. Use this as an alternative to --delete when you want to examine corrupt files before permanently removing them.

The directory structure from the original location is preserved in the destination folder, making it easier to understand where files came from and preventing filename collisions.
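
As a rough illustration of how a structure-preserving move like this could work, the sketch below rebuilds each file's path relative to the scan root underneath the destination folder. The helper names destination_for and move_preserving_structure are hypothetical and are not taken from find_bad_images.py; only pathlib and shutil from the standard library are used.

```python
import shutil
from pathlib import Path


def destination_for(src: Path, scan_root: Path, move_to: Path) -> Path:
    """Hypothetical helper: keep the part of the path below the scan root."""
    return move_to / src.relative_to(scan_root)


def move_preserving_structure(src: Path, scan_root: Path, move_to: Path) -> Path:
    """Move src under move_to, recreating its sub-directories first."""
    dest = destination_for(src, scan_root, move_to)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # shutil.move falls back to copy-then-delete when src and dest
    # live on different filesystems (for example, an external drive).
    shutil.move(str(src), str(dest))
    return dest
```

Because the relative path is kept, two files named image.jpg in different folders end up in different destination folders instead of overwriting each other.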

Filter By Format

# Check only JPEG files
./find_bad_images.py /path/to/images --jpeg

# Check only PNG files
./find_bad_images.py /path/to/images --png

# Check specific formats
./find_bad_images.py /path/to/images --formats JPEG PNG TIFF

Repair Mode

# Attempt to repair corrupt images (creates backups first)
./find_bad_images.py /path/to/images --repair --backup-dir /path/to/backups

# Repair and save a report of fixed files
./find_bad_images.py /path/to/images --repair --repair-report repaired_files.txt

# Repair and move files that couldn't be repaired
./find_bad_images.py /path/to/images --repair --backup-dir /path/to/backups --move-to /path/to/still_corrupt

Important Notes:

  • --backup-dir is used with --repair to save original versions of files before attempting repairs
  • --move-to is used to relocate corrupt files that were found (or couldn't be repaired) to another location
  • These options serve different purposes: one preserves originals before repair, the other handles corrupt files

Progress Saving and Resuming

# List all saved sessions
./find_bad_images.py --list-sessions

# Resume a previously interrupted session
./find_bad_images.py --resume abc123def456

# Customize progress saving interval (default: 5 minutes)
./find_bad_images.py /path/to/images --save-interval 10

# Disable progress saving
./find_bad_images.py /path/to/images --save-interval 0

All Options

usage: find_bad_images.py [-h] [--list-sessions] [--delete] [--move-to MOVE_TO]
                          [--workers WORKERS] [--non-recursive] [--output OUTPUT]
                          [--verbose] [--no-color] [--version] [--repair]
                          [--backup-dir BACKUP_DIR] [--repair-report REPAIR_REPORT]
                          [--formats {JPEG,PNG,GIF,TIFF,BMP,WEBP,ICO,HEIC} [...]]
                          [--jpeg] [--png] [--tiff] [--gif] [--bmp]
                          [--save-interval SAVE_INTERVAL] [--progress-dir PROGRESS_DIR]
                          [--resume SESSION_ID] [--sensitivity {low,medium,high}]
                          [--ignore-eof] [--check-visual]
                          [--visual-strictness {low,medium,high}]
                          [directory]

positional arguments:
  directory         Directory to search for image files

optional arguments:
  -h, --help        Show this help message and exit
  --list-sessions   List all saved sessions
  --delete          Delete corrupt image files (without this flag, runs in dry-run mode)
  --move-to MOVE_TO Move corrupt files to this directory instead of deleting them
  --workers WORKERS Number of worker processes (default: CPU count)
  --non-recursive   Only search in the specified directory, not subdirectories
  --output OUTPUT   Save list of corrupt files to this file
  --verbose, -v     Enable verbose logging
  --no-color        Disable colored output (useful for logs or non-interactive terminals)
  --version         Show program's version number and exit

Repair options:
  --repair          Attempt to repair corrupt image files
  --backup-dir BACKUP_DIR
                    Directory to store backups of files before repair
  --repair-report REPAIR_REPORT
                    Save list of repaired files to this file

Image format options:
  --formats {JPEG,PNG,GIF,TIFF,BMP,WEBP,ICO,HEIC} [...]
                    Image formats to check (default: all formats)
  --jpeg            Check JPEG files only
  --png             Check PNG files only
  --tiff            Check TIFF files only
  --gif             Check GIF files only
  --bmp             Check BMP files only

Validation options:
  --sensitivity {low,medium,high}
                    Set validation sensitivity level: low (basic checks), 
                    medium (standard checks), high (most strict) (default: medium)
  --ignore-eof      Ignore missing end-of-file markers (useful for truncated but viewable files)
  --check-visual    Analyze image content to detect visible corruption like gray/black areas
  --visual-strictness {low,medium,high}
                    Set strictness level for visual corruption detection (default: medium)

Progress options:
  --save-interval SAVE_INTERVAL
                    Save progress every N minutes (0 to disable progress saving, default: 5)
  --progress-dir PROGRESS_DIR
                    Directory to store progress files
  --resume SESSION_ID
                    Resume from a previously saved session

🔍 How It Works

2PAC Workflow

"I see no changes, wake up in the morning and I ask myself, is my image collection worth cleanin'? I don't know."

2PAC uses a sophisticated multi-step approach to handle corrupt image files:

🔎 Validation Process

🧪 Header Verification
Examines file headers to ensure they match proper image format specifications
🔬 Data Validation
Attempts full data loading to detect issues beyond headers
📊 Error Classification
Categorizes corruption issues for optimal repair strategy selection
🎚️ Sensitivity Levels
  • Low: Basic checks only (headers and minimal data verification)
  • Medium: Standard validation (balanced between speed and thoroughness)
  • High: Most strict validation (deep format-specific structure checks)
🧩 Format-Specific Validation
  • JPEG: Verifies markers, EOI (End Of Image) presence, proper structure
  • PNG: Validates chunks, CRC checksums, IDAT structure
  • Other formats: Format-appropriate validation techniques

This multi-layered approach catches a wide range of common image corruption problems:

  • Truncated downloads
  • Partially written files
  • Damaged headers
  • Internal data corruption
  • Invalid encoding
  • Missing end markers
  • Incorrect format structure
  • Checksum failures
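
The header verification step can be illustrated with a small signature check. This is a minimal sketch under stated assumptions, not the tool's actual code: SIGNATURES and sniff_format are hypothetical names, and the byte prefixes are the standard magic numbers for each format. HEIC is left out because its signature sits inside an ISO container box and needs more parsing than a prefix match.

```python
from typing import Optional

# Well-known leading bytes ("magic numbers") for several formats.
SIGNATURES = {
    "JPEG": [b"\xff\xd8\xff"],
    "PNG": [b"\x89PNG\r\n\x1a\n"],
    "GIF": [b"GIF87a", b"GIF89a"],
    "BMP": [b"BM"],
    "TIFF": [b"II*\x00", b"MM\x00*"],  # little-endian and big-endian
}


def sniff_format(header: bytes) -> Optional[str]:
    """Return the format name implied by the first bytes, or None."""
    for name, prefixes in SIGNATURES.items():
        if any(header.startswith(prefix) for prefix in prefixes):
            return name
    # WebP is a RIFF container with the tag 'WEBP' at byte offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "WEBP"
    return None
```

A file whose extension says .png but whose first bytes do not match the PNG signature would be a candidate for the "damaged headers" category above.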

🔧 Repair Process

When repair mode is enabled, the tool intelligently attempts to rescue damaged files:

🔍 Smart Diagnosis
Identifies the specific type and location of corruption
💾 Safe Backup
Creates a backup of the original file before attempting repairs
🛠️ Format-Specific Repair
Applies specialized techniques based on file format:
  • JPEG: Handles truncation, enables partial loading, optimizes compression
  • PNG: Attempts chunk repair, rebuilds critical sections
  • GIF: Fixes frame data, repairs header structures
✅ Validation Check
Verifies the repaired file is now properly loadable

⏱️ Progress Saving System

For large collections, an intelligent progress tracking system prevents wasted work:

🏷️ Unique Session IDs
Generates cryptographic hashes based on scan parameters for reliable session tracking
⏰ Automatic Checkpoints
Saves progress at regular intervals with minimal performance impact
🛑 Interrupt Protection
Detects Ctrl+C and other interruptions, gracefully saves state before exit
⏯️ Smart Resumption
Continues processing exactly where it left off, skipping already processed files
📋 Session Management
Easy-to-use commands for listing, inspecting, and resuming past sessions

Supported repair formats (--repair): JPEG, PNG, GIF
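
To make the idea of hash-based session IDs concrete, here is a minimal sketch. It assumes the ID is a truncated SHA-256 digest of a few scan parameters; the actual hash function, parameter set, and ID length used by 2PAC are not stated in this README, so session_id and its arguments are hypothetical.

```python
import hashlib
import json


def session_id(directory: str, formats, sensitivity: str, length: int = 12) -> str:
    """Derive a short, repeatable ID from the scan parameters (assumed set)."""
    params = {
        "directory": directory,
        "formats": sorted(formats),  # order of --formats should not matter
        "sensitivity": sensitivity,
    }
    # Serialize with sorted keys so the same parameters always hash alike.
    canonical = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:length]
```

Deriving the ID from the parameters means that restarting the same scan maps back to the same saved session, while a scan of a different directory gets a different ID.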

📊 Performance

  • Processing Speed: ~1000 images per minute on a modern quad-core CPU
  • Memory Usage: Minimal (~50MB base + ~2MB per worker)
  • CPU Usage: Scales efficiently with available cores

📋 Examples

Check a large photo library and save report

./find_bad_images.py /Volumes/Photos --output corrupt_photos.txt --verbose

Process a NAS archive with limited CPU impact

./find_bad_images.py /mnt/nas/archive --workers 2

Quick check of recent imports

./find_bad_images.py ~/Pictures/imports --non-recursive

Clean up and reclaim space immediately

./find_bad_images.py /Volumes/ExternalDrive --delete --verbose

Disable colorful output for log files

./find_bad_images.py /Volumes/Photos --output corrupt_photos.txt --no-color > logfile.txt

Check only JPEG and TIFF files

./find_bad_images.py /Volumes/Photos --formats JPEG TIFF

Repair corrupted images from a camera memory card

./find_bad_images.py /Volumes/MEMORY_CARD --repair --backup-dir ~/Desktop/image_backups --verbose

Process a huge image collection with resumable progress

# Start processing a large image collection
./find_bad_images.py /Volumes/BigStorage --save-interval 10

# If interrupted, list available sessions
./find_bad_images.py --list-sessions

# Resume from where you left off
./find_bad_images.py --resume abc123def456

Customize validation strictness

# Use high sensitivity to catch even minor corruption issues
./find_bad_images.py /Volumes/Photos --sensitivity high

# Use low sensitivity for a quick basic check
./find_bad_images.py /Volumes/Photos --sensitivity low

# Keep truncated but viewable files
./find_bad_images.py /Volumes/Photos --ignore-eof

# Combine options for specific use cases
./find_bad_images.py /Volumes/Photos --sensitivity high --ignore-eof --verbose

Cross-device operations

# Move corrupt files from an external drive to a local folder while preserving structure
./find_bad_images.py /Volumes/ExternalDrive --move-to ~/Desktop/corrupted
# Result: Files like '/Volumes/ExternalDrive/folder1/subfolder/image.jpg' will be moved to '~/Desktop/corrupted/folder1/subfolder/image.jpg'

Visual corruption detection

# Find images with visible corruption (gray/black areas)
./find_bad_images.py /Volumes/Photos --check-visual --move-to ~/Desktop/visibly_corrupt

# Ignore technical issues and only find visual corruption
./find_bad_images.py /Volumes/Photos --check-visual --ignore-eof --move-to ~/Desktop/visibly_corrupt

# Use a more conservative detection (fewer false positives)
./find_bad_images.py /Volumes/Photos --check-visual --visual-strictness low

# Use a stricter detection (catches more corruption but may have false positives)
./find_bad_images.py /Volumes/Photos --check-visual --visual-strictness high

👁️ Visual Content Analysis

Version 1.5.0 introduces a visual corruption detection system that finds files with visible corruption, even if they pass technical validation checks:

1. Types of Visual Corruption Detected

| Type | Description |
| --- | --- |
| Gray Block | Large areas of uniform gray color that replace image content |
| Black Block | Sections of solid black that indicate missing or corrupted data |
| Partial Image | Bottom or top sections of the image replaced with solid colors |
| Normal Image | For comparison: a normal, uncorrupted image |

2. Visual Strictness Levels

| Level | Description | Use Case |
| --- | --- | --- |
| Low | Only detects very obvious corruption; requires 30%+ of image to be uniform gray/black; minimal false positives | When you only want to find severely corrupted images; for photos with lots of legitimate white/black areas; most conservative detection |
| Medium | Balanced visual corruption detection; requires 20%+ of image to be uniform gray/black; good balance between detection and false positives | Default for most use cases; regular photo library maintenance; when you want to catch most visual corruption |
| High | Most sensitive detection; requires only 15%+ of image to be uniform gray/black; also checks for unusual color distribution; may have some false positives | When finding all corruption is critical; for photos that must be perfect; when reviewing results is not a problem |

3. Smart Detection Features

The visual corruption detection algorithm includes several smart features:

  • Color Context Awareness: Distinguishes between corruption and legitimate white/black areas based on color context
  • Sampling Technique: Uses intelligent sampling to efficiently analyze even large images
  • Grayscale Detection: Specifically targets mid-tone grays that are common in corruption but rare in natural photos
  • White Area Handling: Special handling for white areas, which are often legitimate in photos (sky, backgrounds, etc.)

4. How Visual Detection Works

  1. Image Sampling: Takes a representative sample of pixels across the image
  2. Color Analysis: Identifies uniform color regions and calculates their percentage
  3. Color Context: Analyzes if the colors are likely corruption (mid-gray, black) or natural (white, gradient)
  4. Threshold Comparison: Compares against strictness thresholds to determine if corruption is present
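
The four steps above can be sketched roughly as follows. This is a simplified illustration, not 2PAC's algorithm: it counts suspicious pixels in a flat list of RGB tuples rather than finding uniform regions, it omits the extra color-distribution check of the high level, and the gray tolerance and brightness ranges are assumed values. Only the 30%/20%/15% thresholds come from the strictness table.

```python
# Fraction of sampled pixels that must look suspicious, per strictness level.
THRESHOLDS = {"low": 0.30, "medium": 0.20, "high": 0.15}


def is_suspect(pixel, tolerance=8):
    """True for near-black or mid-gray pixels (ranges are assumptions)."""
    r, g, b = pixel
    if max(r, g, b) - min(r, g, b) > tolerance:
        return False  # clearly colored, so not a gray/black block
    level = (r + g + b) // 3
    # White and light tones are deliberately excluded: they are often
    # legitimate (sky, backgrounds).
    return level <= 16 or 96 <= level <= 160


def looks_visually_corrupt(pixels, strictness="medium", step=4):
    """Sample every `step`-th pixel and compare against the threshold."""
    sample = pixels[::step]
    if not sample:
        return False
    suspect = sum(1 for p in sample if is_suspect(p))
    return suspect / len(sample) >= THRESHOLDS[strictness]
```

With an image that is one quarter mid-gray, this sketch flags the file at medium and high strictness but not at low, which mirrors the thresholds in the table.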

5. Visual vs Technical Corruption

| Scenario | Technical Tests | Visual Analysis | Result |
| --- | --- | --- | --- |
| Correctly structured file with gray blocks | ✅ Passes | ❌ Fails | Detected by --check-visual only |
| Missing EOF but visually perfect | ❌ Fails | ✅ Passes | Caught by normal checks, bypassed with --ignore-eof |
| Severely corrupt file | ❌ Fails | ❌ Fails | Detected by both methods |
| Perfect file | ✅ Passes | ✅ Passes | Passes all checks |

6. Command-Line Examples

# Find visibly corrupt files with medium strictness (default)
./find_bad_images.py /path/to/photos --check-visual --move-to /path/for/corrupted

# Strict detection - catches more visual corruption but may have false positives
./find_bad_images.py /path/to/photos --check-visual --visual-strictness high

# Conservative detection - flags only obvious corruption
./find_bad_images.py /path/to/photos --check-visual --visual-strictness low

# Find visually corrupt files while ignoring missing EOF markers
./find_bad_images.py /path/to/photos --check-visual --ignore-eof

7. Combining With Other Features

Visual corruption detection works seamlessly with other features:

  • Use with --ignore-eof to find visibly corrupt files while ignoring missing end-of-file markers
  • Use with --repair to attempt to fix files that have both visual and technical corruption
  • Use with --move-to to collect all visually corrupt files in a separate directory for review

🕵️ RAT Finder: Steganography Detection Tool

"What the eyes see and the ears hear, the mind believes."

The 2PAC toolkit now includes a powerful steganography detection tool called RAT Finder that helps you identify images containing hidden data. While find_bad_images.py focuses on corruption detection, RAT Finder specializes in security analysis.

Key Capabilities

🔍 Multiple Detection Methods
Combines six different analysis techniques to detect various steganography approaches, including LSB, DCT manipulation, and metadata hiding
🔄 Error Level Analysis (ELA)
Advanced technique that recompresses images and analyzes error patterns to detect manipulated areas
📊 Rich Visual Reports
Generates comprehensive visual reports with 9 different analysis panels to help interpret results
⚙️ Adjustable Sensitivity
Control detection thresholds to balance between false positives and false negatives

Usage

# Scan a directory for steganography
./rat_finder.py /path/to/images

# Check a specific suspicious file
./rat_finder.py --check-file suspicious.jpg --visual-reports ./reports

Learn more about RAT Finder and steganography detection →

🎚️ New Validation System

Version 1.4.0 introduced a validation system with finer control and deeper detection capabilities:

1. Sensitivity Levels

| Level | Description | Use Case |
| --- | --- | --- |
| Low | Basic header verification only; minimal data loading checks; fast but less thorough | Quick initial scan of large collections; when looking for only severely corrupted files; maximum performance needed |
| Medium | Standard header and data validation; balanced between speed and detection; catches most common corruption issues | Default for most use cases; regular maintenance scans; good balance of speed and thoroughness |
| High | Deep structure analysis; format-specific validation; checks internal consistency; most thorough but slower | Archive integrity verification; when preparing critical collections; finding subtle corruption issues |

2. EOF Marker Handling

The --ignore-eof option addresses a common issue with images that are technically corrupt but still usable:

  • What it does: Ignores missing End-Of-File/Image markers during validation
  • When to use it: For files that open properly in most viewers but fail strict validation
  • Technical detail: Many images with truncated data or missing EOI markers can still be displayed correctly by applications that are tolerant of these issues
  • Example scenario: Images downloaded from the web, processed by certain applications, or transferred with incomplete writes

3. Enhanced Format-Specific Validation

The tool includes deep structure validation for common formats:

JPEG Validation:

  • Validates marker sequence (SOI, APP, COM, SOF, etc.)
  • Checks for proper EOI marker presence
  • Validates segment structure and lengths
  • Detects truncated files and data corruption
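
As a minimal sketch of the simplest of these checks, the function below looks only at the SOI and EOI markers and shows how an option like --ignore-eof could skip the end-marker test. The name check_jpeg_markers and its return format are hypothetical; a full validator would also walk the segments between the two markers, and real files sometimes carry trailing bytes after EOI that this strict check would reject.

```python
from typing import List

SOI = b"\xff\xd8"  # Start Of Image marker
EOI = b"\xff\xd9"  # End Of Image marker


def check_jpeg_markers(data: bytes, ignore_eof: bool = False) -> List[str]:
    """Return a list of marker problems; an empty list means none found."""
    problems = []
    if not data.startswith(SOI):
        problems.append("missing SOI marker")
    # A truncated download typically loses the final EOI marker.
    if not ignore_eof and not data.endswith(EOI):
        problems.append("missing EOI marker")
    return problems
```

A truncated file reports "missing EOI marker" by default and reports nothing when ignore_eof=True, which matches the behavior described for --ignore-eof.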

PNG Validation:

  • Verifies PNG signature and header chunk
  • Validates critical chunks (IHDR, IDAT, IEND)
  • Checks CRC values for all chunks
  • Validates chunk sequence and structure
  • Detects IDAT compression issues
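
The chunk and CRC checks listed above can be sketched with the standard library alone. This is an illustration of the PNG chunk layout (4-byte length, 4-byte type, data, 4-byte CRC over type plus data), not the tool's implementation; iter_png_chunks and validate_png are hypothetical names, and IDAT decompression is not attempted here.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_png_chunks(data: bytes):
    """Yield (chunk_type, crc_ok) for each chunk; raise ValueError if broken."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG: bad signature")
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        crc_bytes = data[pos + 8 + length:pos + 12 + length]
        if len(body) < length or len(crc_bytes) < 4:
            raise ValueError("truncated chunk")
        stored = struct.unpack(">I", crc_bytes)[0]
        # The CRC covers the chunk type and data, but not the length field.
        crc_ok = (zlib.crc32(ctype + body) & 0xFFFFFFFF) == stored
        yield ctype.decode("ascii", "replace"), crc_ok
        pos += 12 + length
        if ctype == b"IEND":
            break


def validate_png(data: bytes):
    """Return a list of structural problems; an empty list means none found."""
    try:
        chunks = list(iter_png_chunks(data))
    except ValueError as exc:
        return [str(exc)]
    names = [name for name, _ in chunks]
    problems = []
    if not names or names[0] != "IHDR":
        problems.append("first chunk is not IHDR")
    if "IDAT" not in names:
        problems.append("no IDAT chunk")
    if not names or names[-1] != "IEND":
        problems.append("missing IEND chunk")
    problems += ["bad CRC in " + name for name, ok in chunks if not ok]
    return problems
```

A file cut off before its final chunk reports "missing IEND chunk", the PNG counterpart of a JPEG without an EOI marker.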

4. Corruption Detection Comparison

| Corruption Type | Low Sensitivity | Medium Sensitivity | High Sensitivity |
| --- | --- | --- | --- |
| Severely truncated file | ✅ | ✅ | ✅ |
| Invalid image header | ✅ | ✅ | ✅ |
| Missing critical data chunks | ❌ | ✅ | ✅ |
| Missing EOI/EOF markers | ❌ | ✅ | ✅ |
| Invalid chunk sequences (PNG) | ❌ | ❌ | ✅ |
| CRC validation errors | ❌ | ❌ | ✅ |
| Invalid structure but viewable | ❌ | ❌ | ✅ |
| Partially corrupt data | ⚠️ | ✅ | ✅ |
| Large gray/black areas | ❌ | ❌ | ❌ |

✅ = Detected | ❌ = Not detected | ⚠️ = Sometimes detected

Note: To detect large gray/black areas, use the --check-visual option.

🤝 Contributing

Contributions are welcome! Feel free to submit a Pull Request.

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

📞 Support

If you encounter any issues or have questions, please file an issue on the GitHub repository.


🕊️ In Memory of Jeff Young

Jeff Young

This project is dedicated to the memory of Jeff Young, who loved Tupac's music and embodied his spirit of bringing people together. Like my brother, Jeff would always reach out to help others, making connections and building community wherever he went. His compassion for people and willingness to always lend a hand to those in need are qualities that inspired this tool's purpose - helping others preserve their precious memories.

May your photos always be as bright and clear as the memories they capture, and may we all strive to connect and help others as Jeff did.


2PAC in action

"You know these corrupt JPEGs will never survive. We're on a mission and our reputation's live."

Made with ❤️ by Richard Young
