Key Features

Document Viewer

Interface Features
  • Archive.org-style interface - Clean, focused document browsing
  • Zoom controls - Click to zoom or use +/- buttons
  • Fullscreen mode - Press F11 or click the fullscreen button
  • Progress tracking - Visual progress bar showing position
  • Keyboard navigation - Arrow keys, Home, End for quick navigation
Image Processing
  • Automatic format conversion - TIF files are converted to JPEG for browser compatibility
  • High-quality thumbnails - Fast loading with optimized image sizes
  • Responsive scaling - Images adapt to screen size
  • Pan and zoom - Smooth image navigation
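
The thumbnail and responsive-scaling features above both come down to fitting an image inside a bounding box without distorting it. A minimal sketch of that arithmetic follows; the function name `fit_within` and its parameters are hypothetical and not taken from this project's code.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_width: int, max_height: int) -> Tuple[int, int]:
    """Scale (width, height) down to fit inside the box, preserving aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Take the tighter of the two constraints; the 1.0 cap prevents upscaling.
    scale = min(max_width / width, max_height / height, 1.0)
    # Clamp to at least one pixel so extreme aspect ratios stay valid.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 4000x3000 scan fitted into a 200x200 thumbnail box comes out as 200x150, and an image already smaller than the box is left at its original size.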

Search & Discovery

Search Types
  • Filename search - Find documents by name
  • OCR text search - Search within extracted text content
  • Combined search - Search both filenames and content
  • Advanced filtering - Filter by OCR status and file type
Search Features
  • Text excerpts - See context around search matches
  • Search highlighting - Matched terms are highlighted in results
  • Pagination - Navigate through large result sets
  • Sorting options - Sort by relevance, filename, or ID
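
The search types and features above could be sketched roughly as follows using the standard-library `sqlite3` module. This is an illustration under assumptions, not the project's actual query code: the `documents` table, its `filename` and `ocr_text` columns, and the `search` and `make_excerpt` helpers are all hypothetical. Relevance sorting is omitted to keep the sketch short.

```python
import sqlite3


def make_excerpt(text, term, radius=30):
    """Return text around the first case-insensitive match, with the match in [ ]."""
    pos = text.lower().find(term.lower())
    if pos < 0:
        return ""
    start = max(0, pos - radius)
    end = min(len(text), pos + len(term) + radius)
    match = text[pos:pos + len(term)]
    return text[start:pos] + "[" + match + "]" + text[pos + len(term):end]


# Whitelists keep user-supplied mode/sort values out of the SQL string.
WHERE_CLAUSES = {
    "filename": "filename LIKE :pat",
    "ocr": "ocr_text LIKE :pat",
    "combined": "(filename LIKE :pat OR ocr_text LIKE :pat)",
}
SORT_COLUMNS = {"filename": "filename", "id": "id"}


def search(conn, term, mode="combined", sort="id", page=1, per_page=20):
    """Search the hypothetical documents table and return one page of results."""
    sql = (
        "SELECT id, filename, COALESCE(ocr_text, '') FROM documents "
        f"WHERE {WHERE_CLAUSES[mode]} ORDER BY {SORT_COLUMNS[sort]} "
        "LIMIT :limit OFFSET :offset"
    )
    # Note: % and _ inside the term act as LIKE wildcards in this sketch.
    params = {"pat": f"%{term}%", "limit": per_page, "offset": (page - 1) * per_page}
    rows = conn.execute(sql, params).fetchall()
    return [
        {"id": doc_id, "filename": name, "excerpt": make_excerpt(text, term)}
        for doc_id, name, text in rows
    ]
```

The excerpt is empty when the term matched only the filename, since there is no OCR text to quote.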

OCR Processing

Text Extraction
  • Tesseract OCR - Industry-standard text recognition
  • Memory efficient - Lightweight processing for large datasets
  • Background processing - Non-blocking OCR operations
  • Progress tracking - Real-time processing status
Text Display
  • Side-by-side view - Image and text displayed together
  • Searchable content - Full-text search through extracted text
  • Text formatting - Preserves the formatting and structure of extracted text
  • Error handling - Graceful handling of OCR failures
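
One way to picture non-blocking OCR with progress tracking and graceful failure handling is a worker thread that drains a queue. The sketch below is an assumption about the general shape, not this project's implementation: `run_ocr_in_background`, the `status` dictionary, and the `tesseract_cli` helper are hypothetical names. The helper shells out to the `tesseract` command-line tool, which must be installed and on the PATH.

```python
import queue
import subprocess
import threading


def tesseract_cli(image_path):
    """Run the Tesseract CLI on one image and return the recognized text."""
    result = subprocess.run(
        ["tesseract", image_path, "stdout"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def run_ocr_in_background(paths, ocr=tesseract_cli):
    """Start a worker thread; return (thread, status dict, results dict)."""
    jobs = queue.Queue()
    for path in paths:
        jobs.put(path)
    status = {"total": len(paths), "done": 0, "failed": 0}
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                path = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            try:
                text = ocr(path)
                with lock:
                    results[path] = text
                    status["done"] += 1
            except Exception:
                # A failed page is recorded and skipped, not fatal to the run.
                with lock:
                    results[path] = None
                    status["failed"] += 1

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, status, results
```

The caller gets control back immediately and can poll `status` to drive a progress display. Passing a different `ocr` callable makes the worker easy to exercise without Tesseract installed.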

Technical Features

Data Management
  • Idempotent operations - Safe to re-run during file transfers
  • File hash tracking - Detect changed files efficiently
  • SQLite database - Fast, reliable data storage
  • Directory structure mapping - Complete file organization
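
Idempotent indexing with file hash tracking can be illustrated with `hashlib` and `sqlite3` from the standard library. The `files` table, the choice of SHA-256, and the `index_file` helper below are assumptions made for this sketch; the project's actual schema and hash algorithm may differ.

```python
import hashlib
import sqlite3
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large scans do not need to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_file(conn, path):
    """Record a file's hash; return 'new', 'changed', or 'unchanged'."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, sha256 TEXT NOT NULL)"
    )
    key = str(path)
    new_hash = file_sha256(path)
    row = conn.execute("SELECT sha256 FROM files WHERE path = ?", (key,)).fetchone()
    if row is not None and row[0] == new_hash:
        return "unchanged"  # re-running on the same file is a no-op
    # The primary key on path means a re-run replaces the row, never duplicates it.
    conn.execute(
        "INSERT OR REPLACE INTO files (path, sha256) VALUES (?, ?)", (key, new_hash)
    )
    conn.commit()
    return "new" if row is None else "changed"
```

Because the result depends only on the stored hash and the current file contents, running the indexer again mid-transfer picks up files that have changed since the last pass and leaves everything else alone.
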
Performance & Reliability
  • Responsive design - Works on desktop and mobile devices
  • Production ready - Process management via screen sessions
  • Error handling - Robust recovery from errors
  • Analytics tracking - User behavior and search analytics

API Features

REST API
  • Search API - Programmatic document search
  • Statistics API - System status and progress
  • Image serving - Direct access to document images
  • Thumbnail generation - Optimized image thumbnails
Integration
  • JSON responses - Machine-readable data formats
  • CORS support - Cross-origin resource sharing
  • Pagination - Efficient handling of large datasets
  • Error handling - Consistent error response format
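
To make the integration points above concrete, here is a sketch of what a paginated JSON body and a consistent error body might look like. Every field name (`ok`, `results`, `page`, `per_page`, `total`, `pages`, `error`) is an assumption for illustration; consult the actual API responses for the real format.

```python
import json
import math


def error_response(status, message):
    """Build an error body with the same shape for every failure."""
    return json.dumps({"ok": False, "error": {"status": status, "message": message}})


def paginated_response(items, page, per_page):
    """Build a JSON body holding one page of items plus paging metadata."""
    total = len(items)
    pages = max(1, math.ceil(total / per_page))
    if page < 1 or page > pages:
        return error_response(400, f"page must be between 1 and {pages}")
    start = (page - 1) * per_page
    return json.dumps({
        "ok": True,
        "results": items[start:start + per_page],
        "page": page,
        "per_page": per_page,
        "total": total,
        "pages": pages,
    })
```

Returning `total` and `pages` alongside each page lets a client walk a large result set without fetching it all at once, and a single error shape means clients need only one failure path.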

User Experience

Navigation
  • Quick navigation - Jump to first, middle, or last document
  • Random browsing - Discover documents randomly
  • Keyboard shortcuts - Power user navigation
  • Breadcrumb navigation - Clear page hierarchy
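
The quick-navigation and random-browsing targets above reduce to simple index arithmetic over an ordered list of document IDs. The helper names below are hypothetical, and the choice of `len // 2` for "middle" is an assumption.

```python
import random


def quick_nav_targets(doc_ids):
    """Return the first, middle, and last IDs from an ordered, non-empty list."""
    if not doc_ids:
        raise ValueError("no documents to navigate")
    return {
        "first": doc_ids[0],
        "middle": doc_ids[len(doc_ids) // 2],
        "last": doc_ids[-1],
    }


def random_document(doc_ids, rng=random):
    """Pick one document ID at random; pass a seeded random.Random for repeatability."""
    return rng.choice(doc_ids)
```
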
Interface
  • Clean design - Minimal, distraction-free interface
  • Mobile responsive - Works on all device sizes
  • Accessibility - Screen reader and keyboard friendly
  • Loading indicators - Clear feedback during operations