# PDF Spooler v2.0
In the name of Allah, the Most Gracious, the Most Merciful.
## Overview

A Node.js Express service with an internal queue for HTML-to-PDF conversion using the Chrome DevTools Protocol (CDP).
## Architecture

```
Client Application
    ↓ POST {html, filename}
Node.js Spooler (port 3030)
    ↓ queue
Internal Queue (max 5 concurrent)
    ↓ process
PDF Generator (Chrome CDP port 42020)
    ↓ save
data/pdfs/(unknown).pdf
```
## Features

- HTTP API for PDF generation (no file watching)
- Internal queue with a maximum of 5 concurrent jobs
- Maximum of 100 jobs in the queue
- In-memory job tracking (auto-cleanup after 60 min)
- Chrome crash detection and restart (max 3 attempts)
- Comprehensive logging (info, error, metrics)
- Automated cleanup with dry-run mode
- Admin dashboard for monitoring
- Manual error review required (see `data/error/`)
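The 60-minute in-memory auto-cleanup can be sketched as follows. This is a minimal illustration, not the actual `spooler.js` implementation; the shape of the `jobs` map and the `sweepExpiredJobs` helper are assumptions:

```javascript
// Hypothetical sketch of the 60-minute in-memory job cleanup.
// Jobs are kept in a Map keyed by jobId; each entry records when the
// job finished so expired entries can be swept periodically.
const JOB_TTL_MS = 60 * 60 * 1000; // 60 minutes

function sweepExpiredJobs(jobs, now = Date.now()) {
  let removed = 0;
  for (const [jobId, job] of jobs) {
    // Only finished (completed/errored) jobs age out;
    // queued/processing jobs are kept.
    if (job.finishedAt !== null && now - job.finishedAt > JOB_TTL_MS) {
      jobs.delete(jobId);
      removed++;
    }
  }
  return removed;
}
```

A sweep like this would typically run on a timer (e.g. `setInterval`) so completed jobs do not accumulate in memory.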
## API Endpoints

### POST /api/pdf/generate

Generate a PDF from HTML content.

Request:

```json
{
  "html": "<html>...</html>",
  "filename": "1234567890.pdf"
}
```

Response (success):

```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "queued",
  "message": "Job added to queue"
}
```

Response (error):

```json
{
  "success": false,
  "error": "Queue is full, please try again later"
}
```
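A client should handle the queue-full error, e.g. by backing off and retrying. A minimal sketch — the `submitWithRetry` helper and its injected `postJson` function are illustrative, not part of the service:

```javascript
// Hypothetical client helper: POST the job, retrying with a delay
// while the spooler reports that its queue is full.
async function submitWithRetry(postJson, payload, { retries = 3, delayMs = 1000 } = {}) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const res = await postJson('/api/pdf/generate', payload);
    if (res.success) return res.jobId;                  // queued: done
    if (attempt < retries) {
      await new Promise(r => setTimeout(r, delayMs));   // back off, then retry
    }
  }
  throw new Error('Queue still full after retries');
}
```

Injecting `postJson` keeps the helper transport-agnostic; in practice it would wrap `fetch` or any HTTP client that returns the parsed JSON body.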
### GET /api/pdf/status/:jobId

Check job status.

Response (queued/processing):

```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "queued|processing",
  "progress": 0|50,
  "pdfUrl": null,
  "error": null
}
```

Response (completed):

```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "completed",
  "progress": 100,
  "pdfUrl": "/node_spooler/data/pdfs/1234567890.pdf",
  "error": null
}
```

Response (job failed):

```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "error",
  "progress": 0,
  "pdfUrl": null,
  "error": "Chrome timeout"
}
```

Note that `success` stays `true` even when the job failed: it refers to the status lookup itself. Check the `status` field for the job outcome.
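Because generation is asynchronous, clients typically poll this endpoint until the job reaches a terminal state. A minimal polling sketch — the `pollJob` helper and injected `getJson` function are assumptions for illustration:

```javascript
// Hypothetical polling loop: fetch the job status every `intervalMs`
// until it is "completed" or "error", up to `maxPolls` attempts.
async function pollJob(getJson, jobId, { intervalMs = 500, maxPolls = 60 } = {}) {
  for (let i = 0; i < maxPolls; i++) {
    const res = await getJson(`/api/pdf/status/${jobId}`);
    if (res.status === 'completed') return res.pdfUrl;
    if (res.status === 'error') throw new Error(res.error);
    await new Promise(r => setTimeout(r, intervalMs)); // still queued/processing
  }
  throw new Error('Timed out waiting for job ' + jobId);
}
```

With average generation under 2 seconds, a short interval (e.g. 500 ms) with a bounded number of polls keeps latency low without hammering the API.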
### GET /api/queue/stats

Returns queue statistics.

```json
{
  "success": true,
  "queueSize": 12,
  "processing": 3,
  "completed": 45,
  "errors": 2,
  "avgProcessingTime": 0.82,
  "maxQueueSize": 100
}
```
## Error Handling

### Chrome Crash Handling

When a Chrome crash is detected (CDP connection lost or timeout), the spooler:

1. Stops processing current jobs
2. Moves in-flight jobs back to "queued" status
3. Attempts to restart Chrome (max 3 attempts)
4. Resumes processing

### Failed Jobs

- Failed jobs are logged to `data/error/{jobId}.json`
- Error files are never auto-deleted (manual review required)
- Review `logs/errors.log` for details
- The error JSON contains full job details, including the error message
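The restart step above can be sketched like this. It is an illustrative outline, not the actual implementation; `startChrome` is an assumed async function that resolves once Chrome is reachable on the CDP port:

```javascript
// Hypothetical restart-with-retries loop matching the steps above:
// try to bring Chrome back up, giving up after `maxAttempts` failures.
async function restartChromeWithRetries(startChrome, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await startChrome();   // assumed: resolves once CDP is reachable
      return attempt;        // restarted; the caller resumes processing
    } catch (err) {
      if (attempt === maxAttempts) {
        throw new Error(`Chrome restart failed after ${maxAttempts} attempts`);
      }
      // otherwise fall through and retry
    }
  }
}
```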
## Cleanup

### Manual Execution

```shell
# Test cleanup (dry run)
npm run cleanup:dry-run

# Execute cleanup
npm run cleanup
```
### Retention Policy

| Directory | Retention | Action |
|---|---|---|
| `data/pdfs/` | 7 days | Move to archive |
| `data/archive/YYYYMM/` | 45 days | Delete |
| `data/error/` | Manual | Never delete |
| `logs/` | 30 days | Delete (compress after 7 days) |
### Cleanup Tasks

- Archive PDFs older than 7 days to `data/archive/YYYYMM/`
- Delete archived PDFs older than 45 days
- Compress log files older than 7 days
- Delete log files older than 30 days
- Check disk space (alert if usage exceeds 80%)
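The retention policy boils down to a per-file decision based on directory and age. A minimal sketch of that decision — the `retentionAction` helper is illustrative, not taken from `cleanup.js`:

```javascript
// Hypothetical mapping of the retention table to a per-file action.
function retentionAction(dir, ageDays) {
  switch (dir) {
    case 'data/pdfs':
      return ageDays > 7 ? 'archive' : 'keep';
    case 'data/archive':
      return ageDays > 45 ? 'delete' : 'keep';
    case 'data/error':
      return 'keep'; // never auto-deleted: manual review required
    case 'logs':
      if (ageDays > 30) return 'delete';
      return ageDays > 7 ? 'compress' : 'keep';
    default:
      return 'keep';
  }
}
```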
## Monitoring

### Admin Dashboard

Open `admin.html` in a browser for:

- Real-time queue statistics
- Processing metrics
- Error file list
- Disk space visualization

URL: http://localhost:3030/admin.html

### Key Metrics

- Average PDF generation time: < 2 seconds
- Success rate: > 95%
- Queue size: < 100 jobs
- Disk usage: < 80%

### Log Files

- `logs/spooler.log` - All API events (info, warn, error)
- `logs/errors.log` - PDF generation errors only
- `logs/metrics.log` - Performance stats (per job)
- `logs/cleanup.log` - Cleanup execution logs
## Troubleshooting

### Spooler Not Starting

- Check whether Chrome is running on port 42020
- Check the logs: `logs/spooler.log`
- Verify the directories exist: `data/pdfs`, `data/archive`, `data/error`, `logs`
- Check the Node.js version: `node --version` (requires 14+)
- Verify dependencies are installed: `npm install`

Start Chrome manually:

```shell
"C:/Program Files/Google/Chrome/Application/chrome.exe" \
  --headless \
  --disable-gpu \
  --remote-debugging-port=42020
```
### PDF Not Generated

- Check job status via the API: `GET /api/pdf/status/{jobId}`
- Review the error logs: `logs/errors.log`
- Verify the Chrome connection: check the logs for CDP connection errors
- Check the HTML content: ensure it is valid HTML
### Queue Full

- Wait for current jobs to complete
- Check the admin dashboard for queue size
- Increase `maxQueueSize` in `spooler.js` (default: 100)
- Check whether jobs are stuck (processing too long)
### Chrome Crashes Repeatedly

- Check system RAM (at least 2 GB must be available)
- Reduce `maxConcurrent` in `spooler.js` (default: 5)
- Check for memory leaks in Chrome
- Restart Chrome manually and monitor
- Check system resources: Task Manager > Performance
### High Disk Usage

- Run cleanup: `npm run cleanup`
- Check `data/archive/` for old folders
- Check `logs/` for old logs
- Check `data/pdfs/` for large files
- Consider reducing the PDF retention time in `cleanup-config.json`
## Deployment

### Quick Start

```shell
# 1. Create directories
cd node_spooler
mkdir -p logs data/pdfs data/archive data/error

# 2. Install dependencies
npm install

# 3. Start Chrome (if not already running)
"C:/Program Files/Google/Chrome/Application/chrome.exe" \
  --headless \
  --disable-gpu \
  --remote-debugging-port=42020

# 4. Start the spooler
npm start

# 5. Test the API
curl -X POST http://localhost:3030/api/pdf/generate \
  -H "Content-Type: application/json" \
  -d "{\"html\":\"<html><body>Test</body></html>\",\"filename\":\"test.pdf\"}"

# 6. Open the admin dashboard
# http://localhost:3030/admin.html
```
### Production Setup

1. Create a batch file wrapper (`spooler-start.bat`):

```bat
@echo off
cd /d D:\data\www\gdc_cmod\node_spooler
C:\node\node.exe spooler.js
```

2. Create a Windows service (note the required space after `binPath=` and `start=`; `sc` expects an executable that implements the service control protocol, so a plain `.bat` may need a service wrapper such as NSSM to run reliably):

```shell
sc create PDFSpooler binPath= "D:\data\www\gdc_cmod\node_spooler\spooler-start.bat" start= auto
sc start PDFSpooler
```

3. Create scheduled tasks for cleanup:

```shell
schtasks /create /tn "PDF Cleanup Daily" /tr "C:\node\node.exe D:\data\www\gdc_cmod\node_spooler\cleanup.js" /sc daily /st 01:00
schtasks /create /tn "PDF Cleanup Weekly" /tr "C:\node\node.exe D:\data\www\gdc_cmod\node_spooler\cleanup.js weekly" /sc weekly /d MON /st 01:00
```
## Version History

- **2.0.0** (2025-02-03): Migrated from file watching to an HTTP API queue
  - Removed file watching (chokidar)
  - Added Express HTTP API
  - Internal queue with max 5 concurrent jobs
  - Max 100 jobs in queue
  - Job auto-cleanup after 60 minutes
  - Enhanced error handling with Chrome restart
  - Admin dashboard for monitoring
  - Automated cleanup system
## License

Internal use only.