# PDF Spooler v2.0
In the name of Allah, the Most Gracious, the Most Merciful.
## Overview

A Node.js Express service with an internal queue for HTML-to-PDF conversion using the Chrome DevTools Protocol (CDP).
## Architecture

```
Client Application
    ↓ POST {html, filename}
Node.js Spooler (port 3030)
    ↓ queue
Internal Queue (max 5 concurrent)
    ↓ process
PDF Generator (Chrome CDP port 42020)
    ↓ save
data/pdfs/(unknown).pdf
```
## Features

- HTTP API for PDF generation (no file watching)
- Internal queue with a maximum of 5 concurrent jobs
- Maximum of 100 jobs in the queue
- In-memory job tracking (auto-cleanup after 60 min)
- Chrome crash detection and restart (max 3 attempts)
- Comprehensive logging (info, error, metrics)
- Automated cleanup with dry-run mode
- Admin dashboard for monitoring
- Manual error review required (see `data/error/`)
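The 60-minute in-memory auto-cleanup can be sketched as follows. This is a minimal illustration, not the actual `spooler.js` implementation; the shape of the `jobs` map and the `sweepExpiredJobs` helper are assumptions:

```javascript
// Hypothetical sketch of the 60-minute in-memory job cleanup.
// Jobs are kept in a Map keyed by jobId; each entry records when the
// job finished so expired entries can be swept periodically.
const JOB_TTL_MS = 60 * 60 * 1000; // 60 minutes

function sweepExpiredJobs(jobs, now = Date.now()) {
  let removed = 0;
  for (const [jobId, job] of jobs) {
    // Only finished (completed/errored) jobs age out;
    // queued/processing jobs are kept.
    if (job.finishedAt !== null && now - job.finishedAt > JOB_TTL_MS) {
      jobs.delete(jobId);
      removed++;
    }
  }
  return removed;
}
```

A sweep like this would typically run on a timer (e.g. `setInterval`) so completed jobs do not accumulate in memory.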
## API Endpoints

### POST /api/pdf/generate

Generate a PDF from HTML content.

Request:

```json
{
  "html": "<html>...</html>",
  "filename": "1234567890.pdf"
}
```

Response (success):

```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "queued",
  "message": "Job added to queue"
}
```

Response (error):

```json
{
  "success": false,
  "error": "Queue is full, please try again later"
}
```
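A client should handle the queue-full error, e.g. by backing off and retrying. A minimal sketch — the `submitWithRetry` helper and its injected `postJson` function are illustrative, not part of the service:

```javascript
// Hypothetical client helper: POST the job, retrying with a delay
// while the spooler reports that its queue is full.
async function submitWithRetry(postJson, payload, { retries = 3, delayMs = 1000 } = {}) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const res = await postJson('/api/pdf/generate', payload);
    if (res.success) return res.jobId;                  // queued: done
    if (attempt < retries) {
      await new Promise(r => setTimeout(r, delayMs));   // back off, then retry
    }
  }
  throw new Error('Queue still full after retries');
}
```

Injecting `postJson` keeps the helper transport-agnostic; in practice it would wrap `fetch` or any HTTP client that returns the parsed JSON body.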
### GET /api/pdf/status/:jobId

Check job status.

Response (queued/processing):

```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "queued|processing",
  "progress": 0|50,
  "pdfUrl": null,
  "error": null
}
```

Response (completed):

```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "completed",
  "progress": 100,
  "pdfUrl": "/node_spooler/data/pdfs/1234567890.pdf",
  "error": null
}
```

Response (job failed):

```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "error",
  "progress": 0,
  "pdfUrl": null,
  "error": "Chrome timeout"
}
```

Note that `success` stays `true` even when the job failed: it refers to the status lookup itself. Check the `status` field for the job outcome.
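Because generation is asynchronous, clients typically poll this endpoint until the job reaches a terminal state. A minimal polling sketch — the `pollJob` helper and injected `getJson` function are assumptions for illustration:

```javascript
// Hypothetical polling loop: fetch the job status every `intervalMs`
// until it is "completed" or "error", up to `maxPolls` attempts.
async function pollJob(getJson, jobId, { intervalMs = 500, maxPolls = 60 } = {}) {
  for (let i = 0; i < maxPolls; i++) {
    const res = await getJson(`/api/pdf/status/${jobId}`);
    if (res.status === 'completed') return res.pdfUrl;
    if (res.status === 'error') throw new Error(res.error);
    await new Promise(r => setTimeout(r, intervalMs)); // still queued/processing
  }
  throw new Error('Timed out waiting for job ' + jobId);
}
```

With average generation under 2 seconds, a short interval (e.g. 500 ms) with a bounded number of polls keeps latency low without hammering the API.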
### GET /api/queue/stats

Returns queue statistics.

```json
{
  "success": true,
  "queueSize": 12,
  "processing": 3,
  "completed": 45,
  "errors": 2,
  "avgProcessingTime": 0.82,
  "maxQueueSize": 100
}
```
## Error Handling

### Chrome Crash Handling

When a Chrome crash is detected (CDP connection lost or timeout), the spooler:

1. Stops processing current jobs
2. Moves in-flight jobs back to "queued" status
3. Attempts to restart Chrome (max 3 attempts)
4. Resumes processing

### Failed Jobs

- Failed jobs are logged to `data/error/{jobId}.json`
- Error files are never auto-deleted (manual review required)
- Review `logs/errors.log` for details
- The error JSON contains full job details, including the error message
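The restart step above can be sketched like this. It is an illustrative outline, not the actual implementation; `startChrome` is an assumed async function that resolves once Chrome is reachable on the CDP port:

```javascript
// Hypothetical restart-with-retries loop matching the steps above:
// try to bring Chrome back up, giving up after `maxAttempts` failures.
async function restartChromeWithRetries(startChrome, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await startChrome();   // assumed: resolves once CDP is reachable
      return attempt;        // restarted; the caller resumes processing
    } catch (err) {
      if (attempt === maxAttempts) {
        throw new Error(`Chrome restart failed after ${maxAttempts} attempts`);
      }
      // otherwise fall through and retry
    }
  }
}
```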
## Cleanup

### Manual Execution

```shell
# Test cleanup (dry run)
npm run cleanup:dry-run

# Execute cleanup
npm run cleanup
```
### Retention Policy

| Directory | Retention | Action |
|---|---|---|
| `data/pdfs/` | 7 days | Move to archive |
| `data/archive/YYYYMM/` | 45 days | Delete |
| `data/error/` | Manual | Never delete |
| `logs/` | 30 days | Delete (compress after 7 days) |
### Cleanup Tasks

- Archive PDFs older than 7 days to `data/archive/YYYYMM/`
- Delete archived PDFs older than 45 days
- Compress log files older than 7 days
- Delete log files older than 30 days
- Check disk space (alert if usage exceeds 80%)
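The retention policy boils down to a per-file decision based on directory and age. A minimal sketch of that decision — the `retentionAction` helper is illustrative, not taken from `cleanup.js`:

```javascript
// Hypothetical mapping of the retention table to a per-file action.
function retentionAction(dir, ageDays) {
  switch (dir) {
    case 'data/pdfs':
      return ageDays > 7 ? 'archive' : 'keep';
    case 'data/archive':
      return ageDays > 45 ? 'delete' : 'keep';
    case 'data/error':
      return 'keep'; // never auto-deleted: manual review required
    case 'logs':
      if (ageDays > 30) return 'delete';
      return ageDays > 7 ? 'compress' : 'keep';
    default:
      return 'keep';
  }
}
```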
## Monitoring

### Admin Dashboard

Open `admin.html` in a browser for:

- Real-time queue statistics
- Processing metrics
- Error file list
- Disk space visualization

URL: http://localhost:3030/admin.html

### Key Metrics

- Average PDF generation time: < 2 seconds
- Success rate: > 95%
- Queue size: < 100 jobs
- Disk usage: < 80%

### Log Files

- `logs/spooler.log` - All API events (info, warn, error)
- `logs/errors.log` - PDF generation errors only
- `logs/metrics.log` - Performance stats (per job)
- `logs/cleanup.log` - Cleanup execution logs
## Troubleshooting

### Spooler Not Starting

- Check whether Chrome is running on port 42020
- Check the logs: `logs/spooler.log`
- Verify the directories exist: `data/pdfs`, `data/archive`, `data/error`, `logs`
- Check the Node.js version: `node --version` (requires 14+)
- Verify dependencies are installed: `npm install`

Start Chrome manually:

```shell
"C:/Program Files/Google/Chrome/Application/chrome.exe" \
  --headless \
  --disable-gpu \
  --remote-debugging-port=42020
```
### PDF Not Generated

- Check job status via the API: `GET /api/pdf/status/{jobId}`
- Review the error logs: `logs/errors.log`
- Verify the Chrome connection: check the logs for CDP connection errors
- Check the HTML content: ensure it is valid HTML
### Queue Full

- Wait for current jobs to complete
- Check the admin dashboard for queue size
- Increase `maxQueueSize` in `spooler.js` (default: 100)
- Check whether jobs are stuck (processing too long)
### Chrome Crashes Repeatedly

- Check system RAM (at least 2 GB must be available)
- Reduce `maxConcurrent` in `spooler.js` (default: 5)
- Check for memory leaks in Chrome
- Restart Chrome manually and monitor
- Check system resources: Task Manager > Performance
### High Disk Usage

- Run cleanup: `npm run cleanup`
- Check `data/archive/` for old folders
- Check `logs/` for old logs
- Check `data/pdfs/` for large files
- Consider reducing the PDF retention time in `cleanup-config.json`
## Deployment

### Quick Start

```shell
# 1. Create directories
cd node_spooler
mkdir -p logs data/pdfs data/archive data/error

# 2. Install dependencies
npm install

# 3. Start Chrome (if not already running)
"C:/Program Files/Google/Chrome/Application/chrome.exe" \
  --headless \
  --disable-gpu \
  --remote-debugging-port=42020

# 4. Start the spooler
npm start

# 5. Test the API
curl -X POST http://localhost:3030/api/pdf/generate \
  -H "Content-Type: application/json" \
  -d "{\"html\":\"<html><body>Test</body></html>\",\"filename\":\"test.pdf\"}"

# 6. Open the admin dashboard
# http://localhost:3030/admin.html
```
### Production Setup

1. Create a batch file wrapper (`spooler-start.bat`):

```bat
@echo off
cd /d D:\data\www\gdc_cmod\node_spooler
C:\node\node.exe spooler.js
```

2. Create a Windows service (note the required space after `binPath=` and `start=`; `sc` expects an executable that implements the service control protocol, so a plain `.bat` may need a service wrapper such as NSSM to run reliably):

```shell
sc create PDFSpooler binPath= "D:\data\www\gdc_cmod\node_spooler\spooler-start.bat" start= auto
sc start PDFSpooler
```

3. Create scheduled tasks for cleanup:

```shell
schtasks /create /tn "PDF Cleanup Daily" /tr "C:\node\node.exe D:\data\www\gdc_cmod\node_spooler\cleanup.js" /sc daily /st 01:00
schtasks /create /tn "PDF Cleanup Weekly" /tr "C:\node\node.exe D:\data\www\gdc_cmod\node_spooler\cleanup.js weekly" /sc weekly /d MON /st 01:00
```
## Version History

- **2.0.0** (2025-02-03): Migrated from file watching to an HTTP API queue
  - Removed file watching (chokidar)
  - Added Express HTTP API
  - Internal queue with max 5 concurrent jobs
  - Max 100 jobs in queue
  - Job auto-cleanup after 60 minutes
  - Enhanced error handling with Chrome restart
  - Admin dashboard for monitoring
  - Automated cleanup system
## License

Internal use only.