gdc_cmod/node_spooler
mahdahar a9b387b21f docs: Compact node_spooler README.md (429→308 lines)
Remove CI4 integration examples to focus on service documentation

Changes:
- Remove CI4 Integration section (113 lines of PHP/JS examples)
- Remove CI4 Controller from Architecture diagram
- Remove all ReportController, curl, fetch code references
- Condense Quick Start and Troubleshooting sections
- Focus README on pure node_spooler service documentation

Reduction: 429 → 308 lines (-121 lines, 28% smaller)

Scope:
- All API endpoints documented
- Error handling and cleanup procedures preserved
- Monitoring and troubleshooting guides retained
- Deployment instructions maintained
- No CI4 integration code examples
2026-02-03 11:44:44 +07:00

PDF Spooler v2.0

Bismillahirohmanirohim.

Overview

A Node.js Express service that converts HTML to PDF through the Chrome DevTools Protocol (CDP). Incoming jobs are held in an internal queue and processed with limited concurrency.

Architecture

Client Application
  ↓ POST {html, filename}
Node.js Spooler (port 3030)
  ↓ queue
Internal Queue (max 5 concurrent)
  ↓ process
PDF Generator (Chrome CDP port 42020)
  ↓ save
data/pdfs/{filename}.pdf
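
The queue stage in the diagram can be pictured with a minimal sketch like the one below. It is not taken from spooler.js: the class name, the worker callback, and the choice to count only waiting jobs against the 100-job limit are assumptions made for illustration.

```javascript
// Hypothetical sketch of a bounded job queue; names are illustrative,
// not taken from spooler.js.
class BoundedQueue {
  constructor(worker, { maxConcurrent = 5, maxQueueSize = 100 } = {}) {
    this.worker = worker;        // async function(job) that does the real work
    this.maxConcurrent = maxConcurrent;
    this.maxQueueSize = maxQueueSize;
    this.waiting = [];           // jobs accepted but not yet started
    this.active = 0;             // jobs currently running
  }

  // Returns true when the job was accepted, false when the queue is full.
  add(job) {
    if (this.waiting.length >= this.maxQueueSize) return false;
    this.waiting.push(job);
    this._drain();
    return true;
  }

  // Start waiting jobs until the concurrency limit is reached.
  _drain() {
    while (this.active < this.maxConcurrent && this.waiting.length > 0) {
      const job = this.waiting.shift();
      this.active += 1;
      Promise.resolve()
        .then(() => this.worker(job))
        .catch(() => {})         // a real service would record the failure here
        .finally(() => {
          this.active -= 1;
          this._drain();         // a slot is free, so pick up the next job
        });
    }
  }
}
```

In this sketch a rejected `add()` corresponds to the "Queue is full" error response, and each finished job immediately frees a slot for the next waiting one.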

Features

  • HTTP API for PDF generation (no file watching)
  • Internal queue with max 5 concurrent processing
  • Max 100 jobs in queue
  • In-memory job tracking (auto-cleanup after 60 min)
  • Chrome crash detection & restart (max 3 attempts)
  • Comprehensive logging (info, error, metrics)
  • Automated cleanup with dry-run mode
  • Admin dashboard for monitoring
  • Manual error review required (see data/error/)
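
As a rough illustration of in-memory job tracking with a 60-minute auto-cleanup, a store could look like the sketch below. The class and field names are hypothetical, and it assumes that only finished jobs (completed or error) are eligible for removal, which the feature list does not state.

```javascript
// Hypothetical in-memory job store; class and field names are illustrative.
const JOB_TTL_MS = 60 * 60 * 1000; // 60 minutes, per the feature list

class JobStore {
  constructor(ttlMs = JOB_TTL_MS) {
    this.ttlMs = ttlMs;
    this.jobs = new Map(); // jobId -> { status, updatedAt, ... }
  }

  // Merge new fields into a job record and refresh its timestamp.
  set(jobId, fields, now = Date.now()) {
    const previous = this.jobs.get(jobId) || {};
    this.jobs.set(jobId, { ...previous, ...fields, updatedAt: now });
  }

  get(jobId) {
    return this.jobs.get(jobId) || null;
  }

  // Drop finished jobs whose last update is older than the TTL.
  // Returns the number of entries removed.
  prune(now = Date.now()) {
    let removed = 0;
    for (const [jobId, job] of this.jobs) {
      const finished = job.status === 'completed' || job.status === 'error';
      if (finished && now - job.updatedAt > this.ttlMs) {
        this.jobs.delete(jobId);
        removed += 1;
      }
    }
    return removed;
  }
}
```

A service using this approach would typically call `prune()` from a periodic timer. Because the store lives in memory, all job records are lost when the process restarts.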

API Endpoints

POST /api/pdf/generate

Generate PDF from HTML content.

Request:

{
  "html": "<html>...</html>",
  "filename": "1234567890.pdf"
}

Response (Success):

{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "queued",
  "message": "Job added to queue"
}

Response (Error):

{
  "success": false,
  "error": "Queue is full, please try again later"
}
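
A client could submit a job along the lines of the sketch below. The endpoint and body fields come from the request format above. The function name and the injectable `fetchImpl` parameter are assumptions, and the global `fetch` used as the default is only available in Node.js 18 or later.

```javascript
// Client-side sketch. The endpoint and body fields come from this README;
// the function name and the injectable fetchImpl are assumptions.
async function submitPdfJob(baseUrl, html, filename, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(`${baseUrl}/api/pdf/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ html, filename }),
  });
  const body = await response.json();
  if (!body.success) {
    // For example: "Queue is full, please try again later"
    throw new Error(body.error || 'PDF job was rejected');
  }
  return body.jobId; // use this with GET /api/pdf/status/:jobId
}
```

For example, `submitPdfJob('http://localhost:3030', '<html><body>Test</body></html>', 'test.pdf')` resolves to the job ID, or rejects with the error message when the queue is full.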

GET /api/pdf/status/:jobId

Check job status.

Response (Queued/Processing):

{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "queued",
  "progress": 0,
  "pdfUrl": null,
  "error": null
}

While a job is being processed, "status" is "processing" and "progress" is 50.

Response (Completed):

{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "completed",
  "progress": 100,
  "pdfUrl": "/node_spooler/data/pdfs/1234567890.pdf",
  "error": null
}

Response (Error):

{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "error",
  "progress": 0,
  "pdfUrl": null,
  "error": "Chrome timeout"
}
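
Since jobs are processed asynchronously, a client usually polls the status endpoint until the job finishes. The sketch below shows one way to do that. The status values and fields are the ones documented above. The function name, the interval and timeout defaults, and the handling of a `success: false` reply are assumptions.

```javascript
// Polling sketch. Status values and fields come from this README; the
// function name, interval, and timeout defaults are assumptions.
async function waitForPdf(baseUrl, jobId, options = {}) {
  const {
    intervalMs = 1000,
    timeoutMs = 60000,
    fetchImpl = globalThis.fetch, // global fetch needs Node.js 18+
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  const deadline = Date.now() + timeoutMs;
  while (Date.now() <= deadline) {
    const url = `${baseUrl}/api/pdf/status/${encodeURIComponent(jobId)}`;
    const response = await fetchImpl(url);
    const job = await response.json();
    // Assumed: an unknown job is reported with success: false.
    if (!job.success) throw new Error(job.error || 'Status request failed');
    if (job.status === 'completed') return job.pdfUrl;
    if (job.status === 'error') throw new Error(job.error || 'PDF generation failed');
    await sleep(intervalMs); // still "queued" or "processing"
  }
  throw new Error(`Timed out waiting for job ${jobId}`);
}
```

The function resolves to the `pdfUrl` of a completed job and rejects with the job's error message (for example "Chrome timeout") when generation fails.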

GET /api/queue/stats

Queue statistics.

{
  "success": true,
  "queueSize": 12,
  "processing": 3,
  "completed": 45,
  "errors": 2,
  "avgProcessingTime": 0.82,
  "maxQueueSize": 100
}

Error Handling

Chrome Crash Handling

  1. Chrome crash detected (CDP connection lost or timeout)
  2. Stop processing current jobs
  3. Return the interrupted jobs to "queued" status
  4. Attempt to restart Chrome (max 3 attempts)
  5. Resume processing
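
The recovery steps above might be sketched as follows. This is an illustration rather than the service's actual code: `restartChrome` and the job object shape are hypothetical stand-ins, and the behaviour after three failed attempts (returning `recovered: false`) is an assumption, since the steps do not describe it.

```javascript
// Hypothetical recovery routine; restartChrome and the job shape are
// stand-ins for whatever spooler.js actually uses.
async function recoverFromCrash(jobs, restartChrome, maxAttempts = 3) {
  // Steps 2-3: anything that was mid-flight goes back to "queued".
  for (const job of jobs) {
    if (job.status === 'processing') {
      job.status = 'queued';
      job.progress = 0;
    }
  }

  // Step 4: try to bring Chrome back, giving up after maxAttempts.
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      await restartChrome(attempt);
      return { recovered: true, attempts: attempt }; // Step 5: caller resumes
    } catch (err) {
      // Restart failed; fall through to the next attempt.
    }
  }
  return { recovered: false, attempts: maxAttempts };
}
```

Requeuing before the restart means no job is lost if Chrome comes back, and the caller can decide what to do with the queue when `recovered` is false.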

Failed Jobs

  • Failed jobs logged to data/error/{jobId}.json
  • Never auto-deleted (manual review required)
  • Review logs/errors.log for details
  • Error JSON contains full job details including error message
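
Writing a failed job to data/error/{jobId}.json could look roughly like the sketch below. The helper name and the record fields are assumptions, not the service's actual schema.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: persist a failed job as {errorDir}/{jobId}.json.
// The record fields shown here are assumptions, not the service's schema.
function writeErrorFile(errorDir, job, error) {
  fs.mkdirSync(errorDir, { recursive: true });
  const record = {
    jobId: job.jobId,
    filename: job.filename,
    error: error.message,
    failedAt: new Date().toISOString(),
  };
  // basename() keeps a malformed jobId from escaping the error directory.
  const target = path.join(errorDir, `${path.basename(job.jobId)}.json`);
  fs.writeFileSync(target, JSON.stringify(record, null, 2));
  return target;
}
```

Keeping one file per job makes manual review straightforward: each file can be inspected, fixed, and removed independently.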

Cleanup

Manual Execution

# Test cleanup (dry-run)
npm run cleanup:dry-run

# Execute cleanup
npm run cleanup

Retention Policy

Directory              Retention   Action
data/pdfs/             7 days      Move to archive
data/archive/YYYYMM/   45 days     Delete
data/error/            Manual      Never delete
logs/                  30 days     Delete (compress after 7 days)

Cleanup Tasks

  1. Archive PDFs older than 7 days to data/archive/YYYYMM/
  2. Delete archived PDFs older than 45 days
  3. Compress log files older than 7 days
  4. Delete log files older than 30 days
  5. Check disk space (alert if > 80%)
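
The retention rules can be expressed as a small pure function, sketched below. The thresholds come from the retention policy. The function name, the action names, and two further choices are assumptions: age is measured from the file's modification time, and the YYYYMM archive folder is derived from that same timestamp.

```javascript
// Sketch of the retention rules as a pure function. The thresholds come
// from the retention policy; the function and action names are assumptions.
const DAY_MS = 24 * 60 * 60 * 1000;

function planCleanup(kind, modifiedAt, now = new Date()) {
  const ageDays = (now - modifiedAt) / DAY_MS;
  switch (kind) {
    case 'pdf': { // files in data/pdfs/
      if (ageDays <= 7) return { action: 'keep' };
      const month = String(modifiedAt.getMonth() + 1).padStart(2, '0');
      const yyyymm = `${modifiedAt.getFullYear()}${month}`;
      return { action: 'archive', target: `data/archive/${yyyymm}/` };
    }
    case 'archive': // files in data/archive/YYYYMM/
      return { action: ageDays > 45 ? 'delete' : 'keep' };
    case 'log': // files in logs/
      if (ageDays > 30) return { action: 'delete' };
      if (ageDays > 7) return { action: 'compress' };
      return { action: 'keep' };
    case 'error': // data/error/ is never cleaned automatically
      return { action: 'keep' };
    default:
      throw new Error(`Unknown file kind: ${kind}`);
  }
}
```

Separating the decision from the file operations is one way to support a dry-run mode: a dry run logs the planned action for each file, while a real run executes it.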

Monitoring

Admin Dashboard

Open admin.html in a browser for:

  • Real-time queue statistics
  • Processing metrics
  • Error file list
  • Disk space visualization

URL: http://localhost:3030/admin.html

Key Metrics

  • Average PDF time: < 2 seconds
  • Success rate: > 95%
  • Queue size: < 100 jobs
  • Disk usage: < 80%
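
A monitoring script could compare a GET /api/queue/stats response with the targets listed above, as in the sketch below. It assumes `avgProcessingTime` is reported in seconds and that the success rate is `completed / (completed + errors)`. The function name is illustrative, and disk usage is left out because the stats response does not include it.

```javascript
// Compare a /api/queue/stats response with the key metric targets.
// Assumes avgProcessingTime is in seconds; the function name is illustrative.
function checkMetrics(stats) {
  const finished = stats.completed + stats.errors;
  // With no finished jobs yet, treat the success rate as 100%.
  const successRate = finished === 0 ? 1 : stats.completed / finished;
  const warnings = [];
  if (stats.avgProcessingTime >= 2) warnings.push('average PDF time is 2 s or more');
  if (successRate <= 0.95) warnings.push('success rate is 95% or lower');
  if (stats.queueSize >= stats.maxQueueSize) warnings.push('queue is full');
  return { successRate, warnings };
}
```

Applied to the sample stats response shown earlier (45 completed, 2 errors, 0.82 average, 12 of 100 queued), this returns no warnings.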

Log Files

  • logs/spooler.log - All API events (info, warn, error)
  • logs/errors.log - PDF generation errors only
  • logs/metrics.log - Performance stats (per job)
  • logs/cleanup.log - Cleanup execution logs

Troubleshooting

Spooler Not Starting

  • Check if Chrome is running on port 42020
  • Check logs: logs/spooler.log
  • Verify directories exist: data/pdfs, data/archive, data/error, logs
  • Check Node.js version: node --version (version 14 or later is required)
  • Verify dependencies installed: npm install

Start Chrome manually:

"C:/Program Files/Google/Chrome/Application/chrome.exe" --headless --disable-gpu --remote-debugging-port=42020

PDF Not Generated

  • Check job status via API: GET /api/pdf/status/{jobId}
  • Review error logs: logs/errors.log
  • Verify Chrome connection: Check logs for CDP connection errors
  • Check HTML content: Ensure valid HTML

Queue Full

  • Wait for current jobs to complete
  • Check admin dashboard for queue size
  • Increase maxQueueSize in spooler.js (default: 100)
  • Check if jobs are stuck (processing too long)

Chrome Crashes Repeatedly

  • Check system RAM (at least 2 GB should be available)
  • Reduce maxConcurrent in spooler.js (default: 5)
  • Check for memory leaks in Chrome
  • Restart Chrome manually and monitor
  • Check system resources: Task Manager > Performance

High Disk Usage

  • Run cleanup: npm run cleanup
  • Check data/archive/ for old folders
  • Check logs/ for old logs
  • Check data/pdfs/ for large files
  • Consider reducing PDF retention time in cleanup-config.json

Deployment

Quick Start

# 1. Create directories
cd node_spooler
mkdir -p logs data/pdfs data/archive data/error

# 2. Install dependencies
npm install

# 3. Start Chrome (if not running)
"C:/Program Files/Google/Chrome/Application/chrome.exe" --headless --disable-gpu --remote-debugging-port=42020

# 4. Start spooler
npm start

# 5. Test API
curl -X POST http://localhost:3030/api/pdf/generate \
  -H "Content-Type: application/json" \
  -d "{\"html\":\"<html><body>Test</body></html>\",\"filename\":\"test.pdf\"}"

# 6. Open admin dashboard
# http://localhost:3030/admin.html

Production Setup

1. Create a batch file wrapper (spooler-start.bat):

@echo off
cd /d D:\data\www\gdc_cmod\node_spooler
C:\node\node.exe spooler.js

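2. Create Windows service (sc expects a space after "binPath=" and "start="):

sc create PDFSpooler binPath= "D:\data\www\gdc_cmod\node_spooler\spooler-start.bat" start= auto
sc start PDFSpooler

Note: a batch file does not respond to Service Control Manager requests. If the service fails to start with error 1053, run the batch file through a service wrapper instead.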

3. Create scheduled task for cleanup:

schtasks /create /tn "PDF Cleanup Daily" /tr "C:\node\node.exe D:\data\www\gdc_cmod\node_spooler\cleanup.js" /sc daily /st 01:00
schtasks /create /tn "PDF Cleanup Weekly" /tr "C:\node\node.exe D:\data\www\gdc_cmod\node_spooler\cleanup.js weekly" /sc weekly /d MON /st 01:00

Version History

  • 2.0.0 (2025-02-03): Migrated from file watching to HTTP API queue
    • Removed file watching (chokidar)
    • Added Express HTTP API
    • Internal queue with max 5 concurrent
    • Max 100 jobs in queue
    • Job auto-cleanup after 60 minutes
    • Enhanced error handling with Chrome restart
    • Admin dashboard for monitoring
    • Automated cleanup system

License

Internal use only.