BREAKING CHANGE: Remove public/spooler_db/ legacy system

Changes:
- Migrate validation preview from http://glenlis/spooler_db/main_dev.php to CI4 /report/{accessnumber}
- Add ReportController::preview() for HTML preview in validation dialog
- Add ReportController::generatePdf() to queue PDF generation via node_spooler at http://glenlis:3030
- Add ReportController::checkPdfStatus() to poll spooler job status
- Add ReportController::postToSpooler() helper for curl requests to spooler API
- Add routes: GET /report/(:num)/preview, GET /report/(:num)/pdf, GET /report/status/(:any)
- Delete public/spooler_db/ directory (40+ legacy files)
- Compact node_spooler/README.md from 577 to 342 lines

Technical Details:
- New architecture: CI4 Controller -> node_spooler (port 3030) -> Chrome CDP (port 42020)
- API endpoints: POST /api/pdf/generate, GET /api/pdf/status/:jobId, GET /api/queue/stats
- Features: max 5 concurrent jobs, max 100 in queue, auto-cleanup after 60 min
- Error handling: Chrome crash detection, manual error review in data/error/
- PDF infrastructure ready; frontend PDF buttons to be updated later in production

Migration verified:
- No external code references spooler_db
- All assets duplicated in public/assets/report/
- Syntax checks passed for ReportController.php and Routes.php

Refs: node_spooler/README.md
PDF Spooler v2.0
In the name of Allah, the Most Gracious, the Most Merciful.
Overview
Node.js Express service with internal queue for HTML to PDF conversion using Chrome DevTools Protocol.
Architecture
```
CI4 Controller
    ↓ POST {html, filename}
Node.js Spooler (port 3030)
    ↓ queue
Internal Queue (max 5 concurrent)
    ↓ process
PDF Generator (Chrome CDP port 42020)
    ↓ save
data/pdfs/(unknown).pdf
```
Features
- HTTP API for PDF generation (no file watching)
- Internal queue with max 5 concurrent processing
- Max 100 jobs in queue
- In-memory job tracking (auto-cleanup after 60 min)
- Chrome crash detection & restart (max 3 attempts)
- Comprehensive logging (info, error, metrics)
- Automated cleanup with dry-run mode
- Admin dashboard for monitoring
- Manual error review required (see `data/error/`)
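The in-memory job tracking with 60-minute auto-cleanup can be sketched as a timestamped map that a periodic timer sweeps. This is illustrative only; the names (`JobStore`, `TTL_MS`) are assumptions, not the actual `spooler.js` internals.

```javascript
// Illustrative in-memory job store with TTL-based auto-cleanup.
// TTL_MS and JobStore are hypothetical names, not the spooler.js API.
const TTL_MS = 60 * 60 * 1000; // jobs expire 60 minutes after they finish

class JobStore {
  constructor() {
    this.jobs = new Map(); // jobId -> { status, finishedAt }
  }

  add(jobId) {
    this.jobs.set(jobId, { status: 'queued', finishedAt: null });
  }

  complete(jobId, now = Date.now()) {
    const job = this.jobs.get(jobId);
    if (job) {
      job.status = 'completed';
      job.finishedAt = now;
    }
  }

  // Remove finished jobs older than TTL_MS (called periodically).
  cleanup(now = Date.now()) {
    for (const [id, job] of this.jobs) {
      if (job.finishedAt !== null && now - job.finishedAt > TTL_MS) {
        this.jobs.delete(id);
      }
    }
  }
}
```

Unfinished jobs are never swept, so a queued job survives cleanup until it completes or errors.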
API Endpoints
POST /api/pdf/generate
Generate PDF from HTML content.
Request:
```json
{
  "html": "<html>...</html>",
  "filename": "1234567890.pdf"
}
```
Response (Success):
```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "queued",
  "message": "Job added to queue"
}
```
Response (Error):
```json
{
  "success": false,
  "error": "Queue is full, please try again later"
}
```
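The admission logic behind these two responses can be sketched as a simple bounded-queue check. `MAX_QUEUE`, `makeJobId`, and `enqueue` are illustrative names under the documented limits, not the real implementation.

```javascript
// Hypothetical queue-admission sketch mirroring the documented responses.
const MAX_QUEUE = 100;

// Job IDs in the examples look like job_<timestamp>_<random>; this mimics that shape.
function makeJobId(now = Date.now()) {
  return `job_${now}_${Math.random().toString(36).slice(2, 11)}`;
}

function enqueue(queue, html, filename) {
  if (queue.length >= MAX_QUEUE) {
    return { success: false, error: 'Queue is full, please try again later' };
  }
  const jobId = makeJobId();
  queue.push({ jobId, html, filename, status: 'queued' });
  return { success: true, jobId, status: 'queued', message: 'Job added to queue' };
}
```

The check happens before any Chrome work is scheduled, so a full queue is rejected immediately with HTTP-level success plus `"success": false` in the body.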
GET /api/pdf/status/:jobId
Check job status.
Response (Queued/Processing):
```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "queued|processing",
  "progress": 0|50,
  "pdfUrl": null,
  "error": null
}
```
Response (Completed):
```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "completed",
  "progress": 100,
  "pdfUrl": "/node_spooler/data/pdfs/1234567890.pdf",
  "error": null
}
```
Response (Error):
```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "error",
  "progress": 0,
  "pdfUrl": null,
  "error": "Chrome timeout"
}
```
GET /api/queue/stats
Queue statistics.
```json
{
  "success": true,
  "queueSize": 12,
  "processing": 3,
  "completed": 45,
  "errors": 2,
  "avgProcessingTime": 0.82,
  "maxQueueSize": 100
}
```
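One plausible way to derive these numbers, in particular `avgProcessingTime` in seconds, is to aggregate over tracked jobs. The job field names (`startedAt`, `finishedAt`) are assumptions for illustration.

```javascript
// Sketch of how /api/queue/stats values might be computed from the job list.
// Field names startedAt/finishedAt are illustrative, not the real schema.
function queueStats(jobs, maxQueueSize = 100) {
  const durations = jobs
    .filter(j => j.status === 'completed')
    .map(j => (j.finishedAt - j.startedAt) / 1000); // ms -> seconds
  const avg = durations.length
    ? durations.reduce((a, b) => a + b, 0) / durations.length
    : 0;
  return {
    success: true,
    queueSize: jobs.filter(j => j.status === 'queued').length,
    processing: jobs.filter(j => j.status === 'processing').length,
    completed: durations.length,
    errors: jobs.filter(j => j.status === 'error').length,
    avgProcessingTime: Number(avg.toFixed(2)),
    maxQueueSize,
  };
}
```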
CI4 Integration
Controller Example
```php
<?php

namespace App\Controllers;

use CodeIgniter\API\ResponseTrait;

class ReportController extends BaseController
{
    use ResponseTrait;

    public function generateReport($accessnumber)
    {
        $html     = $this->generateHTML($accessnumber);
        $filename = $accessnumber . '.pdf';
        $jobId    = $this->postToSpooler($html, $filename);

        return $this->respond([
            'success' => true,
            'jobId'   => $jobId,
            'message' => 'PDF queued for generation',
            'status'  => 'queued',
        ]);
    }

    private function postToSpooler($html, $filename)
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, 'http://localhost:3030/api/pdf/generate');
        curl_setopt($ch, CURLOPT_POST, 1);
        curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode([
            'html'     => $html,
            'filename' => $filename,
        ]));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HTTPHEADER, ['Content-Type: application/json']);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);

        $response = curl_exec($ch);
        $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        if ($httpCode !== 200) {
            log_message('error', "Spooler API returned HTTP $httpCode");
            throw new \Exception('Failed to queue PDF generation');
        }

        $data = json_decode($response, true);

        return $data['jobId'];
    }

    public function checkPdfStatus($jobId)
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, "http://localhost:3030/api/pdf/status/$jobId");
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);

        $response = curl_exec($ch);
        curl_close($ch);

        if ($response === false) {
            return $this->failServerError('Spooler unreachable');
        }

        // The spooler already returns JSON; pass it through untouched.
        return $this->response->setContentType('application/json')->setBody($response);
    }
}
```
Frontend Example (JavaScript)
```javascript
async function generatePDF(accessNumber) {
  try {
    const response = await fetch('/report/generate/' + accessNumber, {
      method: 'POST'
    });
    const { jobId, status } = await response.json();
    if (status === 'queued') {
      alert('PDF queued for generation');
    }
    return jobId;
  } catch (error) {
    console.error('Failed to generate PDF:', error);
    alert('Failed to generate PDF');
  }
}

async function pollPdfStatus(jobId) {
  const maxAttempts = 60;
  let attempts = 0;
  const interval = setInterval(async () => {
    if (attempts >= maxAttempts) {
      clearInterval(interval);
      alert('PDF generation timeout');
      return;
    }
    attempts++;
    try {
      const response = await fetch(`/report/status/${jobId}`);
      const data = await response.json();
      if (data.status === 'completed') {
        clearInterval(interval);
        window.location.href = data.pdfUrl;
      } else if (data.status === 'error') {
        clearInterval(interval);
        alert('PDF generation failed: ' + data.error);
      }
    } catch (error) {
      // A transient network error counts as an attempt; keep polling.
      console.error('Status check failed:', error);
    }
  }, 2000);
}
```
Error Handling
Chrome Crash Handling
- Chrome crash detected (CDP connection lost or timeout)
- Stop processing current jobs
- Move queue jobs back to "queued" status
- Attempt to restart Chrome (max 3 attempts)
- Resume processing
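The restart policy above (give up after three failed attempts) can be sketched as a bounded retry loop. `restartWithRetry` and `restartFn` are illustrative stand-ins for whatever actually relaunches Chrome; a synchronous version is shown for brevity.

```javascript
// Illustrative bounded-retry loop for the Chrome restart policy (max 3 attempts).
// restartFn is a hypothetical callback that throws if the restart fails.
function restartWithRetry(restartFn, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      restartFn(attempt);
      return attempt; // restart succeeded on this attempt
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  // All attempts failed; surface the last error so jobs stay queued for review.
  throw lastError;
}
```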
Failed Jobs
- Failed jobs logged to `data/error/{jobId}.json`
- Never auto-deleted (manual review required)
- Review `logs/errors.log` for details
- Error JSON contains full job details including error message
Cleanup
Manual Execution
```shell
# Test cleanup (dry-run)
npm run cleanup:dry-run

# Execute cleanup
npm run cleanup
```
Retention Policy
| Directory | Retention | Action |
|---|---|---|
| `data/pdfs/` | 7 days | Move to archive |
| `data/archive/YYYYMM/` | 45 days | Delete |
| `data/error/` | Manual | Never delete |
| `logs/` | 30 days | Delete (compress after 7 days) |
Cleanup Tasks
- Archive PDFs older than 7 days to `data/archive/YYYYMM/`
- Delete archived PDFs older than 45 days
- Compress log files older than 7 days
- Delete log files older than 30 days
- Check disk space (alert if > 80%)
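The age thresholds and `YYYYMM` archive layout above can be sketched as two small helpers. These are illustrative (and they treat all file ages uniformly for simplicity), not the actual `cleanup.js` code.

```javascript
// Sketch of the retention decisions: >7 days -> archive, >45 days -> delete.
// archiveFolder/retentionAction are hypothetical helper names.
const DAY_MS = 24 * 60 * 60 * 1000;

// Archive folder is derived from the file's modification time, e.g. data/archive/202502/.
function archiveFolder(mtime) {
  const d = new Date(mtime);
  const yyyy = d.getUTCFullYear();
  const mm = String(d.getUTCMonth() + 1).padStart(2, '0');
  return `data/archive/${yyyy}${mm}/`;
}

function retentionAction(mtime, now = Date.now()) {
  const ageDays = (now - mtime) / DAY_MS;
  if (ageDays > 45) return 'delete';
  if (ageDays > 7) return 'archive';
  return 'keep';
}
```

Files in `data/error/` would be excluded from this logic entirely, per the "never delete" policy.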
Monitoring
Admin Dashboard
Open `admin.html` in a browser for:
- Real-time queue statistics
- Processing metrics
- Error file list
- Disk space visualization
URL: http://localhost/gdc_cmod/node_spooler/admin.html
Key Metrics
- Average PDF time: < 2 seconds
- Success rate: > 95%
- Queue size: < 100 jobs
- Disk usage: < 80%
Log Files
- `logs/spooler.log` - All API events (info, warn, error)
- `logs/errors.log` - PDF generation errors only
- `logs/metrics.log` - Performance stats (per job)
- `logs/cleanup.log` - Cleanup execution logs
Troubleshooting
Spooler Not Starting
Solutions:
- Check if Chrome is running on port 42020
- Check logs: `logs/spooler.log`
- Verify directories exist: `data/pdfs`, `data/archive`, `data/error`, `logs`
- Check Node.js version: `node --version` (need 14+)
- Verify dependencies installed: `npm install`
Start Chrome manually:
```shell
"C:/Program Files/Google/Chrome/Application/chrome.exe" --headless --disable-gpu --remote-debugging-port=42020
```
PDF Not Generated
Solutions:
- Check job status via API: `GET /api/pdf/status/{jobId}`
- Review error logs: `logs/errors.log`
- Verify Chrome connection: check logs for CDP connection errors
- Check HTML content: ensure valid HTML
Queue Full
Solutions:
- Wait for current jobs to complete
- Check admin dashboard for queue size
- Increase `maxQueueSize` in `spooler.js` (default: 100)
- Check if jobs are stuck (processing too long)
Chrome Crashes Repeatedly
Solutions:
- Check system RAM (need minimum 2GB available)
- Reduce `maxConcurrent` in `spooler.js` (default: 5)
- Check for memory leaks in Chrome
- Restart Chrome manually and monitor
- Check system resources: Task Manager > Performance
High Disk Usage
Solutions:
- Run cleanup: `npm run cleanup`
- Check `data/archive/` for old folders
- Check `logs/` for old logs
- Check `data/pdfs/` for large files
- Consider reducing PDF retention time in `cleanup-config.json`
Deployment
Quick Start
```shell
# 1. Create directories
cd D:\data\www\gdc_cmod
mkdir -p node_spooler/logs node_spooler/data/pdfs node_spooler/data/archive node_spooler/data/error

# 2. Install dependencies
cd node_spooler
npm install

# 3. Start Chrome (if not running)
"C:/Program Files/Google/Chrome/Application/chrome.exe" --headless --disable-gpu --remote-debugging-port=42020

# 4. Start spooler
npm start

# 5. Test API
curl -X POST http://localhost:3030/api/pdf/generate \
  -H "Content-Type: application/json" \
  -d "{\"html\":\"<html><body>Test</body></html>\",\"filename\":\"test.pdf\"}"

# 6. Open admin dashboard
# http://localhost/gdc_cmod/node_spooler/admin.html
```
Production Setup
- Create batch file wrapper:

```shell
@echo off
cd /d D:\data\www\gdc_cmod\node_spooler
C:\node\node.exe spooler.js
```

- Create Windows service (note: `sc` requires a space after each `option=`):

```shell
sc create PDFSpooler binPath= "D:\data\www\gdc_cmod\node_spooler\spooler-start.bat" start= auto
sc start PDFSpooler
```

- Create scheduled tasks for cleanup:

```shell
schtasks /create /tn "PDF Cleanup Daily" /tr "C:\node\node.exe D:\data\www\gdc_cmod\node_spooler\cleanup.js" /sc daily /st 01:00
schtasks /create /tn "PDF Cleanup Weekly" /tr "C:\node\node.exe D:\data\www\gdc_cmod\node_spooler\cleanup.js weekly" /sc weekly /d MON /st 01:00
```
Version History
- 2.0.0 (2025-02-03): Migrated from file watching to HTTP API queue
- Removed file watching (chokidar)
- Added Express HTTP API
- Internal queue with max 5 concurrent
- Max 100 jobs in queue
- Job auto-cleanup after 60 minutes
- Enhanced error handling with Chrome restart
- Admin dashboard for monitoring
- Automated cleanup system
License
Internal use only.