gdc_cmod/node_spooler
mahdahar 2843ddd392 Migrate PDF generation from legacy spooler_db to CI4 + node_spooler
BREAKING CHANGE: Remove public/spooler_db/ legacy system

Changes:
- Migrate validation preview from http://glenlis/spooler_db/main_dev.php to CI4 /report/{accessnumber}
- Add ReportController::preview() for HTML preview in validation dialog
- Add ReportController::generatePdf() to queue PDF generation via node_spooler at http://glenlis:3030
- Add ReportController::checkPdfStatus() to poll spooler job status
- Add ReportController::postToSpooler() helper for curl requests to spooler API
- Add routes: GET /report/(:num)/preview, GET /report/(:num)/pdf, GET /report/status/(:any)
- Delete public/spooler_db/ directory (40+ legacy files)
- Compact node_spooler/README.md from 577 to 342 lines

Technical Details:
- New architecture: CI4 Controller -> node_spooler (port 3030) -> Chrome CDP (port 42020)
- API endpoints: POST /api/pdf/generate, GET /api/pdf/status/:jobId, GET /api/queue/stats
- Features: Max 5 concurrent jobs, max 100 in queue, auto-cleanup after 60 min
- Error handling: Chrome crash detection, manual error review in data/error/
- PDF infrastructure ready, frontend PDF buttons to be updated later in production

Migration verified:
- No external code references spooler_db
- All assets duplicated in public/assets/report/
- Syntax checks passed for ReportController.php and Routes.php

Refs: node_spooler/README.md
2026-02-03 11:33:55 +07:00

PDF Spooler v2.0

In the name of Allah, the Most Gracious, the Most Merciful.

Overview

Node.js Express service with internal queue for HTML to PDF conversion using Chrome DevTools Protocol.

Architecture

CI4 Controller
  ↓ POST {html, filename}
Node.js Spooler (port 3030)
  ↓ queue
Internal Queue (max 5 concurrent)
  ↓ process
PDF Generator (Chrome CDP port 42020)
  ↓ save
data/pdfs/{filename}

Features

  • HTTP API for PDF generation (no file watching)
  • Internal queue with max 5 concurrent processing
  • Max 100 jobs in queue
  • In-memory job tracking (auto-cleanup after 60 min)
  • Chrome crash detection & restart (max 3 attempts)
  • Comprehensive logging (info, error, metrics)
  • Automated cleanup with dry-run mode
  • Admin dashboard for monitoring
  • Manual error review required (see data/error/)
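
The queue limits above can be modeled as a small promise-based scheduler. This is an illustrative sketch, not the actual spooler.js implementation; JobQueue and its field names are hypothetical.

```javascript
// Illustrative model of the internal queue: at most MAX_CONCURRENT jobs run
// at once, and at most MAX_QUEUE jobs may wait. Not the real spooler.js code.
const MAX_CONCURRENT = 5;
const MAX_QUEUE = 100;

class JobQueue {
  constructor() {
    this.waiting = []; // jobs with status "queued"
    this.running = 0;  // jobs with status "processing"
  }

  // Returns false when the queue is full (maps to the "Queue is full" error).
  enqueue(task) {
    if (this.waiting.length >= MAX_QUEUE) return false;
    this.waiting.push(task);
    this.drain();
    return true;
  }

  // Start waiting jobs until the concurrency limit is reached.
  drain() {
    while (this.running < MAX_CONCURRENT && this.waiting.length > 0) {
      const task = this.waiting.shift();
      this.running++;
      Promise.resolve()
        .then(task)
        .catch(() => {}) // per-job errors are recorded, not thrown here
        .finally(() => {
          this.running--;
          this.drain();
        });
    }
  }
}
```

Enqueuing more than five jobs leaves the extras in `waiting`; an `enqueue` that returns false corresponds to the HTTP "Queue is full" response.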

API Endpoints

POST /api/pdf/generate

Generate PDF from HTML content.

Request:

{
  "html": "<html>...</html>",
  "filename": "1234567890.pdf"
}

Response (Success):

{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "queued",
  "message": "Job added to queue"
}

Response (Error):

{
  "success": false,
  "error": "Queue is full, please try again later"
}

GET /api/pdf/status/:jobId

Check job status.

Response (Queued/Processing):

{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "queued|processing",
  "progress": 0|50,
  "pdfUrl": null,
  "error": null
}

Response (Completed):

{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "completed",
  "progress": 100,
  "pdfUrl": "/node_spooler/data/pdfs/1234567890.pdf",
  "error": null
}

Response (Error):

{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "error",
  "progress": 0,
  "pdfUrl": null,
  "error": "Chrome timeout"
}

GET /api/queue/stats

Queue statistics (avgProcessingTime is in seconds).

{
  "success": true,
  "queueSize": 12,
  "processing": 3,
  "completed": 45,
  "errors": 2,
  "avgProcessingTime": 0.82,
  "maxQueueSize": 100
}
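
A caller can use these fields to decide whether another job will fit before hitting the "Queue is full" error. A minimal sketch; queueHasCapacity is a hypothetical helper, not part of the spooler API:

```javascript
// Hypothetical helper: true when /api/queue/stats indicates another job fits.
function queueHasCapacity(stats) {
  return Boolean(stats && stats.success) && stats.queueSize < stats.maxQueueSize;
}

// Example with the sample payload shown above:
const stats = {
  success: true, queueSize: 12, processing: 3, completed: 45,
  errors: 2, avgProcessingTime: 0.82, maxQueueSize: 100
};
// queueHasCapacity(stats) is true: 12 of 100 queue slots are used.
```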

CI4 Integration

Controller Example

<?php
namespace App\Controllers;

use CodeIgniter\API\ResponseTrait;

class ReportController extends BaseController {

    use ResponseTrait; // provides respond()

    public function generateReport($accessnumber) {
        $html = $this->generateHTML($accessnumber);
        $filename = $accessnumber . '.pdf';

        $jobId = $this->postToSpooler($html, $filename);

        return $this->respond([
            'success' => true,
            'jobId' => $jobId,
            'message' => 'PDF queued for generation',
            'status' => 'queued'
        ]);
    }

    private function postToSpooler($html, $filename) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, 'http://localhost:3030/api/pdf/generate');
        curl_setopt($ch, CURLOPT_POST, 1);
        curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode([
            'html' => $html,
            'filename' => $filename
        ]));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HTTPHEADER, [
            'Content-Type: application/json'
        ]);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);

        $response = curl_exec($ch);
        $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        // curl_exec() returns false on connection failure, so check both.
        if ($response === false || $httpCode !== 200) {
            log_message('error', "Spooler API returned HTTP $httpCode");
            throw new \Exception('Failed to queue PDF generation');
        }

        $data = json_decode($response, true);
        if (empty($data['jobId'])) {
            throw new \Exception('Spooler response missing jobId');
        }
        return $data['jobId'];
    }

    public function checkPdfStatus($jobId) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, "http://localhost:3030/api/pdf/status/$jobId");
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);

        $response = curl_exec($ch);
        curl_close($ch);

        // $response is already a JSON string; pass it through as-is
        // (setJSON() would encode it a second time).
        return $this->response
            ->setContentType('application/json')
            ->setBody($response !== false ? $response : '{"success":false,"error":"Spooler unreachable"}');
    }
}

Frontend Example (JavaScript)

async function generatePDF(accessNumber) {
    try {
        const response = await fetch('/report/generate/' + accessNumber, {
            method: 'POST'
        });
        
        const { jobId, status } = await response.json();
        
        if (status === 'queued') {
            alert('PDF queued for generation');
        }
        
        return jobId;
    } catch (error) {
        console.error('Failed to generate PDF:', error);
        alert('Failed to generate PDF');
    }
}

async function pollPdfStatus(jobId) {
    const maxAttempts = 60;
    let attempts = 0;
    
    const interval = setInterval(async () => {
        if (attempts >= maxAttempts) {
            clearInterval(interval);
            alert('PDF generation timeout');
            return;
        }
        attempts++;
        
        try {
            const response = await fetch(`/report/status/${jobId}`);
            const data = await response.json();
            
            if (data.status === 'completed') {
                clearInterval(interval);
                window.location.href = data.pdfUrl;
            } else if (data.status === 'error') {
                clearInterval(interval);
                alert('PDF generation failed: ' + data.error);
            }
        } catch (err) {
            // Tolerate transient network errors; the next tick retries.
            console.error('Status poll failed:', err);
        }
    }, 2000);
}

Error Handling

Chrome Crash Handling

  1. Chrome crash detected (CDP connection lost or timeout)
  2. Stop processing current jobs
  3. Move queue jobs back to "queued" status
  4. Attempt to restart Chrome (max 3 attempts)
  5. Resume processing
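
The five steps above can be sketched as follows; recoverFromCrash, launchChrome, and requeue are illustrative names rather than spooler.js internals:

```javascript
// Sketch of the restart policy: requeue in-flight jobs, then retry the
// Chrome launch up to MAX_RESTARTS times before giving up.
// launchChrome and requeue are illustrative stand-ins, not spooler.js code.
const MAX_RESTARTS = 3;

async function recoverFromCrash(launchChrome, requeue) {
  requeue(); // steps 2-3: move "processing" jobs back to "queued"
  let lastError = null;
  for (let attempt = 1; attempt <= MAX_RESTARTS; attempt++) {
    try {
      await launchChrome(); // step 4
      return attempt;       // step 5: Chrome is back, resume processing
    } catch (err) {
      lastError = err;      // log and retry
    }
  }
  throw new Error(`Chrome failed to restart after ${MAX_RESTARTS} attempts: ${lastError}`);
}
```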

Failed Jobs

  • Failed jobs logged to data/error/{jobId}.json
  • Never auto-deleted (manual review required)
  • Review logs/errors.log for details
  • Error JSON contains full job details including error message

Cleanup

Manual Execution

# Test cleanup (dry-run)
npm run cleanup:dry-run

# Execute cleanup
npm run cleanup

Retention Policy

Directory             Retention   Action
data/pdfs/            7 days      Move to archive
data/archive/YYYYMM/  45 days     Delete
data/error/           Manual      Never delete
logs/                 30 days     Delete (compress after 7 days)

Cleanup Tasks

  1. Archive PDFs older than 7 days to data/archive/YYYYMM/
  2. Delete archived PDFs older than 45 days
  3. Compress log files older than 7 days
  4. Delete log files older than 30 days
  5. Check disk space (alert if > 80%)

Monitoring

Admin Dashboard

Open admin.html in a browser for:

  • Real-time queue statistics
  • Processing metrics
  • Error file list
  • Disk space visualization

URL: http://localhost/gdc_cmod/node_spooler/admin.html

Key Metrics

  • Average PDF time: < 2 seconds
  • Success rate: > 95%
  • Queue size: < 100 jobs
  • Disk usage: < 80%

Log Files

  • logs/spooler.log - All API events (info, warn, error)
  • logs/errors.log - PDF generation errors only
  • logs/metrics.log - Performance stats (per job)
  • logs/cleanup.log - Cleanup execution logs

Troubleshooting

Spooler Not Starting

Solutions:

  1. Check if Chrome is running on port 42020
  2. Check logs: logs/spooler.log
  3. Verify directories exist: data/pdfs, data/archive, data/error, logs
  4. Check Node.js version: node --version (need 14+)
  5. Verify dependencies installed: npm install

Start Chrome manually:

"C:/Program Files/Google/Chrome/Application/chrome.exe" ^
  --headless ^
  --disable-gpu ^
  --remote-debugging-port=42020

PDF Not Generated

Solutions:

  1. Check job status via API: GET /api/pdf/status/{jobId}
  2. Review error logs: logs/errors.log
  3. Verify Chrome connection: Check logs for CDP connection errors
  4. Check HTML content: Ensure valid HTML

Queue Full

Solutions:

  1. Wait for current jobs to complete
  2. Check admin dashboard for queue size
  3. Increase maxQueueSize in spooler.js (default: 100)
  4. Check if jobs are stuck (processing too long)

Chrome Crashes Repeatedly

Solutions:

  1. Check system RAM (need minimum 2GB available)
  2. Reduce maxConcurrent in spooler.js (default: 5)
  3. Check for memory leaks in Chrome
  4. Restart Chrome manually and monitor
  5. Check system resources: Task Manager > Performance

High Disk Usage

Solutions:

  1. Run cleanup: npm run cleanup
  2. Check data/archive/ for old folders
  3. Check logs/ for old logs
  4. Check data/pdfs/ for large files
  5. Consider reducing PDF retention time in cleanup-config.json

Deployment

Quick Start

# 1. Create directories
cd D:\data\www\gdc_cmod
mkdir -p node_spooler/logs node_spooler/data/pdfs node_spooler/data/archive node_spooler/data/error

# 2. Install dependencies
cd node_spooler
npm install

# 3. Start Chrome (if not running)
"C:/Program Files/Google/Chrome/Application/chrome.exe" ^
  --headless ^
  --disable-gpu ^
  --remote-debugging-port=42020

# 4. Start spooler
npm start

# 5. Test API
curl -X POST http://localhost:3030/api/pdf/generate \
  -H "Content-Type: application/json" \
  -d "{\"html\":\"<html><body>Test</body></html>\",\"filename\":\"test.pdf\"}"

# 6. Open admin dashboard
# http://localhost/gdc_cmod/node_spooler/admin.html

Production Setup

  1. Create batch file wrapper:
@echo off
cd /d D:\data\www\gdc_cmod\node_spooler
C:\node\node.exe spooler.js
  2. Create Windows service:
sc create PDFSpooler binPath= "D:\data\www\gdc_cmod\node_spooler\spooler-start.bat" start= auto
sc start PDFSpooler
  3. Create scheduled task for cleanup:
schtasks /create /tn "PDF Cleanup Daily" /tr "C:\node\node.exe D:\data\www\gdc_cmod\node_spooler\cleanup.js" /sc daily /st 01:00
schtasks /create /tn "PDF Cleanup Weekly" /tr "C:\node\node.exe D:\data\www\gdc_cmod\node_spooler\cleanup.js weekly" /sc weekly /d MON /st 01:00

Version History

  • 2.0.0 (2025-02-03): Migrated from file watching to HTTP API queue
    • Removed file watching (chokidar)
    • Added Express HTTP API
    • Internal queue with max 5 concurrent
    • Max 100 jobs in queue
    • Job auto-cleanup after 60 minutes
    • Enhanced error handling with Chrome restart
    • Admin dashboard for monitoring
    • Automated cleanup system

License

Internal use only.