# PDF Spooler v2.0

Bismillahirohmanirohim.

## Overview

A Node.js Express service that converts HTML to PDF through the Chrome DevTools Protocol (CDP), using an internal job queue.

## Architecture

```
CI4 Controller
  ↓ POST {html, filename}
Node.js Spooler (port 3030)
  ↓ queue
Internal Queue (max 5 concurrent)
  ↓ process
PDF Generator (Chrome CDP port 42020)
  ↓ save
data/pdfs/{filename}.pdf
```
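
The queue stage of the diagram can be sketched as follows. This is a minimal illustration, not the code in `spooler.js`: the class name `JobQueue`, its methods, and the choice to count only waiting jobs against the 100-job limit are assumptions.

```javascript
// Hypothetical sketch of a bounded queue with limited concurrency.
class JobQueue {
    constructor({ maxConcurrent = 5, maxQueueSize = 100 } = {}) {
        this.maxConcurrent = maxConcurrent;
        this.maxQueueSize = maxQueueSize;
        this.waiting = [];   // tasks not yet started
        this.running = 0;    // tasks currently in progress
    }

    // task: a function returning a Promise. Returns false when the queue is full.
    add(task) {
        if (this.waiting.length >= this.maxQueueSize) {
            return false;
        }
        this.waiting.push(task);
        this._drain();
        return true;
    }

    // Start waiting tasks until the concurrency limit is reached.
    _drain() {
        while (this.running < this.maxConcurrent && this.waiting.length > 0) {
            const task = this.waiting.shift();
            this.running++;
            Promise.resolve()
                .then(task)
                .catch(() => {})   // in this sketch, tasks report their own failures
                .finally(() => {
                    this.running--;
                    this._drain();
                });
        }
    }
}
```

When `add()` returns `false`, the API layer would answer with the "Queue is full" error shown in the endpoint section.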

## Features

- HTTP API for PDF generation (no file watching)
- Internal queue with max 5 concurrent processing
- Max 100 jobs in queue
- In-memory job tracking (auto-cleanup after 60 min)
- Chrome crash detection & restart (max 3 attempts)
- Comprehensive logging (info, error, metrics)
- Automated cleanup with dry-run mode
- Admin dashboard for monitoring
- Manual error review required (see `data/error/`)
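
In-memory job tracking with a 60-minute expiry could look roughly like this. `JobStore` and its methods are hypothetical names, and the sketch assumes that a job's age is measured from its creation time.

```javascript
// Hypothetical in-memory job store; entries expire 60 minutes after creation.
const JOB_TTL_MS = 60 * 60 * 1000;

class JobStore {
    constructor(ttlMs = JOB_TTL_MS) {
        this.ttlMs = ttlMs;
        this.jobs = new Map(); // jobId -> { status, createdAt, ... }
    }

    set(jobId, job, now = Date.now()) {
        this.jobs.set(jobId, { ...job, createdAt: now });
    }

    get(jobId) {
        return this.jobs.get(jobId) || null;
    }

    // Remove expired jobs and return how many were dropped.
    // A real service would call this periodically, e.g. from setInterval.
    sweep(now = Date.now()) {
        let removed = 0;
        for (const [jobId, job] of this.jobs) {
            if (now - job.createdAt >= this.ttlMs) {
                this.jobs.delete(jobId);
                removed++;
            }
        }
        return removed;
    }
}
```

Because the store lives in memory, a status lookup for an expired or unknown job returns `null`, and all tracking is lost when the process restarts.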

## API Endpoints

### POST /api/pdf/generate

Generate PDF from HTML content.

**Request:**
```json
{
  "html": "<html>...</html>",
  "filename": "1234567890.pdf"
}
```

**Response (Success):**
```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "queued",
  "message": "Job added to queue"
}
```

**Response (Error):**
```json
{
  "success": false,
  "error": "Queue is full, please try again later"
}
```
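
The `jobId` in the success response appears to combine a millisecond timestamp with a random suffix. One way to produce IDs of that shape is sketched below. The function name `createJobId` and the 9-character lowercase alphanumeric suffix are assumptions, not taken from `spooler.js`.

```javascript
const crypto = require('crypto');

// Hypothetical helper producing IDs shaped like "job_1738603845123_abc123xyz".
function createJobId(now = Date.now()) {
    const alphabet = '0123456789abcdefghijklmnopqrstuvwxyz';
    let suffix = '';
    for (let i = 0; i < 9; i++) {
        // crypto.randomInt(max) returns an integer in [0, max)
        suffix += alphabet[crypto.randomInt(alphabet.length)];
    }
    return `job_${now}_${suffix}`;
}
```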

### GET /api/pdf/status/:jobId

Check job status.

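**Response (Queued/Processing):**

`status` is either `queued` (with `progress` 0) or `processing` (with `progress` 50). Example for a job that is being processed:
```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "processing",
  "progress": 50,
  "pdfUrl": null,
  "error": null
}
```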

**Response (Completed):**
```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "completed",
  "progress": 100,
  "pdfUrl": "/node_spooler/data/pdfs/1234567890.pdf",
  "error": null
}
```

**Response (Error):**
```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "error",
  "progress": 0,
  "pdfUrl": null,
  "error": "Chrome timeout"
}
```
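
The status responses above differ only in `status`, `progress`, `pdfUrl`, and `error`. A sketch of how a handler might derive them from a stored job follows. `buildStatusResponse` and the job object layout are assumptions for illustration; the progress values and the URL prefix are taken from the examples above.

```javascript
const PDF_URL_PREFIX = '/node_spooler/data/pdfs/'; // from the "completed" example

const PROGRESS = { queued: 0, processing: 50, completed: 100, error: 0 };

// job: { jobId, status, filename, error } (hypothetical layout)
function buildStatusResponse(job) {
    return {
        success: true, // the lookup succeeded, even if the job itself failed
        jobId: job.jobId,
        status: job.status,
        progress: PROGRESS[job.status],
        pdfUrl: job.status === 'completed' ? PDF_URL_PREFIX + job.filename : null,
        error: job.status === 'error' ? job.error : null
    };
}
```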

### GET /api/queue/stats

Queue statistics.

```json
{
  "success": true,
  "queueSize": 12,
  "processing": 3,
  "completed": 45,
  "errors": 2,
  "avgProcessingTime": 0.82,
  "maxQueueSize": 100
}
```
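
As a rough illustration, these counters could be derived from the in-memory job list as shown below. The function name and the `startedAt`/`finishedAt` fields are hypothetical, and the sketch assumes that `avgProcessingTime` is reported in seconds, which the README does not state explicitly.

```javascript
// jobs: array of { status, startedAt?, finishedAt? } with timestamps in milliseconds
function computeQueueStats(jobs, maxQueueSize = 100) {
    const count = (status) => jobs.filter((j) => j.status === status).length;
    const done = jobs.filter((j) => j.status === 'completed');
    const totalMs = done.reduce((sum, j) => sum + (j.finishedAt - j.startedAt), 0);
    return {
        success: true,
        queueSize: count('queued'),
        processing: count('processing'),
        completed: done.length,
        errors: count('error'),
        // average over completed jobs only, rounded to two decimals
        avgProcessingTime: done.length
            ? Number((totalMs / done.length / 1000).toFixed(2))
            : 0,
        maxQueueSize
    };
}
```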

## CI4 Integration

### Controller Example

```php
<?php
namespace App\Controllers;

use CodeIgniter\API\ResponseTrait;

class ReportController extends BaseController {
    // respond() is provided by ResponseTrait
    use ResponseTrait;

    public function generateReport($accessnumber) {
        $html = $this->generateHTML($accessnumber);
        $filename = $accessnumber . '.pdf';

        $jobId = $this->postToSpooler($html, $filename);

        return $this->respond([
            'success' => true,
            'jobId' => $jobId,
            'message' => 'PDF queued for generation',
            'status' => 'queued'
        ]);
    }

    private function postToSpooler($html, $filename) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, 'http://localhost:3030/api/pdf/generate');
        curl_setopt($ch, CURLOPT_POST, 1);
        curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode([
            'html' => $html,
            'filename' => $filename
        ]));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HTTPHEADER, [
            'Content-Type: application/json'
        ]);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);

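        $response = curl_exec($ch);
        $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $curlError = curl_error($ch);
        curl_close($ch);

        // curl_exec() returns false on connection failure or timeout
        if ($response === false || $httpCode !== 200) {
            log_message('error', "Spooler API failed (HTTP $httpCode): $curlError");
            throw new \Exception('Failed to queue PDF generation');
        }

        // Guard against a malformed body or a missing jobId
        $data = json_decode($response, true);
        if (!is_array($data) || empty($data['jobId'])) {
            log_message('error', 'Spooler API returned an unexpected response');
            throw new \Exception('Failed to queue PDF generation');
        }

        return $data['jobId'];
    }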

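    public function checkPdfStatus($jobId) {
        $ch = curl_init();
        // Encode the job ID so it cannot alter the request path
        curl_setopt($ch, CURLOPT_URL, 'http://localhost:3030/api/pdf/status/' . rawurlencode($jobId));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);

        $response = curl_exec($ch);
        curl_close($ch);

        // curl_exec() returns false when the spooler is unreachable
        if ($response === false) {
            return $this->response->setStatusCode(502)->setJSON([
                'success' => false,
                'error' => 'Spooler unavailable'
            ]);
        }

        // Pass the spooler's JSON body through unchanged
        return $this->response->setJSON($response);
    }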
}
```

### Frontend Example (JavaScript)

```javascript
async function generatePDF(accessNumber) {
    try {
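        const response = await fetch('/report/generate/' + accessNumber, {
            method: 'POST'
        });

        // fetch() only rejects on network errors, so check the HTTP status too
        if (!response.ok) {
            throw new Error('HTTP ' + response.status);
        }

        const { jobId, status } = await response.json();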

        if (status === 'queued') {
            alert('PDF queued for generation');
        }

        return jobId;
    } catch (error) {
        console.error('Failed to generate PDF:', error);
        alert('Failed to generate PDF');
    }
}

async function pollPdfStatus(jobId) {
    const maxAttempts = 60;
    let attempts = 0;

    const interval = setInterval(async () => {
        if (attempts >= maxAttempts) {
            clearInterval(interval);
            alert('PDF generation timeout');
            return;
        }
        attempts++;

        try {
            const response = await fetch(`/report/status/${jobId}`);
            const data = await response.json();

            if (data.status === 'completed') {
                clearInterval(interval);
                window.location.href = data.pdfUrl;
            } else if (data.status === 'error') {
                clearInterval(interval);
                alert('PDF generation failed: ' + data.error);
            }
        } catch (error) {
            // Network or parse failure: keep polling until maxAttempts is reached
            console.error('Status check failed:', error);
        }
    }, 2000);
}
```

## Error Handling

### Chrome Crash Handling

1. The spooler detects a Chrome crash (CDP connection lost or timeout).
2. It stops processing the current jobs.
3. It moves the interrupted jobs back to "queued" status.
4. It attempts to restart Chrome (max 3 attempts).
5. It resumes processing once Chrome is available again.
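
The recovery sequence above might be sketched as follows. All names (`recoverFromCrash`, `restartChrome`, the job layout) are hypothetical, and the Chrome restart is injected as a function so the sketch stays independent of how Chrome is actually launched.

```javascript
const MAX_RESTART_ATTEMPTS = 3;

// jobs: array of { status }
// restartChrome: async function that resolves to true when Chrome is back
async function recoverFromCrash(jobs, restartChrome) {
    // Steps 2-3: put interrupted jobs back in the queue.
    for (const job of jobs) {
        if (job.status === 'processing') {
            job.status = 'queued';
        }
    }
    // Step 4: try to restart Chrome, giving up after the third failure.
    for (let attempt = 1; attempt <= MAX_RESTART_ATTEMPTS; attempt++) {
        try {
            if (await restartChrome(attempt)) {
                // Step 5: the caller can resume processing.
                return { recovered: true, attempts: attempt };
            }
        } catch (err) {
            // A thrown error counts as a failed attempt.
        }
    }
    return { recovered: false, attempts: MAX_RESTART_ATTEMPTS };
}
```

What the spooler does when all three attempts fail is not described here, so the sketch only reports the outcome to the caller.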

### Failed Jobs

- Failed jobs logged to `data/error/{jobId}.json`
- Never auto-deleted (manual review required)
- Review `logs/errors.log` for details
- Error JSON contains full job details including error message
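
Writing a failed job to `data/error/{jobId}.json` could look like the sketch below. The function name, the record fields, and the `failedAt` timestamp are assumptions; the README only states that the file contains the full job details and the error message.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: persist a failed job for manual review.
function writeErrorRecord(errorDir, job, error) {
    fs.mkdirSync(errorDir, { recursive: true });
    const record = {
        ...job,
        status: 'error',
        error: error.message,
        failedAt: new Date().toISOString()
    };
    const file = path.join(errorDir, `${job.jobId}.json`);
    fs.writeFileSync(file, JSON.stringify(record, null, 2));
    return file;
}
```

A call such as `writeErrorRecord('data/error', job, new Error('Chrome timeout'))` would then leave one JSON file per failed job for later inspection.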

## Cleanup

### Manual Execution

```bash
# Test cleanup (dry-run)
npm run cleanup:dry-run

# Execute cleanup
npm run cleanup
```

### Retention Policy

| Directory | Retention | Action |
|-----------|-----------|---------|
| `data/pdfs/` | 7 days | Move to archive |
| `data/archive/YYYYMM/` | 45 days | Delete |
| `data/error/` | Manual | Never delete |
| `logs/` | 30 days | Delete (compress after 7 days) |

### Cleanup Tasks

1. Archive PDFs older than 7 days to `data/archive/YYYYMM/`
2. Delete archived PDFs older than 45 days
3. Compress log files older than 7 days
4. Delete log files older than 30 days
5. Check disk usage (alert if > 80%)
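
The age-based decisions in tasks 1 to 4 can be expressed as a small pure function, which is also convenient for a dry-run mode because it only reports what would happen. This is a sketch, not the contents of `cleanup.js`: the file object layout, the function names, and the choice to name the archive folder after the file's modification month (in UTC) are assumptions.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Retention thresholds in days, from the retention table.
const RETENTION = { pdfs: 7, archive: 45, logCompress: 7, logDelete: 30 };

// Archive folder name (YYYYMM) for a file's modification time, in UTC.
function archiveFolder(mtimeMs) {
    const d = new Date(mtimeMs);
    return `${d.getUTCFullYear()}${String(d.getUTCMonth() + 1).padStart(2, '0')}`;
}

// file: { dir: 'pdfs' | 'archive' | 'error' | 'logs', mtimeMs } (hypothetical layout)
function cleanupAction(file, now = Date.now()) {
    const ageDays = (now - file.mtimeMs) / DAY_MS;
    switch (file.dir) {
        case 'pdfs':
            return ageDays > RETENTION.pdfs
                ? `archive:${archiveFolder(file.mtimeMs)}`
                : 'keep';
        case 'archive':
            return ageDays > RETENTION.archive ? 'delete' : 'keep';
        case 'logs':
            if (ageDays > RETENTION.logDelete) return 'delete';
            return ageDays > RETENTION.logCompress ? 'compress' : 'keep';
        default:
            return 'keep'; // data/error/ is never cleaned automatically
    }
}
```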

## Monitoring

### Admin Dashboard

Open `admin.html` in a browser for:
- Real-time queue statistics
- Processing metrics
- Error file list
- Disk space visualization

**URL:** `http://localhost/gdc_cmod/node_spooler/admin.html`

### Key Metrics

Targets for a healthy spooler:

- Average PDF time: < 2 seconds
- Success rate: > 95%
- Queue size: < 100 jobs
- Disk usage: < 80%
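
A monitoring script could compare current values against these thresholds. The sketch below is only illustrative: the function name, the input field names, and the representation of the success rate as a fraction are assumptions.

```javascript
// Returns a list of warnings; an empty list means all metrics are within range.
function checkHealth({ avgPdfSeconds, successRate, queueSize, diskUsagePercent }) {
    const warnings = [];
    if (avgPdfSeconds >= 2) warnings.push('average PDF time is 2 s or more');
    if (successRate <= 0.95) warnings.push('success rate is 95% or lower');
    if (queueSize >= 100) warnings.push('queue has reached 100 jobs');
    if (diskUsagePercent >= 80) warnings.push('disk usage is 80% or higher');
    return warnings;
}
```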

### Log Files

- `logs/spooler.log` - All API events (info, warn, error)
- `logs/errors.log` - PDF generation errors only
- `logs/metrics.log` - Performance stats (per job)
- `logs/cleanup.log` - Cleanup execution logs

## Troubleshooting

### Spooler Not Starting

**Solutions:**
1. Check that Chrome is running with remote debugging enabled on port 42020
2. Check logs: `logs/spooler.log`
3. Verify directories exist: `data/pdfs`, `data/archive`, `data/error`, `logs`
4. Check Node.js version: `node --version` (version 14 or later is required)
5. Verify that dependencies are installed: `npm install`

**Start Chrome manually:**
```bash
"C:/Program Files/Google/Chrome/Application/chrome.exe" \
  --headless \
  --disable-gpu \
  --remote-debugging-port=42020
```

### PDF Not Generated

**Solutions:**
1. Check job status via API: `GET /api/pdf/status/{jobId}`
2. Review error logs: `logs/errors.log`
3. Verify Chrome connection: Check logs for CDP connection errors
4. Check HTML content: Ensure valid HTML

### Queue Full

**Solutions:**
1. Wait for current jobs to complete
2. Check admin dashboard for queue size
3. Increase `maxQueueSize` in `spooler.js` (default: 100)
4. Check if jobs are stuck (processing too long)
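
To spot stuck jobs programmatically, one option is to flag anything that has been in `processing` for longer than a chosen limit. The two-minute limit, the function name, and the `startedAt` field below are assumptions for illustration, not values from `spooler.js`.

```javascript
// Hypothetical threshold: treat a job as stuck after 2 minutes in "processing".
const STUCK_AFTER_MS = 2 * 60 * 1000;

// jobs: array of { jobId, status, startedAt } with startedAt in milliseconds
function findStuckJobs(jobs, now = Date.now(), limitMs = STUCK_AFTER_MS) {
    return jobs.filter(
        (job) => job.status === 'processing' && now - job.startedAt > limitMs
    );
}
```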

### Chrome Crashes Repeatedly

**Solutions:**
1. Check system RAM (at least 2 GB must be available)
2. Reduce `maxConcurrent` in `spooler.js` (default: 5)
3. Check for memory leaks in Chrome
4. Restart Chrome manually and monitor
5. Check system resources: Task Manager > Performance

### High Disk Usage

**Solutions:**
1. Run cleanup: `npm run cleanup`
2. Check `data/archive/` for old folders
3. Check `logs/` for old logs
4. Check `data/pdfs/` for large files
5. Consider reducing PDF retention time in `cleanup-config.json`

## Deployment

### Quick Start

```bash
# 1. Create directories
cd "D:/data/www/gdc_cmod"
mkdir -p node_spooler/logs node_spooler/data/pdfs node_spooler/data/archive node_spooler/data/error

# 2. Install dependencies
cd node_spooler
npm install

# 3. Start Chrome (if not running)
"C:/Program Files/Google/Chrome/Application/chrome.exe" \
  --headless \
  --disable-gpu \
  --remote-debugging-port=42020

# 4. Start spooler
npm start

# 5. Test API
curl -X POST http://localhost:3030/api/pdf/generate \
  -H "Content-Type: application/json" \
  -d "{\"html\":\"<html><body>Test</body></html>\",\"filename\":\"test.pdf\"}"

# 6. Open admin dashboard
# http://localhost/gdc_cmod/node_spooler/admin.html
```

### Production Setup

1. Create a batch file wrapper (`spooler-start.bat`):
```batch
@echo off
cd /d D:\data\www\gdc_cmod\node_spooler
C:\node\node.exe spooler.js
```

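2. Create a Windows service. Note that `sc` expects a space after `binPath=` and `start=`. A plain batch file does not respond to the Service Control Manager, so if the service times out on start, run it through a service wrapper instead:
```batch
sc create PDFSpooler binPath= "D:\data\www\gdc_cmod\node_spooler\spooler-start.bat" start= auto
sc start PDFSpooler
```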

3. Create scheduled tasks for cleanup:
```batch
schtasks /create /tn "PDF Cleanup Daily" /tr "C:\node\node.exe D:\data\www\gdc_cmod\node_spooler\cleanup.js" /sc daily /st 01:00
schtasks /create /tn "PDF Cleanup Weekly" /tr "C:\node\node.exe D:\data\www\gdc_cmod\node_spooler\cleanup.js weekly" /sc weekly /d MON /st 01:00
```

## Version History

- **2.0.0 (2025-02-03):** Migrated from file watching to HTTP API queue
  - Removed file watching (chokidar)
  - Added Express HTTP API
  - Internal queue with max 5 concurrent
  - Max 100 jobs in queue
  - Job auto-cleanup after 60 minutes
  - Enhanced error handling with Chrome restart
  - Admin dashboard for monitoring
  - Automated cleanup system

## License

Internal use only.