mahdahar 2843ddd392 Migrate PDF generation from legacy spooler_db to CI4 + node_spooler
BREAKING CHANGE: Remove public/spooler_db/ legacy system

Changes:
- Migrate validation preview from http://glenlis/spooler_db/main_dev.php to CI4 /report/{accessnumber}
- Add ReportController::preview() for HTML preview in validation dialog
- Add ReportController::generatePdf() to queue PDF generation via node_spooler at http://glenlis:3030
- Add ReportController::checkPdfStatus() to poll spooler job status
- Add ReportController::postToSpooler() helper for curl requests to spooler API
- Add routes: GET /report/(:num)/preview, GET /report/(:num)/pdf, GET /report/status/(:any)
- Delete public/spooler_db/ directory (40+ legacy files)
- Compact node_spooler/README.md from 577 to 342 lines

Technical Details:
- New architecture: CI4 Controller -> node_spooler (port 3030) -> Chrome CDP (port 42020)
- API endpoints: POST /api/pdf/generate, GET /api/pdf/status/:jobId, GET /api/queue/stats
- Features: Max 5 concurrent jobs, max 100 in queue, auto-cleanup after 60 min
- Error handling: Chrome crash detection, manual error review in data/error/
- PDF infrastructure ready, frontend PDF buttons to be updated later in production

Migration verified:
- No external code references spooler_db
- All assets duplicated in public/assets/report/
- Syntax checks passed for ReportController.php and Routes.php

Refs: node_spooler/README.md
2026-02-03 11:33:55 +07:00


# PDF Spooler v2.0
Bismillahirohmanirohim.
## Overview
A Node.js Express service with an internal queue that converts HTML to PDF through the Chrome DevTools Protocol (CDP).
## Architecture
```
CI4 Controller
↓ POST {html, filename}
Node.js Spooler (port 3030)
↓ queue
Internal Queue (max 5 concurrent)
↓ process
PDF Generator (Chrome CDP port 42020)
↓ save
data/pdfs/{filename}.pdf
```
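The flow in the diagram could be modeled by a small in-memory queue. The following is a minimal sketch, not the code in `spooler.js`: the `JobQueue` class and its method names are hypothetical, and it assumes the 100-job limit counts waiting jobs only.

```javascript
// Hypothetical in-memory queue with a concurrency limit and a size limit.
class JobQueue {
  constructor({ maxConcurrent = 5, maxQueueSize = 100 } = {}) {
    this.maxConcurrent = maxConcurrent;
    this.maxQueueSize = maxQueueSize;
    this.waiting = []; // tasks not yet started
    this.active = 0;   // tasks currently running
  }

  // Returns false when the queue is full so the caller can reject the request.
  add(task) {
    if (this.waiting.length >= this.maxQueueSize) return false;
    this.waiting.push(task);
    this.drain();
    return true;
  }

  // Start waiting tasks until the concurrency limit is reached.
  drain() {
    while (this.active < this.maxConcurrent && this.waiting.length > 0) {
      const task = this.waiting.shift();
      this.active++;
      Promise.resolve()
        .then(task)
        .catch(() => {}) // a failed task must not stall the queue
        .finally(() => {
          this.active--;
          this.drain();
        });
    }
  }
}
```

Each task is a function that returns a promise, for example one that renders a single PDF. A `false` return from `add()` corresponds to the "Queue is full" error response.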
## Features
- HTTP API for PDF generation (no file watching)
- Internal queue with max 5 concurrent processing
- Max 100 jobs in queue
- In-memory job tracking (auto-cleanup after 60 min)
- Chrome crash detection & restart (max 3 attempts)
- Comprehensive logging (info, error, metrics)
- Automated cleanup with dry-run mode
- Admin dashboard for monitoring
- Manual error review required (see `data/error/`)
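The 60-minute auto-cleanup of in-memory job records might look roughly like the sketch below. The record shape (`status`, `finishedAt`) and the function name `pruneJobs` are assumptions made for illustration; only the 60-minute window comes from the list above.

```javascript
const JOB_TTL_MS = 60 * 60 * 1000; // 60 minutes, per the feature list

// jobs: Map of jobId -> { status, finishedAt } (hypothetical record shape).
// Removes finished jobs older than the TTL and returns how many were removed.
function pruneJobs(jobs, now = Date.now()) {
  let removed = 0;
  for (const [jobId, job] of jobs) {
    const finished = job.status === 'completed' || job.status === 'error';
    if (finished && now - job.finishedAt > JOB_TTL_MS) {
      jobs.delete(jobId); // deleting during Map iteration is safe
      removed++;
    }
  }
  return removed;
}
```

A timer such as `setInterval(() => pruneJobs(jobs), 60000)` would be one way to run it periodically. Queued and processing jobs are never pruned in this sketch.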
## API Endpoints
### POST /api/pdf/generate
Generate PDF from HTML content.
**Request:**
```json
{
  "html": "<html>...</html>",
  "filename": "1234567890.pdf"
}
```
**Response (Success):**
```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "queued",
  "message": "Job added to queue"
}
```
**Response (Error):**
```json
{
  "success": false,
  "error": "Queue is full, please try again later"
}
```
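Before a job is queued, the request body has to be checked and a job ID assigned. The sketch below shows one way this could work. The validation rules and error messages are assumptions, and the ID format is only inferred from the example `job_1738603845123_abc123xyz` (epoch milliseconds plus a random suffix).

```javascript
// Hypothetical validation for the body of POST /api/pdf/generate.
function validateGenerateRequest(body) {
  if (!body || typeof body.html !== 'string' || body.html.trim() === '') {
    return { ok: false, error: 'html is required' };
  }
  // Accept plain file names only, so a request cannot write outside data/pdfs/.
  if (typeof body.filename !== 'string' || !/^[\w.-]+\.pdf$/i.test(body.filename)) {
    return { ok: false, error: 'filename must be a plain *.pdf name' };
  }
  return { ok: true };
}

// Builds an ID shaped like the documented example: job_<epoch ms>_<9 random chars>.
function createJobId(now = Date.now()) {
  const suffix = Math.random().toString(36).slice(2, 11).padEnd(9, '0');
  return `job_${now}_${suffix}`;
}
```

Rejecting path separators in `filename` keeps the output inside `data/pdfs/`, whatever the real service does on top of that.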
### GET /api/pdf/status/:jobId
Check job status.
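**Response (Queued/Processing):** `status` is `queued` (with `progress` 0) or `processing` (with `progress` 50). The example shows a job that is being processed.
```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "processing",
  "progress": 50,
  "pdfUrl": null,
  "error": null
}
```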
**Response (Completed):**
```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "completed",
  "progress": 100,
  "pdfUrl": "/node_spooler/data/pdfs/1234567890.pdf",
  "error": null
}
```
**Response (Error):**
```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "error",
  "progress": 0,
  "pdfUrl": null,
  "error": "Chrome timeout"
}
```
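The three response shapes above differ only in `status`, `progress`, `pdfUrl` and `error`, so they can be derived from a single job record. A minimal sketch, assuming a hypothetical record with `id`, `status`, `filename` and `error` fields:

```javascript
// Progress values taken from the example responses above.
const PROGRESS = { queued: 0, processing: 50, completed: 100, error: 0 };

// Maps an internal job record (hypothetical shape) to the status payload.
function toStatusResponse(job) {
  return {
    success: true,
    jobId: job.id,
    status: job.status,
    progress: PROGRESS[job.status],
    pdfUrl: job.status === 'completed' ? `/node_spooler/data/pdfs/${job.filename}` : null,
    error: job.status === 'error' ? job.error : null,
  };
}
```

Note that `success` stays `true` for a failed job: it describes the status lookup, while `status` and `error` describe the job itself.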
### GET /api/queue/stats
Queue statistics.
```json
{
  "success": true,
  "queueSize": 12,
  "processing": 3,
  "completed": 45,
  "errors": 2,
  "avgProcessingTime": 0.82,
  "maxQueueSize": 100
}
```
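These figures could be computed from the tracked job records along the following lines. This is a sketch only: the record fields `startedAt` and `finishedAt` are hypothetical, and it assumes `avgProcessingTime` is reported in seconds.

```javascript
// jobs: array of { status, startedAt, finishedAt } with timestamps in ms (hypothetical shape).
function queueStats(jobs, maxQueueSize = 100) {
  const count = (status) => jobs.filter((j) => j.status === status).length;
  const done = jobs.filter((j) => j.status === 'completed');
  const totalSeconds = done.reduce((sum, j) => sum + (j.finishedAt - j.startedAt) / 1000, 0);
  return {
    success: true,
    queueSize: count('queued'),
    processing: count('processing'),
    completed: done.length,
    errors: count('error'),
    // Average over completed jobs only, rounded to two decimals.
    avgProcessingTime: done.length ? Number((totalSeconds / done.length).toFixed(2)) : 0,
    maxQueueSize,
  };
}
```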
## CI4 Integration
### Controller Example
```php
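<?php
namespace App\Controllers;

use CodeIgniter\API\ResponseTrait;

class ReportController extends BaseController {
    // Provides $this->respond(), in case BaseController does not already include it.
    use ResponseTrait;

    // Render the report HTML and queue it for PDF generation.
    public function generateReport($accessnumber) {
        $html = $this->generateHTML($accessnumber);
        $filename = $accessnumber . '.pdf';
        $jobId = $this->postToSpooler($html, $filename);
        return $this->respond([
            'success' => true,
            'jobId' => $jobId,
            'message' => 'PDF queued for generation',
            'status' => 'queued'
        ]);
    }

    // POST the HTML to the spooler and return the job ID it assigns.
    private function postToSpooler($html, $filename) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, 'http://localhost:3030/api/pdf/generate');
        curl_setopt($ch, CURLOPT_POST, 1);
        curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode([
            'html' => $html,
            'filename' => $filename
        ]));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HTTPHEADER, [
            'Content-Type: application/json'
        ]);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        $response = curl_exec($ch);
        $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);
        // curl_exec() returns false when the spooler cannot be reached.
        if ($response === false || $httpCode !== 200) {
            log_message('error', "Spooler API returned HTTP $httpCode");
            throw new \Exception('Failed to queue PDF generation');
        }
        $data = json_decode($response, true);
        if (!isset($data['jobId'])) {
            log_message('error', 'Spooler response did not contain a jobId');
            throw new \Exception('Failed to queue PDF generation');
        }
        return $data['jobId'];
    }

    // Proxy the spooler's status response to the browser.
    public function checkPdfStatus($jobId) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, 'http://localhost:3030/api/pdf/status/' . rawurlencode($jobId));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        $response = curl_exec($ch);
        curl_close($ch);
        if ($response === false) {
            return $this->response->setStatusCode(502)->setJSON([
                'success' => false,
                'error' => 'Spooler is not reachable'
            ]);
        }
        // $response is already a JSON string.
        return $this->response->setJSON($response);
    }
}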
```
### Frontend Example (JavaScript)
```javascript
// Ask the CI4 controller to queue a PDF; resolves to the spooler job ID.
async function generatePDF(accessNumber) {
  try {
    const response = await fetch('/report/generate/' + accessNumber, {
      method: 'POST'
    });
    if (!response.ok) {
      throw new Error('HTTP ' + response.status);
    }
    const { jobId, status } = await response.json();
    if (status === 'queued') {
      alert('PDF queued for generation');
    }
    return jobId;
  } catch (error) {
    console.error('Failed to generate PDF:', error);
    alert('Failed to generate PDF');
  }
}

// Poll every 2 seconds, giving up after 60 attempts (about 2 minutes).
async function pollPdfStatus(jobId) {
  const maxAttempts = 60;
  let attempts = 0;
  const interval = setInterval(async () => {
    if (attempts >= maxAttempts) {
      clearInterval(interval);
      alert('PDF generation timeout');
      return;
    }
    attempts++;
    try {
      const response = await fetch(`/report/status/${jobId}`);
      const data = await response.json();
      if (data.status === 'completed') {
        clearInterval(interval);
        window.location.href = data.pdfUrl;
      } else if (data.status === 'error') {
        clearInterval(interval);
        alert('PDF generation failed: ' + data.error);
      }
    } catch (error) {
      // A failed poll counts as an attempt; the next tick retries.
      console.error('Status check failed:', error);
    }
  }, 2000);
}
```
## Error Handling
### Chrome Crash Handling
1. Chrome crash detected (CDP connection lost or timeout)
2. Stop processing current jobs
3. Move interrupted jobs back to "queued" status
4. Attempt to restart Chrome (max 3 attempts)
5. Resume processing
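Steps 3 and 4 above can be sketched as two small functions. This is an illustration under assumptions, not the spooler's actual recovery code: `startChrome` is a hypothetical caller-supplied async function, and the job record shape is made up.

```javascript
// Step 3: put interrupted jobs back in line (mutates the job records).
function requeueInterrupted(jobs) {
  let count = 0;
  for (const job of jobs) {
    if (job.status === 'processing') {
      job.status = 'queued';
      count++;
    }
  }
  return count;
}

// Step 4: try to bring Chrome back, giving up after maxAttempts failures.
// Resolves to true on success and false when every attempt failed.
async function restartChrome(startChrome, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await startChrome();
      return true;
    } catch (err) {
      console.error(`Chrome restart attempt ${attempt}/${maxAttempts} failed: ${err.message}`);
    }
  }
  return false;
}
```

When `restartChrome()` resolves to `false`, the queue should stay paused rather than resume, since every job would fail the same way.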
### Failed Jobs
- Failed jobs logged to `data/error/{jobId}.json`
- Never auto-deleted (manual review required)
- Review `logs/errors.log` for details
- Error JSON contains full job details including error message
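Writing the error record could be as simple as the sketch below. The field names in the record (`failedAt`, and whatever the job object carries) are assumptions. Only the `data/error/{jobId}.json` location and the presence of the error message come from the list above.

```javascript
const fs = require('fs');
const path = require('path');

// Writes {errorDir}/{jobId}.json containing the job details plus the error message.
// Returns the path of the file that was written.
function writeErrorRecord(errorDir, job, err) {
  fs.mkdirSync(errorDir, { recursive: true });
  const record = { ...job, error: err.message, failedAt: new Date().toISOString() };
  const file = path.join(errorDir, `${job.jobId}.json`);
  fs.writeFileSync(file, JSON.stringify(record, null, 2));
  return file;
}
```

A call might look like `writeErrorRecord('data/error', job, new Error('Chrome timeout'))`.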
## Cleanup
### Manual Execution
```bash
# Test cleanup (dry-run)
npm run cleanup:dry-run
# Execute cleanup
npm run cleanup
```
### Retention Policy
| Directory | Retention | Action |
|-----------|-----------|---------|
| `data/pdfs/` | 7 days | Move to archive |
| `data/archive/YYYYMM/` | 45 days | Delete |
| `data/error/` | Manual | Never delete |
| `logs/` | 30 days | Delete (compress after 7 days) |
### Cleanup Tasks
1. Archive PDFs older than 7 days to `data/archive/YYYYMM/`
2. Delete archived PDFs older than 45 days
3. Compress log files older than 7 days
4. Delete log files older than 30 days
5. Check disk space (alert if > 80%)
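The retention rules reduce to a pure function that decides what to do with one file, which also makes a dry run straightforward: compute the actions and print them without touching the disk. A minimal sketch, assuming file age is measured from the modification time; the `RULES` object and `cleanupAction` name are hypothetical and not taken from `cleanup.js`.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Retention rules from the table above; for each kind the strictest rule comes first.
const RULES = {
  pdfs: [{ olderThanDays: 7, action: 'archive' }],
  archive: [{ olderThanDays: 45, action: 'delete' }],
  logs: [{ olderThanDays: 30, action: 'delete' }, { olderThanDays: 7, action: 'compress' }],
  error: [], // never touched automatically
};

// Returns 'archive', 'compress', 'delete' or 'keep' for a single file.
function cleanupAction(kind, modifiedAtMs, now = Date.now()) {
  const ageDays = (now - modifiedAtMs) / DAY_MS;
  const rule = (RULES[kind] || []).find((r) => ageDays > r.olderThanDays);
  return rule ? rule.action : 'keep';
}
```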
## Monitoring
### Admin Dashboard
Open `admin.html` in a browser for:
- Real-time queue statistics
- Processing metrics
- Error file list
- Disk space visualization
**URL:** `http://localhost/gdc_cmod/node_spooler/admin.html`
### Key Metrics
- Average PDF generation time: < 2 seconds
- Success rate: > 95%
- Queue size: < 100 jobs
- Disk usage: < 80%
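These thresholds can be checked against the output of `GET /api/queue/stats`. The sketch below is only an illustration: `diskUsedPercent` is a hypothetical input (the stats endpoint does not report it), and the success rate is assumed to be `completed / (completed + errors)`.

```javascript
// Returns a list of warnings; an empty list means all targets are met.
function healthWarnings({ avgProcessingTime, completed, errors, queueSize, diskUsedPercent }) {
  const warnings = [];
  const finished = completed + errors;
  const successRate = finished > 0 ? completed / finished : 1;
  if (avgProcessingTime >= 2) warnings.push('avg PDF time >= 2 s');
  if (successRate <= 0.95) warnings.push('success rate <= 95%');
  if (queueSize >= 100) warnings.push('queue size >= 100');
  if (diskUsedPercent >= 80) warnings.push('disk usage >= 80%');
  return warnings;
}
```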
### Log Files
- `logs/spooler.log` - All API events (info, warn, error)
- `logs/errors.log` - PDF generation errors only
- `logs/metrics.log` - Performance stats (per job)
- `logs/cleanup.log` - Cleanup execution logs
## Troubleshooting
### Spooler Not Starting
**Solutions:**
1. Check if Chrome is running on port 42020
2. Check logs: `logs/spooler.log`
3. Verify directories exist: `data/pdfs`, `data/archive`, `data/error`, `logs`
4. Check Node.js version: `node --version` (version 14 or later is required)
5. Verify dependencies installed: `npm install`
**Start Chrome manually:**
```bash
"C:/Program Files/Google/Chrome/Application/chrome.exe" \
  --headless \
  --disable-gpu \
  --remote-debugging-port=42020
```
### PDF Not Generated
**Solutions:**
1. Check job status via API: `GET /api/pdf/status/{jobId}`
2. Review error logs: `logs/errors.log`
3. Verify Chrome connection: Check logs for CDP connection errors
4. Check HTML content: Ensure valid HTML
### Queue Full
**Solutions:**
1. Wait for current jobs to complete
2. Check admin dashboard for queue size
3. Increase `maxQueueSize` in `spooler.js` (default: 100)
4. Check if jobs are stuck (processing too long)
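For step 4, a stuck job is one that has been in `processing` for much longer than the usual generation time. A small sketch follows. The 60-second threshold and the `startedAt` field are assumptions, so pick a limit that suits the reports being rendered.

```javascript
// Returns jobs that have been processing for longer than maxProcessingMs.
function findStuckJobs(jobs, maxProcessingMs = 60000, now = Date.now()) {
  return jobs.filter((job) => job.status === 'processing' && now - job.startedAt > maxProcessingMs);
}
```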
### Chrome Crashes Repeatedly
**Solutions:**
1. Check system RAM (at least 2 GB must be free)
2. Reduce `maxConcurrent` in `spooler.js` (default: 5)
3. Check for memory leaks in Chrome
4. Restart Chrome manually and monitor
5. Check system resources: Task Manager > Performance
### High Disk Usage
**Solutions:**
1. Run cleanup: `npm run cleanup`
2. Check `data/archive/` for old folders
3. Check `logs/` for old logs
4. Check `data/pdfs/` for large files
5. Consider reducing PDF retention time in `cleanup-config.json`
## Deployment
### Quick Start
```bash
# 1. Create directories
cd "D:/data/www/gdc_cmod"
mkdir -p node_spooler/logs node_spooler/data/pdfs node_spooler/data/archive node_spooler/data/error
# 2. Install dependencies
cd node_spooler
npm install
# 3. Start Chrome (if not running)
"C:/Program Files/Google/Chrome/Application/chrome.exe" \
  --headless \
  --disable-gpu \
  --remote-debugging-port=42020
# 4. Start spooler
npm start
# 5. Test API
curl -X POST http://localhost:3030/api/pdf/generate \
-H "Content-Type: application/json" \
-d "{\"html\":\"<html><body>Test</body></html>\",\"filename\":\"test.pdf\"}"
# 6. Open admin dashboard
# http://localhost/gdc_cmod/node_spooler/admin.html
```
### Production Setup
1. Create batch file wrapper (`spooler-start.bat`):
```batch
@echo off
cd /d D:\data\www\gdc_cmod\node_spooler
C:\node\node.exe spooler.js
```
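2. Create Windows service (`sc` expects a space after each `=`; a batch file cannot answer Service Control Manager requests by itself, so a service wrapper may be needed if the service fails to start):
```batch
sc create PDFSpooler binPath= "D:\data\www\gdc_cmod\node_spooler\spooler-start.bat" start= auto
sc start PDFSpooler
```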
3. Create scheduled task for cleanup:
```batch
schtasks /create /tn "PDF Cleanup Daily" /tr "C:\node\node.exe D:\data\www\gdc_cmod\node_spooler\cleanup.js" /sc daily /st 01:00
schtasks /create /tn "PDF Cleanup Weekly" /tr "C:\node\node.exe D:\data\www\gdc_cmod\node_spooler\cleanup.js weekly" /sc weekly /d MON /st 01:00
```
## Version History
- **2.0.0 (2025-02-03):** Migrated from file watching to HTTP API queue
  - Removed file watching (chokidar)
  - Added Express HTTP API
  - Internal queue with max 5 concurrent
  - Max 100 jobs in queue
  - Job auto-cleanup after 60 minutes
  - Enhanced error handling with Chrome restart
  - Admin dashboard for monitoring
  - Automated cleanup system
## License
Internal use only.