# PDF Spooler v2.0

Bismillahirohmanirohim.

## Overview

Node.js Express service with an internal queue for HTML-to-PDF conversion using the Chrome DevTools Protocol (CDP).

## Architecture

```
CI4 Controller
    ↓ POST {html, filename}
Node.js Spooler (port 3030)
    ↓ queue
Internal Queue (max 5 concurrent)
    ↓ process
PDF Generator (Chrome CDP port 42020)
    ↓ save
data/pdfs/{filename}
```

## Features

- HTTP API for PDF generation (no file watching)
- Internal queue with a maximum of 5 concurrent jobs
- Maximum of 100 jobs in the queue
- In-memory job tracking (auto-cleanup after 60 min)
- Chrome crash detection and restart (max 3 attempts)
- Comprehensive logging (info, error, metrics)
- Automated cleanup with dry-run mode
- Admin dashboard for monitoring
- Manual error review required (see `data/error/`)

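The in-memory tracking and 60-minute auto-cleanup above can be sketched as follows (illustrative only; names like `trackJob` and `cleanupJobs` are assumptions, not the spooler's actual API):

```javascript
// Hypothetical sketch of in-memory job tracking with 60-minute auto-cleanup.
const JOB_TTL_MS = 60 * 60 * 1000; // auto-cleanup threshold: 60 minutes

const jobs = new Map();

function trackJob(jobId, now = Date.now()) {
  jobs.set(jobId, { status: 'queued', progress: 0, createdAt: now });
}

function cleanupJobs(now = Date.now()) {
  // Sweep out any job older than the TTL.
  for (const [id, job] of jobs) {
    if (now - job.createdAt > JOB_TTL_MS) jobs.delete(id);
  }
  return jobs.size; // jobs still tracked after the sweep
}
```
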
## API Endpoints

### POST /api/pdf/generate

Generate a PDF from HTML content.

**Request:**
```json
{
  "html": "<html>...</html>",
  "filename": "1234567890.pdf"
}
```

**Response (Success):**
```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "queued",
  "message": "Job added to queue"
}
```

**Response (Error):**
```json
{
  "success": false,
  "error": "Queue is full, please try again later"
}
```

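The `jobId` values above follow a `job_<epoch-ms>_<random>` shape. A sketch of how such an ID could be generated (illustrative only; the spooler's actual generator may differ):

```javascript
// Builds an ID like "job_1738603845123_abc123xyz": "job_" + epoch
// milliseconds + "_" + up to 9 random base-36 characters.
function makeJobId() {
  const rand = Math.random().toString(36).slice(2, 11);
  return `job_${Date.now()}_${rand}`;
}
```
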
### GET /api/pdf/status/:jobId

Check job status.

**Response (Queued/Processing):**
```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "queued|processing",
  "progress": 0|50,
  "pdfUrl": null,
  "error": null
}
```

**Response (Completed):**
```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "completed",
  "progress": 100,
  "pdfUrl": "/node_spooler/data/pdfs/1234567890.pdf",
  "error": null
}
```

**Response (Error):**

Note that `success` is `true` here because the status lookup itself succeeded; the job's failure is reported via `status` and `error`.

```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "error",
  "progress": 0,
  "pdfUrl": null,
  "error": "Chrome timeout"
}
```

### GET /api/queue/stats

Queue statistics.

```json
{
  "success": true,
  "queueSize": 12,
  "processing": 3,
  "completed": 45,
  "errors": 2,
  "avgProcessingTime": 0.82,
  "maxQueueSize": 100
}
```

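These fields can be derived from an in-memory job list. A hypothetical aggregation (the internal job shape `{ status, seconds }` is an assumption for this sketch, not the spooler's real data model):

```javascript
// Aggregates queue statistics matching the /api/queue/stats response fields.
function queueStats(jobList, maxQueueSize = 100) {
  const count = s => jobList.filter(j => j.status === s).length;
  const done = jobList.filter(j => j.status === 'completed');
  // Mean processing time (seconds) over completed jobs, rounded to 2 places.
  const avg = done.length
    ? done.reduce((sum, j) => sum + j.seconds, 0) / done.length
    : 0;
  return {
    success: true,
    queueSize: count('queued'),
    processing: count('processing'),
    completed: done.length,
    errors: count('error'),
    avgProcessingTime: Number(avg.toFixed(2)),
    maxQueueSize,
  };
}
```
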
## CI4 Integration

### Controller Example

```php
<?php

namespace App\Controllers;

use CodeIgniter\API\ResponseTrait;

class ReportController extends BaseController {

    use ResponseTrait; // provides respond()

    public function generateReport($accessnumber) {
        $html = $this->generateHTML($accessnumber);
        $filename = $accessnumber . '.pdf';

        $jobId = $this->postToSpooler($html, $filename);

        return $this->respond([
            'success' => true,
            'jobId' => $jobId,
            'message' => 'PDF queued for generation',
            'status' => 'queued'
        ]);
    }

    private function postToSpooler($html, $filename) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, 'http://localhost:3030/api/pdf/generate');
        curl_setopt($ch, CURLOPT_POST, 1);
        curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode([
            'html' => $html,
            'filename' => $filename
        ]));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HTTPHEADER, [
            'Content-Type: application/json'
        ]);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);

        $response = curl_exec($ch);
        $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        if ($httpCode !== 200) {
            log_message('error', "Spooler API returned HTTP $httpCode");
            throw new \Exception('Failed to queue PDF generation');
        }

        $data = json_decode($response, true);

        if (empty($data['jobId'])) {
            throw new \Exception('Spooler response missing jobId');
        }

        return $data['jobId'];
    }

    public function checkPdfStatus($jobId) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, "http://localhost:3030/api/pdf/status/$jobId");
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);

        $response = curl_exec($ch);
        curl_close($ch);

        // $response is already a JSON string, so pass it through as-is
        // (setJSON() would double-encode it).
        return $this->response
            ->setContentType('application/json')
            ->setBody($response);
    }
}
```

### Frontend Example (JavaScript)

```javascript
async function generatePDF(accessNumber) {
  try {
    const response = await fetch('/report/generate/' + accessNumber, {
      method: 'POST'
    });

    const { jobId, status } = await response.json();

    if (status === 'queued') {
      alert('PDF queued for generation');
    }

    return jobId;
  } catch (error) {
    console.error('Failed to generate PDF:', error);
    alert('Failed to generate PDF');
  }
}

async function pollPdfStatus(jobId) {
  const maxAttempts = 60;
  let attempts = 0;

  const interval = setInterval(async () => {
    if (attempts >= maxAttempts) {
      clearInterval(interval);
      alert('PDF generation timeout');
      return;
    }
    attempts++; // count the attempt before awaiting the fetch

    const response = await fetch(`/report/status/${jobId}`);
    const data = await response.json();

    if (data.status === 'completed') {
      clearInterval(interval);
      window.location.href = data.pdfUrl;
    } else if (data.status === 'error') {
      clearInterval(interval);
      alert('PDF generation failed: ' + data.error);
    }
  }, 2000);
}
```

## Error Handling

### Chrome Crash Handling

1. Chrome crash detected (CDP connection lost or timeout)
2. Stop processing current jobs
3. Move in-flight jobs back to "queued" status
4. Attempt to restart Chrome (max 3 attempts)
5. Resume processing

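The recovery steps above can be sketched as follows (illustrative; function and field names are assumptions, not the real `spooler.js` code):

```javascript
const MAX_RESTART_ATTEMPTS = 3;

async function recoverFromCrash(queue, startChrome) {
  // Step 3: move in-flight jobs back to "queued" so they are retried.
  for (const job of queue.jobs) {
    if (job.status === 'processing') job.status = 'queued';
  }
  // Step 4: attempt to restart Chrome, up to the limit.
  for (let attempt = 1; attempt <= MAX_RESTART_ATTEMPTS; attempt++) {
    try {
      await startChrome();
      return true; // Step 5: caller resumes processing
    } catch (err) {
      // restart failed; retry until attempts are exhausted
    }
  }
  return false; // give up; jobs remain queued for manual intervention
}
```
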
### Failed Jobs

- Failed jobs are logged to `data/error/{jobId}.json`
- Never auto-deleted (manual review required)
- Review `logs/errors.log` for details
- The error JSON contains full job details, including the error message

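A failed-job record in `data/error/` might be built like this (field names are assumptions based on the job fields described elsewhere in this README, not the spooler's actual file format):

```javascript
// Assembles a plain object suitable for writing to data/error/{jobId}.json.
function errorRecord(job, err, failedAt = new Date()) {
  return {
    jobId: job.jobId,
    filename: job.filename,
    status: 'error',
    error: err.message,
    failedAt: failedAt.toISOString(),
  };
}
```
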
## Cleanup

### Manual Execution

```bash
# Test cleanup (dry-run)
npm run cleanup:dry-run

# Execute cleanup
npm run cleanup
```

### Retention Policy

| Directory | Retention | Action |
|-----------|-----------|--------|
| `data/pdfs/` | 7 days | Move to archive |
| `data/archive/YYYYMM/` | 45 days | Delete |
| `data/error/` | Manual | Never delete |
| `logs/` | 30 days | Delete (compress after 7 days) |

### Cleanup Tasks

1. Archive PDFs older than 7 days to `data/archive/YYYYMM/`
2. Delete archived PDFs older than 45 days
3. Compress log files older than 7 days
4. Delete log files older than 30 days
5. Check disk space (alert if usage exceeds 80%)

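Steps 1-2 above can be sketched with two small helpers (illustrative; `cleanup.js` may be organized differently):

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// True when the file's modification time is more than `days` days ago.
function isOlderThanDays(mtime, days, now = new Date()) {
  return now.getTime() - mtime.getTime() > days * DAY_MS;
}

// Computes the data/archive/YYYYMM/ destination from the modification time.
function archiveDirFor(mtime) {
  const y = mtime.getUTCFullYear();
  const m = String(mtime.getUTCMonth() + 1).padStart(2, '0');
  return `data/archive/${y}${m}/`;
}
```
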
## Monitoring

### Admin Dashboard

Open `admin.html` in a browser for:

- Real-time queue statistics
- Processing metrics
- Error file list
- Disk space visualization

**URL:** `http://localhost/gdc_cmod/node_spooler/admin.html`

### Key Metrics

- Average PDF generation time: < 2 seconds
- Success rate: > 95%
- Queue size: < 100 jobs
- Disk usage: < 80%

### Log Files

- `logs/spooler.log` - All API events (info, warn, error)
- `logs/errors.log` - PDF generation errors only
- `logs/metrics.log` - Performance stats (per job)
- `logs/cleanup.log` - Cleanup execution logs

## Troubleshooting

### Spooler Not Starting

**Solutions:**

1. Check if Chrome is running on port 42020
2. Check logs: `logs/spooler.log`
3. Verify directories exist: `data/pdfs`, `data/archive`, `data/error`, `logs`
4. Check Node.js version: `node --version` (requires 14+)
5. Verify dependencies are installed: `npm install`

**Start Chrome manually:**

```bash
"C:/Program Files/Google/Chrome/Application/chrome.exe" \
  --headless \
  --disable-gpu \
  --remote-debugging-port=42020
```

### PDF Not Generated

**Solutions:**

1. Check job status via the API: `GET /api/pdf/status/{jobId}`
2. Review error logs: `logs/errors.log`
3. Verify the Chrome connection: check logs for CDP connection errors
4. Check the HTML content: ensure it is valid HTML

### Queue Full

**Solutions:**

1. Wait for current jobs to complete
2. Check the admin dashboard for the queue size
3. Increase `maxQueueSize` in `spooler.js` (default: 100)
4. Check whether jobs are stuck (processing for too long)

### Chrome Crashes Repeatedly

**Solutions:**

1. Check system RAM (minimum 2 GB available)
2. Reduce `maxConcurrent` in `spooler.js` (default: 5)
3. Check for memory leaks in Chrome
4. Restart Chrome manually and monitor
5. Check system resources: Task Manager > Performance

### High Disk Usage

**Solutions:**

1. Run cleanup: `npm run cleanup`
2. Check `data/archive/` for old folders
3. Check `logs/` for old logs
4. Check `data/pdfs/` for large files
5. Consider reducing the PDF retention time in `cleanup-config.json`

## Deployment

### Quick Start

```bash
# 1. Create directories
cd D:/data/www/gdc_cmod
mkdir -p node_spooler/logs node_spooler/data/pdfs node_spooler/data/archive node_spooler/data/error

# 2. Install dependencies
cd node_spooler
npm install

# 3. Start Chrome (if not running)
"C:/Program Files/Google/Chrome/Application/chrome.exe" \
  --headless \
  --disable-gpu \
  --remote-debugging-port=42020

# 4. Start spooler
npm start

# 5. Test API
curl -X POST http://localhost:3030/api/pdf/generate \
  -H "Content-Type: application/json" \
  -d "{\"html\":\"<html><body>Test</body></html>\",\"filename\":\"test.pdf\"}"

# 6. Open admin dashboard
# http://localhost/gdc_cmod/node_spooler/admin.html
```

### Production Setup

1. Create a batch file wrapper (`spooler-start.bat`):

```batch
@echo off
cd /d D:\data\www\gdc_cmod\node_spooler
C:\node\node.exe spooler.js
```

2. Create a Windows service (note: `sc` requires a space after each `option=`, and running a `.bat` directly as a service typically needs a wrapper such as NSSM):

```batch
sc create PDFSpooler binPath= "D:\data\www\gdc_cmod\node_spooler\spooler-start.bat" start= auto
sc start PDFSpooler
```

3. Create scheduled tasks for cleanup:

```batch
schtasks /create /tn "PDF Cleanup Daily" /tr "C:\node\node.exe D:\data\www\gdc_cmod\node_spooler\cleanup.js" /sc daily /st 01:00
schtasks /create /tn "PDF Cleanup Weekly" /tr "C:\node\node.exe D:\data\www\gdc_cmod\node_spooler\cleanup.js weekly" /sc weekly /d MON /st 01:00
```

## Version History

- **2.0.0 (2025-02-03):** Migrated from file watching to an HTTP API queue
  - Removed file watching (chokidar)
  - Added Express HTTP API
  - Internal queue with max 5 concurrent jobs
  - Max 100 jobs in queue
  - Job auto-cleanup after 60 minutes
  - Enhanced error handling with Chrome restart
  - Admin dashboard for monitoring
  - Automated cleanup system

## License

Internal use only.