# PDF Spooler v2.0

In the name of Allah, the Most Gracious, the Most Merciful.

## Overview

A Node.js Express service with an internal queue for HTML-to-PDF conversion using the Chrome DevTools Protocol (CDP).

## Architecture

```
Client Application
      ↓ POST {html, filename}
Node.js Spooler (port 3030)
      ↓ queue
Internal Queue (max 5 concurrent)
      ↓ process
PDF Generator (Chrome CDP, port 42020)
      ↓ save
data/pdfs/{filename}.pdf
```

## Features

- HTTP API for PDF generation (no file watching)
- Internal queue with max 5 concurrent jobs
- Max 100 jobs in queue
- In-memory job tracking (auto-cleanup after 60 min)
- Chrome crash detection & restart (max 3 attempts)
- Comprehensive logging (info, error, metrics)
- Automated cleanup with dry-run mode
- Admin dashboard for monitoring
- Manual error review required (see `data/error/`)

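The in-memory job tracking with 60-minute auto-cleanup can be sketched as a `Map` with a periodic sweep. This is an illustrative sketch only — the names `jobs`, `JOB_TTL_MS`, and `sweepExpiredJobs` are assumptions, not the actual `spooler.js` internals:

```javascript
// Illustrative sketch of in-memory job tracking: jobs live in a Map and
// entries older than JOB_TTL_MS (60 minutes) are swept away.
const JOB_TTL_MS = 60 * 60 * 1000; // 60 minutes

const jobs = new Map(); // jobId → { status, createdAt, ... }

// Remove tracked jobs older than the TTL; returns how many were dropped.
function sweepExpiredJobs(now = Date.now()) {
  let removed = 0;
  for (const [id, job] of jobs) {
    if (now - job.createdAt > JOB_TTL_MS) {
      jobs.delete(id);
      removed += 1;
    }
  }
  return removed;
}
```

In the real service a sweep like this would run on an interval (e.g. via `setInterval`), so finished jobs stop consuming memory after an hour.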
## API Endpoints

### POST /api/pdf/generate

Generate a PDF from HTML content.

**Request:**
```json
{
  "html": "<html>...</html>",
  "filename": "1234567890.pdf"
}
```

**Response (Success):**
```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "queued",
  "message": "Job added to queue"
}
```

**Response (Error):**
```json
{
  "success": false,
  "error": "Queue is full, please try again later"
}
```

### GET /api/pdf/status/:jobId

Check job status.

**Response (Queued/Processing):**
```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "queued|processing",
  "progress": 0|50,
  "pdfUrl": null,
  "error": null
}
```

**Response (Completed):**
```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "completed",
  "progress": 100,
  "pdfUrl": "/node_spooler/data/pdfs/1234567890.pdf",
  "error": null
}
```

**Response (Error):**
```json
{
  "success": true,
  "jobId": "job_1738603845123_abc123xyz",
  "status": "error",
  "progress": 0,
  "pdfUrl": null,
  "error": "Chrome timeout"
}
```

Note: `success: true` here means the status lookup itself succeeded; the job outcome is reported in `status` and `error`.

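Taken together, the two endpoints above support a simple submit-then-poll client. The sketch below is a hedged example — the base URL, polling interval, and the `generatePdf`/`isTerminal` names are assumptions — using the global `fetch` available in Node 18+:

```javascript
// Sketch of a client flow against the spooler API (assumes the service
// is reachable at http://localhost:3030; adjust BASE as needed).
const BASE = 'http://localhost:3030';

// A job stops being worth polling once it reaches a terminal status.
function isTerminal(status) {
  return status === 'completed' || status === 'error';
}

async function generatePdf(html, filename, { intervalMs = 500, maxTries = 60 } = {}) {
  // Submit the job to the queue.
  const res = await fetch(`${BASE}/api/pdf/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ html, filename }),
  });
  const { success, jobId, error } = await res.json();
  if (!success) throw new Error(error);

  // Poll until the job completes or errors out.
  for (let i = 0; i < maxTries; i++) {
    const job = await (await fetch(`${BASE}/api/pdf/status/${jobId}`)).json();
    if (isTerminal(job.status)) return job; // { status, pdfUrl, error, ... }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish within ${maxTries} polls`);
}
```
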
### GET /api/queue/stats

Queue statistics.

```json
{
  "success": true,
  "queueSize": 12,
  "processing": 3,
  "completed": 45,
  "errors": 2,
  "avgProcessingTime": 0.82,
  "maxQueueSize": 100
}
```

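A caller can turn this payload into a quick health check against the targets listed under Key Metrics later in this README. `queueHealth` is a hypothetical helper, not part of the service API:

```javascript
// Hypothetical interpretation of a /api/queue/stats payload. Thresholds
// mirror the Key Metrics section (success rate > 95%, avg time < 2 s,
// queue below capacity); they are not enforced by the service itself.
function queueHealth(stats) {
  const finished = stats.completed + stats.errors;
  const successRate = finished > 0 ? stats.completed / finished : 1;
  return {
    successRate,
    queueNearFull: stats.queueSize >= 0.8 * stats.maxQueueSize,
    slow: stats.avgProcessingTime > 2, // seconds
    healthy: successRate > 0.95 && stats.queueSize < stats.maxQueueSize,
  };
}

// With the sample payload above:
const health = queueHealth({
  queueSize: 12,
  processing: 3,
  completed: 45,
  errors: 2,
  avgProcessingTime: 0.82,
  maxQueueSize: 100,
});
// 45 of 47 finished jobs succeeded (≈95.7%), queue well under capacity
```
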
## Error Handling

### Chrome Crash Handling

1. Chrome crash detected (CDP connection lost or timeout)
2. Stop processing current jobs
3. Move queued jobs back to "queued" status
4. Attempt to restart Chrome (max 3 attempts)
5. Resume processing

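The restart policy in step 4 can be sketched as a bounded retry loop. `restartChrome` and `isAlive` are placeholders for the real launch and CDP health checks; only the attempt-counting logic mirrors the documented behaviour:

```javascript
// Sketch of the bounded-restart policy: at most MAX_RESTART_ATTEMPTS
// relaunches before the spooler gives up.
const MAX_RESTART_ATTEMPTS = 3;

// True while another restart attempt is still allowed.
function shouldRetry(attempt) {
  return attempt < MAX_RESTART_ATTEMPTS;
}

async function recoverChrome(restartChrome, isAlive) {
  for (let attempt = 0; shouldRetry(attempt); attempt++) {
    await restartChrome();
    if (await isAlive()) return attempt + 1; // attempts used to recover
  }
  throw new Error(`Chrome did not recover after ${MAX_RESTART_ATTEMPTS} attempts`);
}
```
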
### Failed Jobs

- Failed jobs logged to `data/error/{jobId}.json`
- Never auto-deleted (manual review required)
- Review `logs/errors.log` for details
- Error JSON contains full job details, including the error message

## Cleanup

### Manual Execution

```bash
# Test cleanup (dry-run)
npm run cleanup:dry-run

# Execute cleanup
npm run cleanup
```

### Retention Policy

| Directory | Retention | Action |
|-----------|-----------|--------|
| `data/pdfs/` | 7 days | Move to archive |
| `data/archive/YYYYMM/` | 45 days | Delete |
| `data/error/` | Manual | Never delete |
| `logs/` | 30 days | Delete (compress after 7 days) |

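The retention rules above reduce to simple date arithmetic. The helpers below are an illustrative sketch — names are hypothetical and the thresholds are assumed to match the defaults in `cleanup-config.json` — not the actual `cleanup.js` code:

```javascript
// Date arithmetic behind the retention table.
const DAY_MS = 24 * 60 * 60 * 1000;

// Archive bucket name for a file's modification time, e.g. 2025-02-03 → "202502".
function archiveBucket(mtime) {
  const d = new Date(mtime);
  return `${d.getUTCFullYear()}${String(d.getUTCMonth() + 1).padStart(2, '0')}`;
}

// Decision per file, mirroring the table: pdfs archive after 7 days,
// archived copies delete after 45, error files are never touched.
function retentionAction(dir, mtime, now = Date.now()) {
  const ageDays = (now - mtime) / DAY_MS;
  if (dir === 'data/pdfs' && ageDays > 7) return 'archive';
  if (dir.startsWith('data/archive') && ageDays > 45) return 'delete';
  return 'keep'; // includes data/error, which is always kept
}
```
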
### Cleanup Tasks

1. Archive PDFs older than 7 days to `data/archive/YYYYMM/`
2. Delete archived PDFs older than 45 days
3. Compress log files older than 7 days
4. Delete log files older than 30 days
5. Check disk space (alert if > 80%)

## Monitoring

### Admin Dashboard

Open `admin.html` in a browser for:

- Real-time queue statistics
- Processing metrics
- Error file list
- Disk space visualization

**URL:** `http://localhost:3030/admin.html`

### Key Metrics

- Average PDF generation time: < 2 seconds
- Success rate: > 95%
- Queue size: < 100 jobs
- Disk usage: < 80%

### Log Files

- `logs/spooler.log` - All API events (info, warn, error)
- `logs/errors.log` - PDF generation errors only
- `logs/metrics.log` - Performance stats (per job)
- `logs/cleanup.log` - Cleanup execution logs

## Troubleshooting

### Spooler Not Starting

- Check whether Chrome is running on port 42020
- Check logs: `logs/spooler.log`
- Verify directories exist: `data/pdfs`, `data/archive`, `data/error`, `logs`
- Check Node.js version: `node --version` (requires 14+)
- Verify dependencies are installed: `npm install`

**Start Chrome manually:**
```bash
"C:/Program Files/Google/Chrome/Application/chrome.exe" \
  --headless \
  --disable-gpu \
  --remote-debugging-port=42020
```

### PDF Not Generated

- Check job status via API: `GET /api/pdf/status/{jobId}`
- Review error logs: `logs/errors.log`
- Verify the Chrome connection: check logs for CDP connection errors
- Check the HTML content: ensure it is valid HTML

### Queue Full

- Wait for current jobs to complete
- Check the admin dashboard for queue size
- Increase `maxQueueSize` in `spooler.js` (default: 100)
- Check whether jobs are stuck (processing too long)

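On the caller side, a full queue is a transient condition, so retrying with exponential backoff is usually enough. This is a hedged sketch — the endpoint URL and helper names are assumptions, and `fetch` requires Node 18+:

```javascript
// Exponential backoff delay: 1 s, 2 s, 4 s, ... capped at 30 s.
function backoffDelay(attempt, baseMs = 1000, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Resubmit on "Queue is full" instead of hammering the endpoint.
async function submitWithRetry(body, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch('http://localhost:3030/api/pdf/generate', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    });
    const data = await res.json();
    if (data.success) return data;
    // Any error other than a full queue is not retryable here.
    if (!/queue is full/i.test(data.error || '')) throw new Error(data.error);
    await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
  }
  throw new Error('Queue stayed full after retries');
}
```
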
### Chrome Crashes Repeatedly

- Check system RAM (minimum 2 GB available)
- Reduce `maxConcurrent` in `spooler.js` (default: 5)
- Check for memory leaks in Chrome
- Restart Chrome manually and monitor
- Check system resources: Task Manager > Performance

### High Disk Usage

- Run cleanup: `npm run cleanup`
- Check `data/archive/` for old folders
- Check `logs/` for old logs
- Check `data/pdfs/` for large files
- Consider reducing PDF retention time in `cleanup-config.json`

## Deployment

### Quick Start

```bash
# 1. Create directories
cd node_spooler
mkdir -p logs data/pdfs data/archive data/error

# 2. Install dependencies
npm install

# 3. Start Chrome (if not running)
"C:/Program Files/Google/Chrome/Application/chrome.exe" \
  --headless \
  --disable-gpu \
  --remote-debugging-port=42020

# 4. Start spooler
npm start

# 5. Test the API
curl -X POST http://localhost:3030/api/pdf/generate \
  -H "Content-Type: application/json" \
  -d "{\"html\":\"<html><body>Test</body></html>\",\"filename\":\"test.pdf\"}"

# 6. Open the admin dashboard
# http://localhost:3030/admin.html
```

### Production Setup

**1. Create a batch file wrapper:**
```batch
@echo off
cd /d D:\data\www\gdc_cmod\node_spooler
C:\node\node.exe spooler.js
```

**2. Create a Windows service:**
```batch
sc create PDFSpooler binPath= "D:\data\www\gdc_cmod\node_spooler\spooler-start.bat" start= auto
sc start PDFSpooler
```

**3. Create scheduled tasks for cleanup:**
```batch
schtasks /create /tn "PDF Cleanup Daily" /tr "C:\node\node.exe D:\data\www\gdc_cmod\node_spooler\cleanup.js" /sc daily /st 01:00
schtasks /create /tn "PDF Cleanup Weekly" /tr "C:\node\node.exe D:\data\www\gdc_cmod\node_spooler\cleanup.js weekly" /sc weekly /d MON /st 01:00
```

## Version History

- **2.0.0 (2025-02-03):** Migrated from file watching to HTTP API queue
  - Removed file watching (chokidar)
  - Added Express HTTP API
  - Internal queue with max 5 concurrent jobs
  - Max 100 jobs in queue
  - Job auto-cleanup after 60 minutes
  - Enhanced error handling with Chrome restart
  - Admin dashboard for monitoring
  - Automated cleanup system

## License

Internal use only.