
A lightweight, self-hosted headless browser automation platform. Designed as an alternative to Browserless, built for speed, privacy, and scalability.


rubyco/HeadlessX


🚀 HeadlessX v1.2.0

Open Source Browserless Web Scraping API with Human-like Behavior


🎯 Unified Solution: Website + API on a single domain
🧠 Human-like Behavior: 40+ anti-detection techniques
🚀 Deploy Anywhere: Docker, Node.js+PM2, or Development


✨ Key Features

  • 🌐 Unified Architecture: Website and API on one domain
  • 🧠 Human-like Intelligence: Natural mouse movements, smart scrolling, behavioral randomization
  • 📊 Multiple Formats: HTML, text, screenshots, PDFs
  • ⚡ Batch Processing: Handle multiple URLs efficiently
  • 🔒 Production Ready: Docker, PM2, Nginx, SSL support
  • 🛡️ Anti-Detection: 40+ stealth techniques for reliable scraping

🎯 Quick Start

```bash
# 1. Clone and configure
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX

# Quick setup (makes scripts executable + creates .env)
chmod +x scripts/quick-setup.sh && ./scripts/quick-setup.sh

# Then edit:
nano .env  # Update DOMAIN, SUBDOMAIN, and AUTH_TOKEN
```

Choose your deployment:

| Method | Command | Best For |
|---|---|---|
| 🐳 Docker | `docker-compose up -d` | Production, easy deployment |
| 🔧 Auto Setup | `chmod +x scripts/setup.sh && sudo ./scripts/setup.sh` | VPS/Server with full control |
| 💻 Development | `npm install && npm start` | Local development, testing |

Access your HeadlessX:

  • 🌐 Website: https://your-subdomain.yourdomain.com
  • 🔧 Health: https://your-subdomain.yourdomain.com/api/health
  • 📊 Status: https://your-subdomain.yourdomain.com/api/status?token=YOUR_AUTH_TOKEN
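All three URLs are derived from the SUBDOMAIN and DOMAIN values in `.env`. A minimal Python sketch of that pattern is shown below, assuming HTTPS on the default port; `headlessx_urls` is a hypothetical helper, not part of HeadlessX. It URL-encodes the token so that characters such as `&` or spaces in a custom token cannot break the query string.

```python
from urllib.parse import urlencode


def headlessx_urls(subdomain: str, domain: str, token: str) -> dict:
    """Build the public HeadlessX URLs from SUBDOMAIN and DOMAIN (hypothetical helper)."""
    base = f"https://{subdomain}.{domain}"
    return {
        "website": base,
        "health": f"{base}/api/health",
        # urlencode escapes characters in the token that are unsafe in a query string
        "status": f"{base}/api/status?{urlencode({'token': token})}",
    }


urls = headlessx_urls("headlessx", "yourdomain.com", "YOUR_AUTH_TOKEN")
print(urls["status"])
```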

๐Ÿ—๏ธ New Modular Architecture v1.2.0

HeadlessX v1.2.0 introduces a completely refactored modular architecture for better maintainability, scalability, and development experience.

Key Improvements:

  • 🔧 Separation of Concerns: Distinct modules for configuration, services, controllers, and middleware
  • 🚀 Better Performance: Optimized browser management and resource usage
  • 🛠️ Developer Experience: Clear module boundaries and dependency injection
  • 📦 Production Ready: Enhanced error handling and logging with correlation IDs
  • 🔒 Security: Improved authentication and rate limiting
  • 📊 Monitoring: Structured logging and health monitoring

Architecture Overview:

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│     Routes      │───▶│   Controllers   │───▶│    Services     │
│    (api.js)     │    │ (rendering.js)  │    │  (browser.js)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Middleware    │    │      Utils      │    │     Config      │
│    (auth.js)    │    │   (logger.js)   │    │   (index.js)    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```
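The layering in the diagram can be illustrated with a toy model. This is not HeadlessX source code (the project itself is Node.js/Express); the class and function names below are hypothetical and only mirror the file names in the diagram, to show a route handing a request to auth middleware, then to a controller, which calls a service that was injected rather than imported directly.

```python
"""Toy model of the Routes -> Middleware -> Controllers -> Services layering (not HeadlessX code)."""


class BrowserService:
    """Stands in for services/browser.js; a real service would drive Playwright."""

    def render(self, url: str) -> str:
        return f"<html><!-- rendered {url} --></html>"


class RenderingController:
    """Stands in for controllers/rendering.js; the service is injected via the constructor."""

    def __init__(self, service: BrowserService):
        self.service = service

    def html(self, body: dict) -> tuple:
        return 200, {"html": self.service.render(body["url"])}


def require_token(expected: str, handler):
    """Stands in for middleware/auth.js: reject requests that lack the right token."""
    def wrapped(request: dict) -> tuple:
        if request.get("token") != expected:
            return 401, {"error": "Unauthorized"}
        return handler(request["body"])
    return wrapped


# Stands in for routes/api.js: map a path to a middleware-wrapped controller method
controller = RenderingController(BrowserService())
ROUTES = {"/api/html": require_token("secret", controller.html)}

status, payload = ROUTES["/api/html"]({"token": "secret", "body": {"url": "https://example.com"}})
print(status, payload["html"])
```

Because the controller receives its service as a constructor argument, a test can pass in a fake service without starting a browser, which is one practical benefit of the dependency injection mentioned in the list above.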

Quick Migration from v1.1.0:

  • The original src/server.js (3079 lines) has been broken down into 20+ focused modules
  • Environment variable TOKEN is now AUTH_TOKEN
  • PM2 config moved from config/ecosystem.config.js to ecosystem.config.js
  • All functionality preserved with improved performance and maintainability
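For existing installs, the TOKEN to AUTH_TOKEN rename can be applied to an old `.env` file programmatically. The following is a minimal Python sketch, assuming a plain `KEY=value` file (optionally with `export` prefixes); `migrate_env` is a hypothetical helper, not part of HeadlessX.

```python
import re

# Matches a TOKEN= assignment at the start of a line, optionally prefixed by "export ".
# AUTH_TOKEN= and MY_TOKEN= lines are untouched because the match is anchored at the line start.
_OLD_TOKEN = re.compile(r"^([ \t]*(?:export[ \t]+)?)TOKEN=", re.MULTILINE)


def migrate_env(text: str) -> str:
    """Rename the v1.1.0 TOKEN variable to the v1.2.0 AUTH_TOKEN in .env contents."""
    return _OLD_TOKEN.sub(r"\1AUTH_TOKEN=", text)


old = "DOMAIN=yourdomain.com\nTOKEN=abc123\n"
print(migrate_env(old))
```

Running it twice is safe: once a line reads `AUTH_TOKEN=...`, the anchored pattern no longer matches it.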

📖 Detailed Documentation: MODULAR_ARCHITECTURE.md


🚀 Deployment Guide

🐳 Docker Deployment (Recommended)

```bash
# Install Docker (if needed)
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER  # Log out and back in for the group change to apply

# Deploy HeadlessX
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env  # Configure DOMAIN, SUBDOMAIN, AUTH_TOKEN

# Start services
docker-compose up -d

# Optional: Setup SSL (standalone mode needs port 80 to be free while it runs)
sudo apt install certbot
sudo certbot certonly --standalone -d your-subdomain.yourdomain.com
```

Docker Management:

```bash
docker-compose ps               # Check status
docker-compose logs headlessx   # View logs
docker-compose restart          # Restart services
docker-compose down             # Stop services
```

🔧 Node.js + PM2 Deployment

```bash
# Automated setup (recommended)
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env  # Configure environment
chmod +x scripts/setup.sh
sudo ./scripts/setup.sh  # Installs dependencies, builds website, starts PM2
```

🌐 Nginx Configuration (Auto-handled by setup script):

The setup script configures nginx automatically, but if you need to configure it manually:

```bash
# Copy and configure nginx site
sudo cp nginx/headlessx.conf /etc/nginx/sites-available/headlessx

# Replace placeholders with your actual domain
sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/your-subdomain.yourdomain.com/g' /etc/nginx/sites-available/headlessx

# Enable the site
sudo ln -sf /etc/nginx/sites-available/headlessx /etc/nginx/sites-enabled/
sudo rm -f /etc/nginx/sites-enabled/default

# Test and reload nginx
sudo nginx -t && sudo systemctl reload nginx
```

Manual setup (if not using setup script):

```bash
sudo apt update && sudo apt upgrade -y
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs build-essential
npm install && npm run build
sudo npm install -g pm2
npm run pm2:start
```

PM2 Management:

```bash
npm run pm2:status    # Check status
npm run pm2:logs      # View logs
npm run pm2:restart   # Restart server
npm run pm2:stop      # Stop server
```

💻 Development Setup

```bash
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env  # Set AUTH_TOKEN, DOMAIN=localhost, SUBDOMAIN=headlessx

# Make scripts executable
chmod +x scripts/*.sh

# Install dependencies
npm install
cd website && npm install && npm run build && cd ..

# Start development server
npm start  # Access at http://localhost:3000
```

🌐 API Routes & Structure

```
HeadlessX Routes:
├── /favicon.ico     → Favicon
├── /robots.txt      → SEO robots file
├── /api/health      → Health check (no auth required)
├── /api/status      → Server status (requires token)
├── /api/render      → Full page rendering
├── /api/html        → HTML extraction
├── /api/content     → Clean text extraction
├── /api/screenshot  → Screenshot generation
├── /api/pdf         → PDF generation
└── /api/batch       → Batch URL processing
```

🔄 Request Flow:

  1. Nginx receives request on port 80/443
  2. Proxies to Node.js server on port 3000
  3. Server routes based on path:
    • /api/* → API endpoints
    • /* → Website files (built Next.js app)
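The path-based split in step 3 amounts to a simple prefix check. A minimal Python sketch is shown below; `route_target` is a hypothetical name, and the sketch only models the two branches listed above (it ignores `/favicon.ico` and `/robots.txt`, which the route tree lists separately).

```python
def route_target(path: str) -> str:
    """Decide which part of the Node.js server handles a proxied request path."""
    # Anything under /api/ goes to the API endpoints
    if path.startswith("/api/"):
        return "api"
    # Everything else is served from the built Next.js website files
    return "website"


for p in ("/api/health", "/", "/docs"):
    print(p, "->", route_target(p))
```

Matching on `/api/` (with the trailing slash) rather than `/api` keeps a website page such as `/apidocs` from being sent to the API handlers.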

🚀 API Examples & HTTP Integrations

Quick Health Check (No Auth)

curl https://your-subdomain.yourdomain.com/api/health

🔧 cURL Examples

Extract HTML Content

```bash
curl -X POST "https://your-subdomain.yourdomain.com/api/html?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "timeout": 30000}'
```

Generate Screenshot

```bash
curl "https://your-subdomain.yourdomain.com/api/screenshot?token=YOUR_AUTH_TOKEN&url=https://example.com&fullPage=true" \
  -o screenshot.png
```
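Because the target page is passed in the query string of a GET request, a target URL that has its own `?` or `&` must be percent-encoded, or its parameters would be read as HeadlessX parameters. A minimal Python sketch using the standard library's `urlencode`; `screenshot_url` is a hypothetical helper, and only the `token`, `url`, and `fullPage` parameters from the cURL example are assumed.

```python
from urllib.parse import urlencode


def screenshot_url(base: str, token: str, target: str, full_page: bool = True) -> str:
    """Build a GET /api/screenshot URL with every query value percent-encoded."""
    query = urlencode({
        "token": token,
        "url": target,  # encoded, so the target's own ?/& cannot leak into HeadlessX's query
        "fullPage": "true" if full_page else "false",
    })
    return f"{base}/api/screenshot?{query}"


print(screenshot_url("https://your-subdomain.yourdomain.com", "YOUR_AUTH_TOKEN",
                     "https://example.com/search?q=a&page=2"))
```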

Extract Text Only

```bash
curl -X POST "https://your-subdomain.yourdomain.com/api/content?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "waitForSelector": "main"}'
```

Generate PDF

```bash
curl -X POST "https://your-subdomain.yourdomain.com/api/pdf?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "format": "A4"}' \
  -o document.pdf
```

🤖 Make.com (Integromat) Integration

HTTP Request Module Configuration:

```json
{
  "url": "https://your-subdomain.yourdomain.com/api/html",
  "method": "POST",
  "headers": {
    "Content-Type": "application/json"
  },
  "qs": {
    "token": "YOUR_AUTH_TOKEN"
  },
  "body": {
    "url": "{{url_to_scrape}}",
    "timeout": 30000,
    "waitForSelector": "{{optional_selector}}"
  }
}
```

⚡ Zapier Integration

Webhooks by Zapier Setup:

  • URL: https://your-subdomain.yourdomain.com/api/html?token=YOUR_AUTH_TOKEN
  • Method: POST
  • Headers: Content-Type: application/json
  • Body:

```json
{
  "url": "{{url_from_trigger}}",
  "timeout": 30000,
  "humanBehavior": true
}
```

🔗 n8n Integration

HTTP Request Node:

```json
{
  "url": "https://your-subdomain.yourdomain.com/api/html",
  "method": "POST",
  "authentication": "queryAuth",
  "query": {
    "token": "YOUR_AUTH_TOKEN"
  },
  "headers": {
    "Content-Type": "application/json"
  },
  "body": {
    "url": "={{$json.url}}",
    "timeout": 30000,
    "humanBehavior": true
  }
}
```

Available via n8n Community Node:

🐍 Python Example

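```python
import requests


def scrape_with_headlessx(url, token):
    response = requests.post(
        "https://your-subdomain.yourdomain.com/api/html",
        params={"token": token},
        json={"url": url, "timeout": 30000, "humanBehavior": True},
        timeout=60,  # Client-side limit in seconds, longer than the 30 s render timeout
    )
    response.raise_for_status()  # Raise on 401, 5xx, etc. instead of parsing an error body
    return response.json()


# Usage
result = scrape_with_headlessx("https://example.com", "YOUR_TOKEN")
print(result["html"])
```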

🟨 JavaScript/Node.js Example

```javascript
const axios = require('axios');

async function scrapeWithHeadlessX(url, token) {
  try {
    const response = await axios.post(
      'https://your-subdomain.yourdomain.com/api/html',
      { url: url, timeout: 30000, humanBehavior: true },
      { params: { token } } // axios URL-encodes query parameters
    );
    return response.data;
  } catch (error) {
    console.error('Scraping failed:', error.message);
    throw error;
  }
}

// Usage
scrapeWithHeadlessX('https://example.com', 'YOUR_TOKEN')
  .then(result => console.log(result.html))
  .catch(error => console.error(error));
```

🔄 Batch Processing Example

```bash
curl -X POST "https://your-subdomain.yourdomain.com/api/batch?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example1.com",
      "https://example2.com",
      "https://example3.com"
    ],
    "timeout": 30000,
    "humanBehavior": true
  }'
```

Batch Processing (text output)

```bash
curl -X POST "https://your-subdomain.yourdomain.com/api/batch?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com", "https://httpbin.org"],
    "format": "text",
    "options": {"timeout": 30000}
  }'
```
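This README does not state a maximum number of URLs per batch request, so a long list may be safer to send in several smaller batches. A minimal Python sketch of client-side chunking, assuming the payload shape of the first batch example (`urls`, `timeout`, `humanBehavior`); `batch_payloads` and the chosen `batch_size` are hypothetical, not documented HeadlessX limits.

```python
def batch_payloads(urls: list, batch_size: int, timeout: int = 30000) -> list:
    """Split a long URL list into several /api/batch request bodies."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        # Same shape as the first batch example: urls, timeout, humanBehavior
        {"urls": urls[i:i + batch_size], "timeout": timeout, "humanBehavior": True}
        for i in range(0, len(urls), batch_size)
    ]


payloads = batch_payloads([f"https://example{i}.com" for i in range(5)], batch_size=2)
for body in payloads:
    print(body["urls"])
```

Each returned dict can be sent as the JSON body of its own `POST /api/batch` request.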

📁 Project Structure

```
HeadlessX v1.2.0 - Modular Architecture/
├── 📂 src/                      # Modular application source
│   ├── 📂 config/               # Configuration management
│   │   ├── index.js             # Main configuration loader
│   │   └── browser.js           # Browser-specific settings
│   ├── 📂 utils/                # Utility functions
│   │   ├── errors.js            # Error handling & categorization
│   │   ├── logger.js            # Structured logging
│   │   └── helpers.js           # Common utilities
│   ├── 📂 services/             # Business logic services
│   │   ├── browser.js           # Browser lifecycle management
│   │   ├── stealth.js           # Anti-detection techniques
│   │   ├── interaction.js       # Human-like behavior
│   │   └── rendering.js         # Core rendering logic
│   ├── 📂 middleware/           # Express middleware
│   │   ├── auth.js              # Authentication
│   │   └── error.js             # Error handling
│   ├── 📂 controllers/          # Request handlers
│   │   ├── system.js            # Health & status endpoints
│   │   ├── rendering.js         # Main rendering endpoints
│   │   ├── batch.js             # Batch processing
│   │   └── get.js               # GET endpoints & docs
│   ├── 📂 routes/               # Route definitions
│   │   ├── api.js               # API route mappings
│   │   └── static.js            # Static file serving
│   ├── app.js                   # Main application setup
│   ├── server.js                # Entry point for PM2
│   └── rate-limiter.js          # Rate limiting implementation
├── 📂 website/                  # Next.js website (unchanged)
│   ├── app/                     # Next.js 13+ app directory
│   ├── components/              # React components
│   ├── .env.example             # Website environment template
│   ├── next.config.js           # Next.js configuration
│   └── package.json             # Website dependencies
├── 📂 scripts/                  # Deployment & management scripts
│   ├── setup.sh                 # Automated installation (updated)
│   ├── update_server.sh         # Server update script (updated)
│   ├── verify-domain.sh         # Domain verification
│   └── test-routing.sh          # Integration testing
├── 📂 nginx/                    # Nginx configuration
│   └── headlessx.conf           # Nginx proxy config
├── 📂 docker/                   # Docker deployment (updated)
│   ├── Dockerfile               # Container definition
│   └── docker-compose.yml       # Docker Compose setup
├── ecosystem.config.js          # PM2 configuration (moved to root)
├── .env.example                 # Environment template (updated)
├── package.json                 # Server dependencies (updated)
├── MODULAR_ARCHITECTURE.md      # Architecture documentation
└── README.md                    # This file
```

๐Ÿ› ๏ธ Development

Local Development

```bash
# 1. Install dependencies
npm install

# 2. Build website
cd website
npm install
npm run build
cd ..

# 3. Set environment variables
export AUTH_TOKEN="development_token_123"
export DOMAIN="localhost"
export SUBDOMAIN="headlessx"

# 4. Start server
npm start  # Uses src/app.js

# 5. Access locally
# Website: http://localhost:3000
# API:     http://localhost:3000/api/health
```

Testing Integration

```bash
# Test server and website integration
bash scripts/test-routing.sh localhost

# Test with environment variables
bash scripts/verify-domain.sh
```

โš™๏ธ Configuration

🌐 Environment Variables (.env)

Create your .env file from the template:

```bash
cp .env.example .env
nano .env
```

Required configuration:

```bash
# Security Token (Generate a secure random string)
AUTH_TOKEN=your_secure_token_here

# Domain Configuration
DOMAIN=yourdomain.com
SUBDOMAIN=headlessx

# Optional: Browser Settings
BROWSER_TIMEOUT=60000
MAX_CONCURRENT_BROWSERS=5

# Optional: Server Settings
PORT=3000
NODE_ENV=production
```
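The template asks for a secure random AUTH_TOKEN and three required values. The following is a minimal Python sketch for generating a token with the standard `secrets` module and checking a `.env` file before deployment. `parse_env` and `missing_keys` are hypothetical helpers; the parser only handles plain `KEY=value` lines and `#` comment lines, not quoting or inline comments.

```python
import secrets

REQUIRED = ("AUTH_TOKEN", "DOMAIN", "SUBDOMAIN")


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comment lines."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def missing_keys(env: dict) -> list:
    """Return required keys that are absent, empty, or still set to the template placeholder."""
    return [k for k in REQUIRED if env.get(k, "") in ("", "your_secure_token_here")]


# 32 random bytes -> 64 hex characters, suitable as an AUTH_TOKEN value
print("AUTH_TOKEN=" + secrets.token_hex(32))
```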

🌐 Nginx Domain Setup

Option 1: Automatic (Recommended)

```bash
# The setup script automatically replaces domain placeholders
sudo ./scripts/setup.sh
```

Option 2: Manual Configuration

```bash
# Copy nginx configuration
sudo cp nginx/headlessx.conf /etc/nginx/sites-available/headlessx

# Replace domain placeholders (replace with your actual domain)
sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/headlessx.yourdomain.com/g' /etc/nginx/sites-available/headlessx

# Example: if your domain is "api.example.com", run this instead of the command above
sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/api.example.com/g' /etc/nginx/sites-available/headlessx

# Enable site and reload nginx
sudo ln -sf /etc/nginx/sites-available/headlessx /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
```
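If you template the nginx config from your own deployment tooling instead of running `sed`, the same substitution is a literal string replacement. A minimal Python sketch, assuming the placeholder is spelled `SUBDOMAIN.DOMAIN.COM` as in the `sed` commands; `render_nginx_conf` is a hypothetical helper. Unlike the `sed` pattern, where each `.` matches any character, `str.replace` matches the placeholder literally, and the helper refuses a template that no longer contains the placeholder (for example, one that was already rendered).

```python
PLACEHOLDER = "SUBDOMAIN.DOMAIN.COM"


def render_nginx_conf(template: str, host: str) -> str:
    """Replace the domain placeholder in nginx/headlessx.conf, like the sed command."""
    if not host or any(c.isspace() for c in host):
        raise ValueError(f"invalid host name: {host!r}")
    if PLACEHOLDER not in template:
        # Probably already rendered, or the template uses a different placeholder
        raise ValueError(f"placeholder {PLACEHOLDER} not found in template")
    return template.replace(PLACEHOLDER, host)


print(render_nginx_conf("server_name SUBDOMAIN.DOMAIN.COM;\n", "api.example.com"), end="")
```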

Your final URLs will be:

  • Website: https://your-subdomain.yourdomain.com
  • API Health: https://your-subdomain.yourdomain.com/api/health
  • API Endpoints: https://your-subdomain.yourdomain.com/api/*

📊 API Reference

🔧 Core Endpoints

| Endpoint | Method | Description | Auth Required |
|---|---|---|---|
| /api/health | GET | Health check | ❌ |
| /api/status | GET | Server status | ✅ |
| /api/render | POST | Full page rendering (JSON) | ✅ |
| /api/html | GET/POST | Raw HTML extraction | ✅ |
| /api/content | GET/POST | Clean text extraction | ✅ |
| /api/screenshot | GET | Screenshot generation | ✅ |
| /api/pdf | GET | PDF generation | ✅ |
| /api/batch | POST | Batch URL processing | ✅ |
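A client can mirror the table to fail fast before sending a request. The following is a hypothetical Python sketch that encodes the table exactly as written; `ENDPOINTS` and `precheck` are illustrative names, and the status codes (404, 401, 405) and the order of the checks are assumptions about typical HTTP behavior, not documented HeadlessX responses.

```python
# The Core Endpoints table as data: path -> (allowed methods, token required)
ENDPOINTS = {
    "/api/health": ({"GET"}, False),
    "/api/status": ({"GET"}, True),
    "/api/render": ({"POST"}, True),
    "/api/html": ({"GET", "POST"}, True),
    "/api/content": ({"GET", "POST"}, True),
    "/api/screenshot": ({"GET"}, True),
    "/api/pdf": ({"GET"}, True),
    "/api/batch": ({"POST"}, True),
}


def precheck(path: str, method: str, has_token: bool) -> int:
    """Return the HTTP status a client might expect before any rendering happens."""
    if path not in ENDPOINTS:
        return 404  # unknown endpoint
    methods, needs_token = ENDPOINTS[path]
    if needs_token and not has_token:
        return 401  # missing token
    if method.upper() not in methods:
        return 405  # method not listed for this endpoint
    return 200


print(precheck("/api/batch", "GET", has_token=True))
```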

🔑 Authentication

All API endpoints except /api/health require a token, sent in one of these ways:

  • Query parameter: ?token=YOUR_TOKEN
  • Header: X-Token: YOUR_TOKEN
  • Header: Authorization: Bearer YOUR_TOKEN
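The three options can be wrapped in one helper so that client code switches between them with a single argument. A minimal Python sketch; `auth_options` and the style names are hypothetical. The returned dict holds `params` or `headers`, which are both standard keyword arguments of `requests` calls, for example `requests.post(url, json=body, **auth_options(token))`.

```python
def auth_options(token: str, style: str = "bearer") -> dict:
    """Return request options for one of the three documented ways to send the token."""
    if style == "query":
        return {"params": {"token": token}}          # ?token=YOUR_TOKEN
    if style == "x-token":
        return {"headers": {"X-Token": token}}       # X-Token: YOUR_TOKEN
    if style == "bearer":
        return {"headers": {"Authorization": f"Bearer {token}"}}
    raise ValueError(f"unknown auth style: {style!r}")


print(auth_options("YOUR_TOKEN"))
```

Defaulting to a header is a deliberate choice in this sketch: query-string tokens are written to nginx access logs by the default log format, while request headers are not.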

📖 Complete Documentation

Visit your HeadlessX website for full API documentation with examples, or check:


📊 Monitoring & Troubleshooting

🔍 Health Checks

```bash
curl https://your-subdomain.yourdomain.com/api/health
curl "https://your-subdomain.yourdomain.com/api/status?token=YOUR_TOKEN"
```
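After a deploy or restart, the server may need a few seconds before /api/health responds. A minimal Python sketch of polling with exponential backoff, assuming a healthy server answers /api/health with HTTP 200; `wait_until_healthy` is a hypothetical helper, and the status check is passed in as a function so the sketch does not depend on a particular HTTP client.

```python
import time
from typing import Callable


def wait_until_healthy(fetch_status: Callable[[], int], attempts: int = 5,
                       delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll the health check until it returns 200, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            if fetch_status() == 200:
                return True
        except OSError:
            pass  # Connection refused or reset while the server is still starting
        if attempt < attempts - 1:
            sleep(delay)
            delay *= 2
    return False
```

With `requests`, whose exceptions derive from `OSError`, the call could look like `wait_until_healthy(lambda: requests.get("https://your-subdomain.yourdomain.com/api/health", timeout=5).status_code)`.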

📋 Log Management

```bash
# PM2 logs
npm run pm2:logs
pm2 logs headlessx --lines 100

# Docker logs
docker-compose logs -f headlessx

# Nginx logs
sudo tail -f /var/log/nginx/access.log
```

🔄 Updates

```bash
git pull origin main
npm run build            # Rebuild website
npm run pm2:restart      # PM2
# OR
docker-compose restart   # Docker
```

🔧 Common Issues

"npm ci" Error (missing package-lock.json):

```bash
chmod +x scripts/generate-lockfiles.sh
./scripts/generate-lockfiles.sh  # Generate lock files
# OR
npm install --production         # Use install instead
```

"Cannot find module 'express'":

npm install # Install dependencies

System dependency errors (Ubuntu):

```bash
sudo apt update && sudo apt install -y \
  libatk1.0-0t64 libatk-bridge2.0-0t64 libcups2t64 \
  libatspi2.0-0t64 libasound2t64 libxcomposite1
```

PM2 not starting:

```bash
sudo npm install -g pm2
chmod +x scripts/setup.sh      # Make script executable
pm2 start ecosystem.config.js  # PM2 config lives in the repository root since v1.2.0
pm2 logs headlessx             # Check errors
```

Script permission errors:

```bash
# Make all scripts executable
chmod +x scripts/*.sh

# Or use the quick setup
chmod +x scripts/quick-setup.sh && ./scripts/quick-setup.sh
```

Playwright browser installation errors:

```bash
# Use dedicated Playwright setup script
chmod +x scripts/setup-playwright.sh
./scripts/setup-playwright.sh

# Or install manually:
sudo apt update && sudo apt install -y \
  libgtk-3-0t64 libpangocairo-1.0-0 libcairo-gobject2 \
  libgdk-pixbuf-2.0-0 libdrm2 libxss1 libxrandr2 \
  libasound2t64 libatk1.0-0t64 libnss3

# Install only Chromium (most stable)
npx playwright install chromium

# Alternative: Use Docker (avoids dependency issues)
docker-compose up -d
```

🔐 Security Features

  • Token Authentication: Secure API access with custom tokens
  • Rate Limiting: Nginx-level request throttling
  • Security Headers: XSS, CSRF, and clickjacking protection
  • Bot Protection: Common attack vector blocking
  • SSL/TLS: Automatic HTTPS with Let's Encrypt

🤝 Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🆘 Support


🎯 Built by SaifyXPRO

HeadlessX v1.2.0 - The most advanced open-source browserless web scraping solution.

Made with ❤️ for the developer community.
