Open Source Browserless Web Scraping API with Human-like Behavior
Unified Solution: Website + API on a single domain
Human-like Behavior: 40+ anti-detection techniques
Deploy Anywhere: Docker, Node.js+PM2, or Development

- Unified Architecture: Website and API on one domain
- Human-like Intelligence: Natural mouse movements, smart scrolling, behavioral randomization
- Multiple Formats: HTML, text, screenshots, PDFs
- Batch Processing: Handle multiple URLs efficiently
- Production Ready: Docker, PM2, Nginx, SSL support
- Anti-Detection: 40+ stealth techniques for reliable scraping

```bash
# 1. Clone and configure
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX

# Quick setup (makes scripts executable + creates .env)
chmod +x scripts/quick-setup.sh && ./scripts/quick-setup.sh

# Then edit:
nano .env  # Update DOMAIN, SUBDOMAIN, and AUTH_TOKEN
```

Choose your deployment:

| Method | Command | Best For |
|---|---|---|
| Docker | `docker-compose up -d` | Production, easy deployment |
| Auto Setup | `chmod +x scripts/setup.sh && sudo ./scripts/setup.sh` | VPS/Server with full control |
| Development | `npm install && npm start` | Local development, testing |

Access your HeadlessX:

```text
Website: https://your-subdomain.yourdomain.com
Health:  https://your-subdomain.yourdomain.com/api/health
Status:  https://your-subdomain.yourdomain.com/api/status?token=YOUR_AUTH_TOKEN
```

HeadlessX v1.2.0 introduces a completely refactored modular architecture for better maintainability, scalability, and development experience.

- Separation of Concerns: Distinct modules for configuration, services, controllers, and middleware
- Better Performance: Optimized browser management and resource usage
- Developer Experience: Clear module boundaries and dependency injection
- Production Ready: Enhanced error handling and logging with correlation IDs
- Security: Improved authentication and rate limiting
- Monitoring: Structured logging and health monitoring

```text
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│     Routes      │───▶│   Controllers   │───▶│    Services     │
│    (api.js)     │    │  (rendering.js) │    │  (browser.js)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Middleware    │    │      Utils      │    │     Config      │
│    (auth.js)    │    │   (logger.js)   │    │   (index.js)    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

Quick Migration from v1.1.0:

- The original `src/server.js` (3079 lines) has been broken down into 20+ focused modules
- Environment variable `TOKEN` is now `AUTH_TOKEN`
- PM2 config moved from `config/ecosystem.config.js` to `ecosystem.config.js`
- All functionality preserved with improved performance and maintainability
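
The `TOKEN` to `AUTH_TOKEN` rename can be applied to an existing `.env` file by hand. For many deployments, a small script may be easier. The following is a minimal sketch, not part of HeadlessX: `migrate_env` is a hypothetical helper, and it assumes the file uses plain `KEY=value` lines, optionally prefixed with `export`.

```python
import re

# Matches a line that assigns the legacy TOKEN variable,
# optionally prefixed with "export". Names such as MY_TOKEN
# or AUTH_TOKEN do not match because TOKEN must start the key.
_LEGACY_TOKEN = re.compile(r"^(\s*(?:export\s+)?)TOKEN(\s*=)")


def migrate_env(text: str) -> str:
    """Rename TOKEN=... to AUTH_TOKEN=... and leave other lines unchanged."""
    lines = text.splitlines(keepends=True)
    return "".join(_LEGACY_TOKEN.sub(r"\1AUTH_TOKEN\2", line) for line in lines)


# Example input in the v1.1.0 style (values are placeholders).
old_env = "TOKEN=abc123\nPORT=3000\n"
print(migrate_env(old_env))
```

Running it on a copy of the file first and comparing the result with the original is a sensible precaution.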
Detailed Documentation: MODULAR_ARCHITECTURE.md

```bash
# Install Docker (if needed)
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
```

```bash
# Deploy HeadlessX
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env  # Configure DOMAIN, SUBDOMAIN, AUTH_TOKEN
```

```bash
# Start services
docker-compose up -d

# Optional: Setup SSL
sudo apt install certbot
sudo certbot certonly --standalone -d your-subdomain.yourdomain.com
```

Docker Management:

```bash
docker-compose ps              # Check status
docker-compose logs headlessx  # View logs
docker-compose restart         # Restart services
docker-compose down            # Stop services
```

```bash
# Automated setup (recommended)
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env  # Configure environment
chmod +x scripts/setup.sh
sudo ./scripts/setup.sh  # Installs dependencies, builds website, starts PM2
```

Nginx Configuration (Auto-handled by setup script):

The setup script configures nginx automatically. To configure it manually instead:

```bash
# Copy and configure nginx site
sudo cp nginx/headlessx.conf /etc/nginx/sites-available/headlessx

# Replace placeholders with your actual domain
sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/your-subdomain.yourdomain.com/g' /etc/nginx/sites-available/headlessx

# Enable the site
sudo ln -sf /etc/nginx/sites-available/headlessx /etc/nginx/sites-enabled/
sudo rm -f /etc/nginx/sites-enabled/default

# Test and reload nginx
sudo nginx -t && sudo systemctl reload nginx
```

Manual setup (if not using setup script):

```bash
sudo apt update && sudo apt upgrade -y
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs build-essential
npm install && npm run build
sudo npm install -g pm2
npm run pm2:start
```

PM2 Management:

```bash
npm run pm2:status   # Check status
npm run pm2:logs     # View logs
npm run pm2:restart  # Restart server
npm run pm2:stop     # Stop server
```

```bash
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env  # Set AUTH_TOKEN, DOMAIN=localhost, SUBDOMAIN=headlessx
```

```bash
# Make scripts executable
chmod +x scripts/*.sh

# Install dependencies
npm install
cd website && npm install && npm run build && cd ..

# Start development server
npm start  # Access at http://localhost:3000
```

```text
HeadlessX Routes:
├── /favicon.ico       → Favicon
├── /robots.txt        → SEO robots file
├── /api/health        → Health check (no auth required)
├── /api/status        → Server status (requires token)
├── /api/render        → Full page rendering
├── /api/html          → HTML extraction
├── /api/content       → Clean text extraction
├── /api/screenshot    → Screenshot generation
├── /api/pdf           → PDF generation
└── /api/batch         → Batch URL processing
```

Request Flow:

- Nginx receives request on port 80/443
- Proxies to Node.js server on port 3000
- Server routes based on path:
  - `/api/*` → API endpoints
  - `/*` → Website files (built Next.js app)
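
The path-based routing in the last step can be pictured with a short Python sketch. It is an illustration only: the real server is a Node.js application, `route_target` is a hypothetical name, and treating a bare `/api` as an API path is an assumption.

```python
def route_target(path: str) -> str:
    """Return which part of the server would handle a request path.

    Paths under /api/ go to the API handlers; everything else is
    served from the built website files.
    """
    if path == "/api" or path.startswith("/api/"):
        return "api"
    return "website"


# A few sample paths and where they would be routed.
for sample in ("/api/health", "/api/html", "/", "/docs"):
    print(sample, "->", route_target(sample))
```

Matching on the `/api/` prefix with its trailing slash keeps unrelated paths that merely begin with the letters `api` on the website side.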

```bash
# Health check (no token required)
curl https://your-subdomain.yourdomain.com/api/health
```

```bash
# Extract HTML
curl -X POST "https://your-subdomain.yourdomain.com/api/html?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "timeout": 30000}'
```

```bash
# Full-page screenshot
curl "https://your-subdomain.yourdomain.com/api/screenshot?token=YOUR_AUTH_TOKEN&url=https://example.com&fullPage=true" \
  -o screenshot.png
```

```bash
# Extract clean text
curl -X POST "https://your-subdomain.yourdomain.com/api/content?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "waitForSelector": "main"}'
```

```bash
# Generate a PDF
curl -X POST "https://your-subdomain.yourdomain.com/api/pdf?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "format": "A4"}' \
  -o document.pdf
```

HTTP Request Module Configuration:

```json
{
  "url": "https://your-subdomain.yourdomain.com/api/html",
  "method": "POST",
  "headers": {
    "Content-Type": "application/json"
  },
  "qs": {
    "token": "YOUR_AUTH_TOKEN"
  },
  "body": {
    "url": "{{url_to_scrape}}",
    "timeout": 30000,
    "waitForSelector": "{{optional_selector}}"
  }
}
```

Webhooks by Zapier Setup:

- URL: `https://your-subdomain.yourdomain.com/api/html?token=YOUR_AUTH_TOKEN`
- Method: POST
- Headers: `Content-Type: application/json`
- Body:

  ```json
  {
    "url": "{{url_from_trigger}}",
    "timeout": 30000,
    "humanBehavior": true
  }
  ```

HTTP Request Node:

```json
{
  "url": "https://your-subdomain.yourdomain.com/api/html",
  "method": "POST",
  "authentication": "queryAuth",
  "query": {
    "token": "YOUR_AUTH_TOKEN"
  },
  "headers": {
    "Content-Type": "application/json"
  },
  "body": {
    "url": "={{$json.url}}",
    "timeout": 30000,
    "humanBehavior": true
  }
}
```

Available via n8n Community Node:

- Install: `npm install n8n-nodes-headlessx`
- GitHub Repository
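
The integration examples above all send a similar JSON body to `/api/html`. As a rough sketch, a helper like the one below could build that body in one place. `build_payload` is a hypothetical name. The field names are taken from the examples above. Whether the server applies defaults when the optional fields are left out is an assumption.

```python
import json
from typing import Optional


def build_payload(
    url: str,
    timeout: int = 30000,
    human_behavior: Optional[bool] = None,
    wait_for_selector: Optional[str] = None,
) -> dict:
    """Build a request body, leaving out optional fields that are not set."""
    payload = {"url": url, "timeout": timeout}
    if human_behavior is not None:
        payload["humanBehavior"] = human_behavior
    if wait_for_selector:
        payload["waitForSelector"] = wait_for_selector
    return payload


# Serialize the body the same way an HTTP client would send it.
print(json.dumps(build_payload("https://example.com", human_behavior=True), indent=2))
```

Omitting unset fields, rather than sending `null`, keeps the body identical to the hand-written examples.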

```python
import requests


def scrape_with_headlessx(url, token):
    """POST a URL to the /api/html endpoint and return the parsed JSON response."""
    response = requests.post(
        "https://your-subdomain.yourdomain.com/api/html",
        params={"token": token},
        json={"url": url, "timeout": 30000, "humanBehavior": True},
    )
    response.raise_for_status()  # Raise on HTTP error responses
    return response.json()


# Usage
result = scrape_with_headlessx("https://example.com", "YOUR_TOKEN")
print(result['html'])
```

```javascript
const axios = require('axios');

async function scrapeWithHeadlessX(url, token) {
  try {
    const response = await axios.post(
      `https://your-subdomain.yourdomain.com/api/html?token=${token}`,
      { url: url, timeout: 30000, humanBehavior: true }
    );
    return response.data;
  } catch (error) {
    console.error('Scraping failed:', error.message);
    throw error;
  }
}

// Usage
scrapeWithHeadlessX('https://example.com', 'YOUR_TOKEN')
  .then(result => console.log(result.html))
  .catch(error => console.error(error));
```

```bash
curl -X POST "https://your-subdomain.yourdomain.com/api/batch?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example1.com",
      "https://example2.com",
      "https://example3.com"
    ],
    "timeout": 30000,
    "humanBehavior": true
  }'
```

```bash
curl -X POST "https://your-subdomain.yourdomain.com/api/batch?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com", "https://httpbin.org"],
    "format": "text",
    "options": {"timeout": 30000}
  }'
```

```text
HeadlessX v1.2.0 - Modular Architecture/
├── src/                         # Modular application source
│   ├── config/                  # Configuration management
│   │   ├── index.js             # Main configuration loader
│   │   └── browser.js           # Browser-specific settings
│   ├── utils/                   # Utility functions
│   │   ├── errors.js            # Error handling & categorization
│   │   ├── logger.js            # Structured logging
│   │   └── helpers.js           # Common utilities
│   ├── services/                # Business logic services
│   │   ├── browser.js           # Browser lifecycle management
│   │   ├── stealth.js           # Anti-detection techniques
│   │   ├── interaction.js       # Human-like behavior
│   │   └── rendering.js         # Core rendering logic
│   ├── middleware/              # Express middleware
│   │   ├── auth.js              # Authentication
│   │   └── error.js             # Error handling
│   ├── controllers/             # Request handlers
│   │   ├── system.js            # Health & status endpoints
│   │   ├── rendering.js         # Main rendering endpoints
│   │   ├── batch.js             # Batch processing
│   │   └── get.js               # GET endpoints & docs
│   ├── routes/                  # Route definitions
│   │   ├── api.js               # API route mappings
│   │   └── static.js            # Static file serving
│   ├── app.js                   # Main application setup
│   ├── server.js                # Entry point for PM2
│   └── rate-limiter.js          # Rate limiting implementation
├── website/                     # Next.js website (unchanged)
│   ├── app/                     # Next.js 13+ app directory
│   ├── components/              # React components
│   ├── .env.example             # Website environment template
│   ├── next.config.js           # Next.js configuration
│   └── package.json             # Website dependencies
├── scripts/                     # Deployment & management scripts
│   ├── setup.sh                 # Automated installation (updated)
│   ├── update_server.sh         # Server update script (updated)
│   ├── verify-domain.sh         # Domain verification
│   └── test-routing.sh          # Integration testing
├── nginx/                       # Nginx configuration
│   └── headlessx.conf           # Nginx proxy config
├── docker/                      # Docker deployment (updated)
│   ├── Dockerfile               # Container definition
│   └── docker-compose.yml       # Docker Compose setup
├── ecosystem.config.js          # PM2 configuration (moved to root)
├── .env.example                 # Environment template (updated)
├── package.json                 # Server dependencies (updated)
├── MODULAR_ARCHITECTURE.md      # Architecture documentation
└── README.md                    # This file
```

```bash
# 1. Install dependencies
npm install

# 2. Build website
cd website
npm install
npm run build
cd ..

# 3. Set environment variables
export AUTH_TOKEN="development_token_123"
export DOMAIN="localhost"
export SUBDOMAIN="headlessx"

# 4. Start server
npm start  # Uses src/app.js

# 5. Access locally
# Website: http://localhost:3000
# API: http://localhost:3000/api/health
```

```bash
# Test server and website integration
bash scripts/test-routing.sh localhost

# Test with environment variables
bash scripts/verify-domain.sh
```

Create your .env file from the template:

```bash
cp .env.example .env
nano .env
```

Required configuration:

```bash
# Security Token (Generate a secure random string)
AUTH_TOKEN=your_secure_token_here

# Domain Configuration
DOMAIN=yourdomain.com
SUBDOMAIN=headlessx

# Optional: Browser Settings
BROWSER_TIMEOUT=60000
MAX_CONCURRENT_BROWSERS=5

# Optional: Server Settings
PORT=3000
NODE_ENV=production
```

Option 1: Automatic (Recommended)

```bash
# The setup script automatically replaces domain placeholders
sudo ./scripts/setup.sh
```

Option 2: Manual Configuration

```bash
# Copy nginx configuration
sudo cp nginx/headlessx.conf /etc/nginx/sites-available/headlessx

# Replace domain placeholders (replace with your actual domain)
sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/headlessx.yourdomain.com/g' /etc/nginx/sites-available/headlessx

# Example: If your domain is "api.example.com"
sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/api.example.com/g' /etc/nginx/sites-available/headlessx

# Enable site and reload nginx
sudo ln -sf /etc/nginx/sites-available/headlessx /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
```

Your final URLs will be:

- Website: `https://your-subdomain.yourdomain.com`
- API Health: `https://your-subdomain.yourdomain.com/api/health`
- API Endpoints: `https://your-subdomain.yourdomain.com/api/*`
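
These URLs follow from the `SUBDOMAIN` and `DOMAIN` values in `.env`. As a small sketch, a client script could derive them the same way. `service_urls` is a hypothetical helper, and it assumes the site is served over HTTPS at `SUBDOMAIN.DOMAIN`, as the nginx examples suggest.

```python
def service_urls(subdomain: str, domain: str) -> dict:
    """Compose the public URLs for a deployment from its .env values."""
    base = f"https://{subdomain}.{domain}"
    return {
        "website": base,
        "health": f"{base}/api/health",
        "api": f"{base}/api",
    }


# Values matching the sample .env configuration.
urls = service_urls("headlessx", "yourdomain.com")
for name, url in urls.items():
    print(f"{name}: {url}")
```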

| Endpoint | Method | Description | Auth Required |
|---|---|---|---|
| `/api/health` | GET | Health check | No |
| `/api/status` | GET | Server status | Yes |
| `/api/render` | POST | Full page rendering (JSON) | Yes |
| `/api/html` | GET/POST | Raw HTML extraction | Yes |
| `/api/content` | GET/POST | Clean text extraction | Yes |
| `/api/screenshot` | GET | Screenshot generation | Yes |
| `/api/pdf` | GET | PDF generation | Yes |
| `/api/batch` | POST | Batch URL processing | Yes |

All endpoints (except /api/health) require a token via:
- Query parameter: `?token=YOUR_TOKEN`
- Header: `X-Token: YOUR_TOKEN`
- Header: `Authorization: Bearer YOUR_TOKEN`
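
The three ways of passing the token can be compared side by side in Python. The sketch below only builds the requests with the standard library and does not send them. `status_request` and its `style` argument are hypothetical names, and the base URL is the placeholder used throughout this document.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "https://your-subdomain.yourdomain.com"  # placeholder host


def status_request(token: str, style: str = "bearer") -> Request:
    """Build a GET request for /api/status using one of three token styles."""
    url = f"{BASE}/api/status"
    if style == "query":
        query = urlencode({"token": token})
        return Request(f"{url}?{query}")
    if style == "x-token":
        return Request(url, headers={"X-Token": token})
    if style == "bearer":
        return Request(url, headers={"Authorization": f"Bearer {token}"})
    raise ValueError(f"unknown auth style: {style}")


# Show what each variant would put on the wire.
for style in ("query", "x-token", "bearer"):
    req = status_request("YOUR_TOKEN", style)
    print(style, req.full_url, req.header_items())
```

Header-based styles keep the token out of URLs, which tend to end up in access logs. Which style suits a given client is a judgment call.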
Visit your HeadlessX website for full API documentation with examples, or check:

```bash
curl https://your-subdomain.yourdomain.com/api/health
curl "https://your-subdomain.yourdomain.com/api/status?token=YOUR_TOKEN"
```

```bash
# PM2 logs
npm run pm2:logs
pm2 logs headlessx --lines 100

# Docker logs
docker-compose logs -f headlessx

# Nginx logs
sudo tail -f /var/log/nginx/access.log
```

```bash
git pull origin main
npm run build           # Rebuild website
npm run pm2:restart     # PM2
# OR
docker-compose restart  # Docker
```

"npm ci" Error (missing package-lock.json):

```bash
chmod +x scripts/generate-lockfiles.sh
./scripts/generate-lockfiles.sh  # Generate lock files
# OR
npm install --production         # Use install instead
```

"Cannot find module 'express'":

```bash
npm install  # Install dependencies
```

System dependency errors (Ubuntu):

```bash
sudo apt update && sudo apt install -y \
  libatk1.0-0t64 libatk-bridge2.0-0t64 libcups2t64 \
  libatspi2.0-0t64 libasound2t64 libxcomposite1
```

PM2 not starting:

```bash
sudo npm install -g pm2
chmod +x scripts/setup.sh      # Make script executable
pm2 start ecosystem.config.js  # Config lives in the project root since v1.2.0
pm2 logs headlessx             # Check errors
```

Script permission errors:

```bash
# Make all scripts executable
chmod +x scripts/*.sh

# Or use the quick setup
chmod +x scripts/quick-setup.sh && ./scripts/quick-setup.sh
```

Playwright browser installation errors:

```bash
# Use dedicated Playwright setup script
chmod +x scripts/setup-playwright.sh
./scripts/setup-playwright.sh

# Or install manually:
sudo apt update && sudo apt install -y \
  libgtk-3-0t64 libpangocairo-1.0-0 libcairo-gobject2 \
  libgdk-pixbuf-2.0-0 libdrm2 libxss1 libxrandr2 \
  libasound2t64 libatk1.0-0t64 libnss3

# Install only Chromium (most stable)
npx playwright install chromium

# Alternative: Use Docker (avoids dependency issues)
docker-compose up -d
```

- Token Authentication: Secure API access with custom tokens
- Rate Limiting: Nginx-level request throttling
- Security Headers: XSS, CSRF, and clickjacking protection
- Bot Protection: Common attack vector blocking
- SSL/TLS: Automatic HTTPS with Let's Encrypt
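
Token authentication is only as strong as the `AUTH_TOKEN` value itself, and the `.env` template asks for a secure random string. One way to produce such a string is sketched below with Python's standard `secrets` module. `generate_auth_token` is a hypothetical helper, and the 32-byte length is an assumption, not a project requirement.

```python
import secrets


def generate_auth_token(num_bytes: int = 32) -> str:
    """Return a random hex string suitable for use as AUTH_TOKEN.

    token_hex draws from the operating system's secure random source
    and returns two hex characters per byte.
    """
    return secrets.token_hex(num_bytes)


# Print a line that can be pasted into .env.
print(f"AUTH_TOKEN={generate_auth_token()}")
```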

- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request

This project is licensed under the MIT License - see the LICENSE file for details.
- Documentation: Visit your deployed website for full API docs
- Issues: GitHub Issues
- Discussions: GitHub Discussions

HeadlessX v1.2.0 - The most advanced open-source browserless web scraping solution.
Made with ❤️ for the developer community.