Web Scraping

Why Use Veilus for Scraping?

Out of the box, automation tools such as Puppeteer, Playwright, and Selenium are often detected by anti-bot systems. Veilus addresses this by providing:

Real browser fingerprints — pass Canvas, WebGL, and AudioContext checks
Residential proxy support — rotate IPs per request
Profile persistence — maintain cookies across scraping sessions
VeilusFlow — record scraping flows visually, no code needed

Quick Start: Scrape a Product Page

Visual Method (VeilusFlow)

Create a profile with a proxy
Navigate to the target website
Start VeilusFlow recording
Click on the data you want to extract (price, title, rating)
Right-click each element → Extract Text
Stop recording
Run the flow — data is saved to CSV

Code Method (Automation API)

For programmatic access, use Veilus’s local API:

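// Requires Node 18+ (global fetch) and an ES module context (top-level await).
// puppeteer-core is enough here because we attach to an already running browser.
import puppeteer from 'puppeteer-core';

// Ask the running Veilus profile for its DevTools WebSocket endpoint
const response = await fetch('http://localhost:9222/json/version');
const { webSocketDebuggerUrl } = await response.json();

// Attach over the Chrome DevTools Protocol (CDP)
const browser = await puppeteer.connect({
  browserWSEndpoint: webSocketDebuggerUrl
});

const page = await browser.newPage();
await page.goto('https://example.com/products');

// Extract the title and price from every .product element
const products = await page.evaluate(() => {
  return [...document.querySelectorAll('.product')].map(el => ({
    title: el.querySelector('.title')?.textContent?.trim(),
    price: el.querySelector('.price')?.textContent?.trim(),
  }));
});

console.log(products);

// Close the tab and detach without closing the profile's browser
await page.close();
await browser.disconnect();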

Anti-Detection Best Practices

Rate Limiting

Rule: at most 1 request every 3-5 seconds

Don’t scrape faster than a human would browse. Use random delays:

Page load: wait 2-5 seconds
Between items: wait 1-3 seconds
Between pages: wait 5-10 seconds
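
The delay ranges above can be sketched as a small helper. This is a minimal illustration rather than a Veilus feature: the names `DELAYS`, `randomDelayMs`, and `humanPause` are hypothetical, and the ranges are simply the ones listed above, converted to milliseconds.

```javascript
// Delay ranges in milliseconds, taken from the guidelines above.
const DELAYS = {
  pageLoad: [2000, 5000],
  betweenItems: [1000, 3000],
  betweenPages: [5000, 10000],
};

// Return a random integer in the inclusive range [minMs, maxMs].
function randomDelayMs(minMs, maxMs) {
  return Math.floor(minMs + Math.random() * (maxMs - minMs + 1));
}

// Resolve after a random pause for the given kind of step.
function humanPause(kind) {
  const [minMs, maxMs] = DELAYS[kind];
  return new Promise(resolve => setTimeout(resolve, randomDelayMs(minMs, maxMs)));
}

// Usage inside an async scraping loop:
//   await page.goto(url);
//   await humanPause('pageLoad');
```

Drawing a fresh random value for every pause avoids the fixed-interval timing pattern that a constant delay would produce.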

IP Rotation

Use a different proxy for each scraping session
Rotate IP after every 50-100 pages
Use residential proxies for protected sites
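
One way to implement "rotate after N pages" is a counter-based rotator. The sketch below is an assumption-laden example: the proxy addresses and the `createProxyRotator` helper are hypothetical, and how the chosen proxy is applied to a Veilus profile is not shown because that depends on your setup.

```javascript
// Hypothetical proxy endpoints; replace with your provider's addresses.
const PROXIES = [
  'proxy-a.example:8000',
  'proxy-b.example:8000',
  'proxy-c.example:8000',
];

// Build a rotator that returns the same proxy for `pagesPerProxy`
// consecutive calls, then moves to the next proxy, wrapping around.
function createProxyRotator(proxies, pagesPerProxy) {
  let pagesServed = 0;
  return function nextProxy() {
    const index = Math.floor(pagesServed / pagesPerProxy) % proxies.length;
    pagesServed += 1;
    return proxies[index];
  };
}

// Usage: call once per page; 75 sits inside the 50-100 page guideline.
//   const nextProxy = createProxyRotator(PROXIES, 75);
//   const proxyForThisPage = nextProxy();
```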

Fingerprint Rotation

Create a pool of 5-10 profiles with different fingerprints
Rotate between profiles during long scraping sessions
Each profile should have its own proxy
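
A profile pool with one proxy per profile might be modeled as below. This is only a sketch: the profile IDs, the pool shape, and the helpers `buildProfilePool` and `pickNextProfile` are hypothetical and do not come from the Veilus API.

```javascript
// Pair each profile with its own proxy; refuse to share a proxy.
function buildProfilePool(profileIds, proxies) {
  if (proxies.length < profileIds.length) {
    throw new Error('Need at least one proxy per profile');
  }
  return profileIds.map((profileId, i) => ({ profileId, proxy: proxies[i] }));
}

// Pick a random profile, avoiding the one that was used last
// (unless the pool contains only that profile).
function pickNextProfile(pool, lastProfileId) {
  const candidates = pool.filter(p => p.profileId !== lastProfileId);
  const choices = candidates.length > 0 ? candidates : pool;
  return choices[Math.floor(Math.random() * choices.length)];
}

// Usage:
//   const pool = buildProfilePool(['p1', 'p2', 'p3'], ['h1:8000', 'h2:8000', 'h3:8000']);
//   const next = pickNextProfile(pool, 'p1'); // never returns p1 here
```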

Session Management

Save cookies and session data (profiles persist automatically)
Reuse the same profile and proxy for subsequent visits to the same site
A consistent identity builds “reputation” with the target site
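
Keeping one identity per site can be sketched as a "sticky" assignment from hostname to profile. The `createStickyAssigner` helper and the pool entries are hypothetical; the map lives in memory only, so a real setup would need to save it between runs.

```javascript
// Return a function that always maps the same hostname to the same
// pool entry. New hostnames are assigned round-robin.
function createStickyAssigner(pool) {
  const assigned = new Map();
  let next = 0;
  return function profileFor(url) {
    const host = new URL(url).hostname;
    if (!assigned.has(host)) {
      assigned.set(host, pool[next % pool.length]);
      next += 1;
    }
    return assigned.get(host);
  };
}

// Usage:
//   const profileFor = createStickyAssigner(['profile-1', 'profile-2']);
//   profileFor('https://example.com/products'); // same result on every call
```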

Handling CAPTCHAs

When a CAPTCHA appears during scraping:

VeilusFlow can pause and wait for manual solving
Auto-detection — VeilusFlow detects common CAPTCHA patterns
Third-party solvers — Integrate 2captcha or anti-captcha via API
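
In the code method, a rough CAPTCHA check can be run against the page HTML before extracting data. The sketch below is a heuristic written for illustration, not a description of how VeilusFlow detects CAPTCHAs: `detectCaptcha` is a hypothetical helper, and it only looks for the widget class names and script hosts used by reCAPTCHA, hCaptcha, and Cloudflare Turnstile.

```javascript
// Markers that commonly appear in pages embedding these CAPTCHA widgets.
const CAPTCHA_MARKERS = [
  { name: 'reCAPTCHA', pattern: /g-recaptcha|google\.com\/recaptcha/i },
  { name: 'hCaptcha', pattern: /h-captcha|hcaptcha\.com/i },
  { name: 'Cloudflare Turnstile', pattern: /cf-turnstile|challenges\.cloudflare\.com/i },
];

// Return the name of the first CAPTCHA marker found, or null if none match.
function detectCaptcha(html) {
  const hit = CAPTCHA_MARKERS.find(marker => marker.pattern.test(html));
  return hit ? hit.name : null;
}

// Usage with Puppeteer:
//   const found = detectCaptcha(await page.content());
//   if (found) { /* pause, solve manually, or call a solver service */ }
```

A string match like this can produce false positives (for example, a page that merely mentions a CAPTCHA provider), so treat a hit as a signal to pause and inspect rather than as proof.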

Output Formats

Data extracted with VeilusFlow can be exported as:

CSV — For spreadsheets
JSON — For programmatic processing
Clipboard — Copy-paste individual values
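
When using the code method instead of VeilusFlow, the scraped objects have to be serialized by hand. The following is a minimal sketch with hypothetical helpers (`csvField`, `toCsv`); it quotes fields in the usual CSV way (wrap in double quotes, double any embedded quotes) and is not tied to any Veilus export format.

```javascript
// Quote a value only when it contains a comma, quote, or line break.
function csvField(value) {
  const text = value == null ? '' : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Serialize an array of objects to CSV with a header row.
function toCsv(rows, columns) {
  const lines = [columns.map(csvField).join(',')];
  for (const row of rows) {
    lines.push(columns.map(column => csvField(row[column])).join(','));
  }
  return lines.join('\r\n');
}

// Usage:
//   const csv = toCsv(products, ['title', 'price']);
//   const json = JSON.stringify(products, null, 2);
```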