Web Scraping
Nội dung này hiện chưa có sẵn bằng ngôn ngữ của bạn.
Why Use Veilus for Scraping?
Section titled “Why Use Veilus for Scraping?”Traditional scraping tools (Puppeteer, Playwright, Selenium) are easily detected by anti-bot systems. Veilus solves this by providing:
- Real browser fingerprints — pass Canvas, WebGL, and AudioContext checks
- Residential proxy support — rotate IPs per request
- Profile persistence — maintain cookies across scraping sessions
- VeilusFlow — record scraping flows visually, no code needed
Quick Start: Scrape a Product Page
Section titled “Quick Start: Scrape a Product Page”Visual Method (VeilusFlow)
Section titled “Visual Method (VeilusFlow)”- Create a profile with a proxy
- Navigate to the target website
- Start VeilusFlow recording
- Click on the data you want to extract (price, title, rating)
- Right-click each element → Extract Text
- Stop recording
- Run the flow — data is saved to CSV
Code Method (Automation API)
Section titled “Code Method (Automation API)”For programmatic access, use Veilus’s local API:
// Connect to a running Veilus profileconst response = await fetch('http://localhost:9222/json/version');const { webSocketDebuggerUrl } = await response.json();
// Use Chrome DevTools Protocol (CDP)const browser = await puppeteer.connect({ browserWSEndpoint: webSocketDebuggerUrl});
const page = await browser.newPage();await page.goto('https://example.com/products');
// Extract dataconst products = await page.evaluate(() => { return [...document.querySelectorAll('.product')].map(el => ({ title: el.querySelector('.title')?.textContent, price: el.querySelector('.price')?.textContent, }));});Anti-Detection Best Practices
Section titled “Anti-Detection Best Practices”Rate Limiting
Section titled “Rate Limiting”Rule: Max 1 request per 3-5 secondsDon’t scrape faster than a human would browse. Use random delays:
- Page load: wait 2-5 seconds
- Between items: wait 1-3 seconds
- Between pages: wait 5-10 seconds
IP Rotation
Section titled “IP Rotation”- Use a different proxy for each scraping session
- Rotate IP after every 50-100 pages
- Use residential proxies for protected sites
Fingerprint Rotation
Section titled “Fingerprint Rotation”- Create a pool of 5-10 profiles with different fingerprints
- Rotate between profiles during long scraping sessions
- Each profile should have its own proxy
Session Management
Section titled “Session Management”- Save cookies and session data (profiles persist automatically)
- Reuse the same profile+proxy for subsequent visits to the same site
- This builds “reputation” with the target site
Handling CAPTCHAs
Section titled “Handling CAPTCHAs”When a CAPTCHA appears during scraping:
- VeilusFlow can pause and wait for manual solving
- Auto-detection — VeilusFlow detects common CAPTCHA patterns
- Third-party solvers — Integrate 2captcha or anti-captcha via API
Output Formats
Section titled “Output Formats”VeilusFlow extracted data can be exported as:
- CSV — For spreadsheets
- JSON — For programmatic processing
- Clipboard — Copy-paste individual values