auto-scraper/Task.org

* Task

Build a Docker image that boots a minimal browser (Chromium, Firefox, Safari, or Edge all work). Then write a small script that uses the image to scrape the following URL:

https://www.google.com/search?q=MINISFORUM+MS-A2

Requirements:

- Accept optional proxy URL and optional browser launch flags
- Estimate and report:
  - Cold start time
  - Total transfer size (bandwidth over the wire)
  - Time to response
  - CPU and memory usage
- Save final HTML output to a file
- Use any language you're comfortable with
- We can provide a proxy URL, or you can use your own
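
As a rough illustration of the requirements above, here is a minimal Python sketch of what the script's command-line interface and metrics report might look like. It is not a prescribed design: the option names (=--proxy=, =--browser-flag=, =--output=), the =ScrapeReport= class, and its field names are all hypothetical, and the sketch does not launch a browser or take any measurements.

```python
import argparse
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the scraper's CLI options (all option names are illustrative)."""
    parser = argparse.ArgumentParser(
        description="Scrape one URL with a containerized browser."
    )
    parser.add_argument("url", help="URL to scrape")
    # Optional proxy URL; None means a direct connection.
    parser.add_argument("--proxy", default=None, help="optional proxy URL")
    # Repeatable; pass values as --browser-flag=--some-flag so argparse
    # does not mistake the value for another option.
    parser.add_argument(
        "--browser-flag",
        action="append",
        default=[],
        dest="browser_flags",
        help="extra browser launch flag (repeatable)",
    )
    parser.add_argument("--output", default="page.html", help="HTML output file")
    return parser.parse_args(argv)


@dataclass
class ScrapeReport:
    """One run's measurements (field names and units are assumptions)."""

    cold_start_s: float        # container start until the browser is ready
    time_to_response_s: float  # navigation start until the response arrives
    transfer_bytes: int        # total bytes over the wire
    cpu_seconds: float         # CPU time consumed by the browser
    peak_memory_mb: float      # peak resident memory of the browser

    def to_json(self) -> str:
        """Serialize the report so it can be printed or saved next to the HTML."""
        return json.dumps(asdict(self), indent=2)
```

A call such as =parse_args(["https://example.com", "--browser-flag=--disable-gpu"])= yields a namespace with =proxy= set to =None= and =browser_flags= holding the single flag. The flag itself is only an example value.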

* Goal:

 Optimize for:

- Low latency
- Minimal bandwidth
- High success rate (avoid bans, captchas, etc.)

Then:

Write a short design doc (max 4 pages) outlining how you'd scale this to 10k concurrent requests. No need to detail measurement tooling; just focus on the next steps to evolve this into a full browser farm. Include:

- Fingerprinting and TLS shaping
- Crash recovery
- Session pooling and management
- Scaling and orchestration model
- Anti-bot defenses
- Unknowns and how you'd tackle them
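
For the scaling section, a back-of-envelope capacity estimate can help ground the plan. The sketch below is one possible way to do that arithmetic in Python. The function name =nodes_needed= and every parameter are hypothetical, and the per-session memory figure in the usage note is a placeholder to be replaced with numbers measured from the actual image.

```python
import math


def nodes_needed(
    concurrent_sessions: int,
    mem_per_session_mb: float,
    node_mem_mb: float,
    usable_fraction: float = 0.7,
) -> int:
    """Estimate how many nodes a memory-bound browser farm would need.

    Assumes memory is the limiting resource; CPU, network, and proxy
    limits are ignored here and would need their own estimates.
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    # Sessions that fit on one node after reserving headroom for the OS
    # and the orchestration agent.
    sessions_per_node = int(node_mem_mb * usable_fraction / mem_per_session_mb)
    if sessions_per_node < 1:
        raise ValueError("a single session does not fit on one node")
    return math.ceil(concurrent_sessions / sessions_per_node)
```

For example, =nodes_needed(10_000, 300, 64 * 1024)= estimates the node count for 10k sessions on 64 GiB nodes, assuming (not measuring) 300 MB per browser session.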

We want to see how you'd approach this independently and steer the project forward. You don’t need to know everything, but the plan should be grounded and reasonable.

Time cap: 1–2 days max. Let us know if that sounds fair or if you'd prefer to tweak anything. We’re flexible, just aiming for something valuable and time-bounded.