* Task

Build a Docker image that boots a minimal browser (Chromium, Firefox, Safari, or Edge all work). Then write a small script that uses the image to scrape the following URL:

https://www.google.com/search?q=MINISFORUM+MS-A2

Requirements:
- Accept optional proxy URL and optional browser launch flags
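
One possible shape for the optional proxy and launch-flag inputs, sketched in Python with only the standard library. The option names (~--proxy~, ~--browser-flags~, ~--output~) and both helper names are hypothetical, chosen for illustration. ~--proxy-server~ is the Chromium switch for routing traffic through a proxy, so another browser would need a different mapping.

#+begin_src python
import argparse
import shlex

# Target URL from the task statement; used as the default.
DEFAULT_URL = "https://www.google.com/search?q=MINISFORUM+MS-A2"


def parse_args(argv=None):
    """Parse the scraper's command line (all option names are illustrative)."""
    parser = argparse.ArgumentParser(
        description="Scrape one URL with a containerized browser."
    )
    parser.add_argument("--url", default=DEFAULT_URL)
    parser.add_argument("--proxy", default=None,
                        help="optional proxy URL, e.g. http://host:3128")
    parser.add_argument("--browser-flags", default="",
                        help="optional extra launch flags as one quoted string")
    parser.add_argument("--output", default="output.html",
                        help="file that receives the final HTML")
    return parser.parse_args(argv)


def build_launch_flags(args):
    """Turn parsed options into a list of browser launch flags.

    Assumes a Chromium-family browser, where --proxy-server sets the proxy.
    """
    flags = shlex.split(args.browser_flags)  # honours shell-style quoting
    if args.proxy:
        flags.append(f"--proxy-server={args.proxy}")
    return flags
#+end_src

Passing the extra flags in the ~--browser-flags=...~ form, with the equals sign, keeps argparse from mistaking a value that starts with ~--~ for another option.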
* Estimate and report:

- Cold start time
- Total transfer size (bandwidth over the wire)
- Time to response
- CPU and memory usage

- Save final HTML output to a file
- Use any language you're comfortable with
- We can provide a proxy URL, or you can use your own
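
The timing, CPU, and memory figures listed above could be collected around a single run with a small wrapper. This is a rough sketch, assuming a Unix host and a scrape that is started as a child process; ~measure~ and its result keys are illustrative names. For ~docker run~, the container's processes belong to the Docker daemon rather than to the CLI client, so the CPU and memory numbers here describe only the direct child, and container-level figures would have to come from something like ~docker stats~. Transfer size is not covered, since it needs byte counts from the browser or the proxy.

#+begin_src python
import resource  # Unix only
import subprocess
import time


def measure(cmd):
    """Run cmd to completion and report wall time, CPU time, and peak RSS.

    CPU time is the user + system time added by child processes during the
    call. max_rss is the high-water mark over all children this process has
    waited for so far (KiB on Linux, bytes on macOS), not a per-call delta.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    proc = subprocess.run(cmd, capture_output=True, check=False)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {
        "exit_code": proc.returncode,
        "wall_seconds": wall,
        "cpu_seconds": cpu,
        "max_rss": after.ru_maxrss,
    }
#+end_src

Calling ~measure~ once on a freshly pulled image and once more on a warm one gives a first estimate of cold start time as the difference in ~wall_seconds~.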
* Goal:

Optimize for:

- Low latency
- Minimal bandwidth
- High success rate (avoid bans, captchas, etc.)

Then:
Write a short design doc (max 4 pages) outlining how you'd scale this to 10k concurrent requests. No need to detail measurement tooling; just focus on the next steps to evolve this into a full browser farm. Include:

- Fingerprinting and TLS shaping
- Crash recovery
- Session pooling and management
- Scaling and orchestration model
- Anti-bot defenses
- Unknowns and how you'd tackle them

We want to see how you'd approach this independently and steer the project forward. You don’t need to know everything, but the plan should be grounded and reasonable.

Time cap: 1–2 days max. Let us know if that sounds fair or if you'd prefer to tweak anything. We’re flexible, just aiming for something valuable and time-bounded.