auto-scraper/Log.org

* Setting up
I read the following on https://hub.docker.com/r/browserless/chrome:
#+begin_quote
Getting Chrome running well in docker is also a challenge as there's quiet a few packages you need in order to get Chrome running. Once that's done then there's still missing fonts, getting libraries to work with it, and having limitations on service reliability.
#+end_quote
It made me think twice about setting Chrome up in Docker myself, so I just grabbed this image for now.
- I realized soon enough that ws://localhost:3000 is browserless' own API, so I
  tried to figure out how to get the websocket for the Chrome DevTools. It turns
  out I need to launch an instance first.
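A minimal sketch of that lookup, assuming a plain Chrome instance is already running with remote debugging enabled: Chrome serves a =/json/version= HTTP endpoint whose =webSocketDebuggerUrl= field is the DevTools websocket. The host and port (=localhost:9222=) and both function names are assumptions for illustration, not values from this setup.
#+begin_src python
import json
from urllib.request import urlopen


def parse_ws_url(version_json: str) -> str:
    """Extract the DevTools websocket URL from a /json/version response body."""
    return json.loads(version_json)["webSocketDebuggerUrl"]


def devtools_ws_url(host: str = "localhost", port: int = 9222) -> str:
    """Ask a running Chrome for its DevTools websocket URL.

    host and port are assumptions; Chrome must already be running with
    --remote-debugging-port set to the same port.
    """
    with urlopen(f"http://{host}:{port}/json/version", timeout=5) as resp:
        return parse_ws_url(resp.read().decode("utf-8"))
#+end_src
Calling =devtools_ws_url()= before any instance is up simply fails with a connection error, which matches the "launch an instance first" observation.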
Browserless has an API, but after going through the documentation I quickly felt
that using it probably defeats the purpose of the exercise, so I used this
instead:
https://hub.docker.com/r/zenika/alpine-chrome
Perhaps the exercise is looking for me to actually build an image from scratch,
but let's make progress on all the other tasks before tackling that.
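For reference, a rough sketch of launching that image from Python with the DevTools port exposed. The port, the helper names, and the exact flag set are assumptions (standard Chrome flags for headless remote debugging); the image's own README is the authority on what it expects.
#+begin_src python
import subprocess


def chrome_container_cmd(port: int = 9222, image: str = "zenika/alpine-chrome") -> list[str]:
    """Build a docker command that runs headless Chrome with remote debugging.

    The port and the flags are assumptions, not taken from this log.
    """
    return [
        "docker", "container", "run", "-d",
        "-p", f"{port}:{port}",  # publish the DevTools port on the host
        image,
        "--no-sandbox",
        "--remote-debugging-address=0.0.0.0",  # listen beyond the container's loopback
        f"--remote-debugging-port={port}",
    ]


def launch_chrome_container(port: int = 9222) -> None:
    """Start the container; raises CalledProcessError if docker fails."""
    subprocess.run(chrome_container_cmd(port), check=True)
#+end_src
Keeping the command as a list (no shell string) avoids quoting problems if the flags change later.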
OK, so I found this:
https://github.com/ultrafunkamsterdam/undetected-chromedriver/
This issue covers how to use it with Brave:
https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/806
I could set this up in the Docker container; however, I'm not sure this is the
right approach.
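A minimal sketch of what pointing undetected-chromedriver at Brave could look like, using the =browser_executable_path= argument of =uc.Chrome=. The candidate binary paths and the helper names are assumptions for a typical Linux install, and the linked issue may describe a different approach.
#+begin_src python
import os

# Assumed install locations for Brave on Linux; adjust for the actual system.
BRAVE_CANDIDATES = ("/usr/bin/brave-browser", "/usr/bin/brave")


def find_browser(candidates=BRAVE_CANDIDATES, exists=os.path.exists):
    """Return the first candidate path that exists, or None if none do."""
    for path in candidates:
        if exists(path):
            return path
    return None


def make_driver(browser_path: str):
    """Create an undetected-chromedriver session using the given browser binary."""
    # Imported lazily so the helper above works without the package installed.
    import undetected_chromedriver as uc

    return uc.Chrome(browser_executable_path=browser_path)
#+end_src
Usage would be along the lines of =make_driver(find_browser())=, after checking that =find_browser()= did not return =None=.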
I found this resource:
https://bot.incolumitas.com/#botChallenge
OK, so it works! I was able to scrape Google with the =driver.py= script!
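The contents of =driver.py= are not reproduced in this log, so the following is only a hypothetical sketch of the general shape such a script might take: build a search URL, load it with a Selenium-compatible driver, and collect result headings. The function names and the =h3= selector are assumptions, and Google's markup changes often.
#+begin_src python
from urllib.parse import urlencode


def google_search_url(query: str) -> str:
    """Build a Google search URL for the given query string."""
    return "https://www.google.com/search?" + urlencode({"q": query})


def result_titles(driver, query: str) -> list[str]:
    """Load the results page and return the non-empty <h3> texts.

    `driver` is any Selenium-compatible WebDriver; the h3 selector is an
    assumption about the page markup, not something taken from driver.py.
    """
    # Imported lazily so the URL helper works without Selenium installed.
    from selenium.webdriver.common.by import By

    driver.get(google_search_url(query))
    return [h.text for h in driver.find_elements(By.CSS_SELECTOR, "h3") if h.text]
#+end_src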