Every time I needed to scrape or crawl a new site I'd end up writing the same boilerplate: requests for static pages, Selenium for JavaScript-heavy ones, BeautifulSoup for parsing. I wrote this so I don't have to keep redoing it: import the library, pass a format flag, done.
Python library for parsing web content over HTTP, with optional JavaScript rendering via Selenium.
Install:

```bash
git clone https://github.com/ramonclaudio/HTTParser.git
cd HTTParser
pip install -r requirements.txt
```

Optional, for JavaScript rendering:

```bash
pip install selenium
```

Fetch a static HTML page:

```python
from httparser import HTTParser

r = HTTParser(url="https://httpbin.org/html", method="get", response_format="html")
print(r.response())
```

Parse JSON responses, with GET or POST:

```python
from httparser import HTTParser

# GET
r = HTTParser(url="https://httpbin.org/json", method="get", response_format="json")

# POST
r = HTTParser(
    url="https://httpbin.org/anything",
    method="post",
    response_format="json",
    payload={"key": "value"},
)
print(r.response())
```

Render JavaScript-heavy pages via Selenium:

```python
from httparser import HTTParser

r = HTTParser(
    url="https://httpbin.org/delay/3",
    method="get",
    response_format="js",
    browser_path="/path/to/browser",
    chromedriver_path="/path/to/chromedriver",
)
print(r.response())
```

| Parameter | Required | Format |
|---|---|---|
| url | yes | string |
| method | yes | "get" or "post" |
| response_format | yes | "html", "json", or "js" |
| headers | no | {"header": "value"} |
| params | no | {"param": "value"} |
| payload | no | {"key": "value"} (POST only) |
| browser_path | no | path to any Chromium-based browser binary (js only) |
| chromedriver_path | no | path to ChromeDriver (js only) |
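The examples above don't exercise headers or params. Assuming they are passed as plain keyword arguments like the other parameters in the table (an assumption based on the table, not a documented example), a GET with custom headers and query parameters might look like this:

```python
from httparser import HTTParser

# Sketch only: headers and params are assumed to be keyword
# arguments, per the parameter table above.
r = HTTParser(
    url="https://httpbin.org/get",
    method="get",
    response_format="json",
    headers={"User-Agent": "httparser-demo"},
    params={"page": "1"},
)
print(r.response())
```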
Download a ChromeDriver build that matches your browser version from https://chromedriver.chromium.org/downloads. Works with Chrome, Chromium, Edge, Brave, Arc, Dia, Vivaldi, Opera, Helium, and other Chromium-based browsers.
Errors are logged to Error.log.
MIT