HTTParser

Every time I needed to scrape or crawl a new site, I'd end up writing the same boilerplate: requests for static pages, Selenium for JavaScript-heavy ones, BeautifulSoup for parsing. I made this so I don't have to keep redoing it. Import it, pass a format flag, done.

Python library for parsing web content over HTTP, with optional JavaScript rendering via Selenium.

Install

git clone https://github.com/ramonclaudio/HTTParser.git
cd HTTParser
pip install -r requirements.txt

Optional for JavaScript rendering:

pip install selenium

Usage

HTML

from httparser import HTTParser

r = HTTParser(url="https://httpbin.org/html", method="get", response_format="html")
print(r.response())

JSON

from httparser import HTTParser

# GET
r = HTTParser(url="https://httpbin.org/json", method="get", response_format="json")
print(r.response())

# POST
r = HTTParser(
    url="https://httpbin.org/anything",
    method="post",
    response_format="json",
    payload={"key": "value"},
)
print(r.response())

JavaScript (dynamic)

from httparser import HTTParser

r = HTTParser(
    url="https://httpbin.org/delay/3",
    method="get",
    response_format="js",
    browser_path="/path/to/browser",
    chromedriver_path="/path/to/chromedriver",
)
print(r.response())

Parameters

| Parameter | Required | Format |
| --- | --- | --- |
| url | yes | string |
| method | yes | "get" or "post" |
| response_format | yes | "html", "json", or "js" |
| headers | no | {"header": "value"} |
| params | no | {"param": "value"} |
| payload | no | {"key": "value"} (POST only) |
| browser_path | no | path to any Chromium-based browser binary (js only) |
| chromedriver_path | no | path to ChromeDriver (js only) |
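
The rules in the table can be checked before a request is made. The sketch below is a hypothetical helper, not part of HTTParser. The name `check_arguments` and its messages are assumptions for illustration. It only mirrors the constraints listed above and says nothing about how the library validates its own input.

```python
# Hypothetical helper, not part of HTTParser: it only mirrors the parameter table.
ALLOWED_METHODS = {"get", "post"}
ALLOWED_FORMATS = {"html", "json", "js"}


def check_arguments(url, method, response_format, headers=None, params=None,
                    payload=None, browser_path=None, chromedriver_path=None):
    """Return a list of problems; an empty list means the arguments fit the table."""
    problems = []
    if not isinstance(url, str) or not url:
        problems.append("url must be a non-empty string")
    if method not in ALLOWED_METHODS:
        problems.append('method must be "get" or "post"')
    if response_format not in ALLOWED_FORMATS:
        problems.append('response_format must be "html", "json", or "js"')
    # headers, params, and payload are all documented as dictionaries.
    for name, value in (("headers", headers), ("params", params), ("payload", payload)):
        if value is not None and not isinstance(value, dict):
            problems.append(f"{name} must be a dict")
    if payload is not None and method != "post":
        problems.append("payload is only used with POST")
    if response_format != "js" and (browser_path or chromedriver_path):
        problems.append("browser_path and chromedriver_path only apply to js")
    return problems


print(check_arguments("https://httpbin.org/json", "get", "json"))
# → []
print(check_arguments("https://httpbin.org/json", "get", "json", payload={"key": "value"}))
# → ['payload is only used with POST']
```

Returning a list rather than raising on the first problem lets a caller report every issue at once.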

JavaScript rendering setup

Download the ChromeDriver build that matches your browser version from https://chromedriver.chromium.org/downloads. It works with Chrome, Chromium, Edge, Brave, Arc, Dia, Vivaldi, Opera, Helium, and other Chromium-based browsers.
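
A quick way to confirm the match is to compare version numbers. This is a minimal sketch, assuming that comparing the major version is a sufficient check. The helpers `major_version` and `versions_match` are hypothetical, and the sample strings are illustrative, not output captured from a real install.

```python
import re


def major_version(version_text):
    """Extract the major version from text such as 'ChromeDriver 120.0.6099.109'."""
    match = re.search(r"(\d+)\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def versions_match(browser_text, driver_text):
    """True when the browser and the driver report the same major version."""
    return major_version(browser_text) == major_version(driver_text)


# Illustrative strings (assumed format: a name followed by a dotted version).
print(versions_match("Chromium 120.0.6099.129", "ChromeDriver 120.0.6099.109"))
# → True
print(versions_match("Chromium 121.0.6167.85", "ChromeDriver 120.0.6099.109"))
# → False
```

On a real machine, the driver string can be obtained by running `chromedriver --version`, and the browser version is shown on the browser's About page.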

Errors

Errors log to Error.log.
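
This README does not say how the file is written or what each entry looks like. As a rough illustration of the general pattern only, here is one common way to send errors to a file with Python's standard `logging` module. The logger name, the format string, and the temporary directory are assumptions, not HTTParser's actual behaviour.

```python
import logging
import os
import tempfile


def make_error_logger(log_path):
    """Build a logger that writes ERROR-level records to log_path."""
    logger = logging.getLogger("httparser_sketch")  # hypothetical logger name
    logger.setLevel(logging.ERROR)
    logger.propagate = False  # keep records out of the root logger's handlers
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, handler


# Write to a temporary directory so the sketch does not touch a real Error.log.
log_path = os.path.join(tempfile.mkdtemp(), "Error.log")
logger, handler = make_error_logger(log_path)

try:
    raise ConnectionError("request to https://httpbin.org/html failed")
except ConnectionError:
    # logger.exception records the message at ERROR level plus the traceback.
    logger.exception("Could not fetch page")

# Detach and close the handler so the file is flushed before reading it back.
logger.removeHandler(handler)
handler.close()

with open(log_path, encoding="utf-8") as log_file:
    contents = log_file.read()
print(contents)
```

With a setup like this, a failed request leaves a timestamped entry and a traceback in the log file, which the caller can inspect afterwards.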

License

MIT
