Module | Scraper::Reader |
In: | lib/scraper/reader.rb |
REDIRECT_LIMIT | = | 3 |
DEFAULT_TIMEOUT | = | 30 |
PARSERS | = | [:tidy, :html_parser] |
TIDY_OPTIONS | = | { :output_xhtml=>true, :show_errors=>0, :show_warnings=>false, :wrap=>0, :wrap_sections=>false, :force_output=>true, :quiet=>true, :tidy_mark=>false } |
Page | = | Struct.new(:url, :content, :encoding, :last_modified, :etag) |
Parsed | = | Struct.new(:document, :encoding) |
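As a small illustration of the two Struct constants above, the following sketch rebuilds them standalone and shows how their members are set and read. Defining them outside the Scraper::Reader namespace is an assumption made only so the snippet runs without the library, and the URL, content and ETag values are invented sample data.

```ruby
require "uri"

# Standalone copies of the two structs listed above. In the library they
# are defined inside Scraper::Reader; top-level definitions are used here
# only to keep the sketch self-contained.
Page   = Struct.new(:url, :content, :encoding, :last_modified, :etag)
Parsed = Struct.new(:document, :encoding)

# Sample values (invented for illustration): members are filled positionally.
page = Page.new(URI.parse("http://example.com/"), "<html></html>", "utf-8", nil, "\"abc123\"")

# Members can be read by name.
page_etag     = page.etag           # the ETag string passed above
page_modified = page.last_modified  # nil, since no value was supplied

# Trailing members that are omitted default to nil.
parsed = Parsed.new("<html></html>")
parsed_encoding = parsed.encoding
```

A Struct keeps the fields in a fixed order, so a caller can either use the accessors or destructure the result with `to_a`.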
Parses an HTML page and returns the encoding and the HTML element. Raises HTMLParseError if the HTML cannot be parsed.
Options are passed to the parser. For example, when using Tidy you can pass Tidy cleanup options in the hash.
The last option specifies which parser to use (see PARSERS). By default Tidy is used.
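One way to picture the parser selection and option handling described above is the sketch below. The helper `resolve_parser_options` is hypothetical and not part of Scraper::Reader. It assumes the parser is passed as a separate last argument and that caller options override the Tidy defaults. This section does not document the precedence the library itself applies, so treat that ordering as an assumption.

```ruby
# Constants copied from the listing above.
PARSERS = [:tidy, :html_parser]
TIDY_OPTIONS = {
  :output_xhtml => true, :show_errors => 0, :show_warnings => false,
  :wrap => 0, :wrap_sections => false, :force_output => true,
  :quiet => true, :tidy_mark => false
}

# Hypothetical helper (not a library method): validates the parser name
# and computes the option hash that would be handed to that parser.
def resolve_parser_options(options = nil, parser = :tidy)
  unless PARSERS.include?(parser)
    raise ArgumentError, "Unknown parser #{parser.inspect}, expected one of #{PARSERS.inspect}"
  end
  # Assumption: only Tidy has defaults, and caller options win on conflict.
  defaults = parser == :tidy ? TIDY_OPTIONS : {}
  [parser, defaults.merge(options || {})]
end

# Default parser with one Tidy cleanup option overridden.
tidy_parser, tidy_opts = resolve_parser_options({ :wrap => 72 })

# Explicit parser choice; no Tidy defaults are applied in this sketch.
html_parser, html_opts = resolve_parser_options(nil, :html_parser)
```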
Reads a Web page and returns its URL, content and cache control headers.
The request reads a Web page at the specified URL (must be a URI object). It accepts the following options:
It returns a hash with the following information:
If the page has not been modified since the last request, the content is nil.
Raises HTTPError if an error prevents it from reading the page.
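The behavior described above (conditional request, nil content when unmodified, redirect limit, timeout, error on failure) can be sketched with Ruby's standard Net::HTTP. This is not the library's implementation. The names `build_conditional_get` and `fetch_page`, the keyword arguments, the keys of the returned hash and the stand-in `HTTPError` class are all assumptions made for illustration.

```ruby
require "net/http"
require "uri"
require "time"

REDIRECT_LIMIT  = 3   # values copied from the constants listed above
DEFAULT_TIMEOUT = 30

# Stand-in for the library's HTTPError, defined here so the sketch runs alone.
class HTTPError < StandardError; end

# Hypothetical helper: builds a GET request carrying the cache validators
# remembered from a previous response (last_modified is a Time).
def build_conditional_get(uri, last_modified: nil, etag: nil)
  request = Net::HTTP::Get.new(uri.request_uri)
  request["If-Modified-Since"] = last_modified.httpdate if last_modified
  request["If-None-Match"] = etag if etag
  request
end

# Hypothetical fetch: returns a hash; :content is nil on 304 Not Modified.
def fetch_page(uri, last_modified: nil, etag: nil,
               redirect_limit: REDIRECT_LIMIT, timeout: DEFAULT_TIMEOUT)
  raise ArgumentError, "URL must be a URI object" unless uri.is_a?(URI::Generic)

  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = timeout
  http.read_timeout = timeout
  response = http.request(build_conditional_get(uri, last_modified: last_modified, etag: etag))

  case response
  when Net::HTTPNotModified
    # Checked before HTTPRedirection because 304 is also a 3xx response.
    { :url => uri, :content => nil, :last_modified => last_modified, :etag => etag }
  when Net::HTTPSuccess
    modified = response["Last-Modified"]
    { :url => uri, :content => response.body,
      :last_modified => (modified ? Time.httpdate(modified) : nil),
      :etag => response["ETag"] }
  when Net::HTTPRedirection
    raise HTTPError, "Too many redirects" if redirect_limit <= 0
    fetch_page(uri + response["Location"], last_modified: last_modified, etag: etag,
               redirect_limit: redirect_limit - 1, timeout: timeout)
  else
    raise HTTPError, "Unexpected response #{response.code}"
  end
rescue SocketError, SystemCallError, Timeout::Error => e
  raise HTTPError, "Cannot read page: #{e.message}"
end
```

A caller would keep the `:last_modified` and `:etag` values from one call and pass them to the next, treating a nil `:content` as "use the cached copy".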