Object
Makes all entry types look for the passed-in element when parsing. This is just a call to element (a SAXMachine method) on each entry class.
Valid keys are the same as for SAXMachine's element.
    # File lib/feedzirra/feed.rb, line 58
    def self.add_common_feed_entry_element(element_tag, options = {})
      # need to think of a better way to do this. will break for people who want this behavior
      # across their added classes
      feed_classes.map{|k| eval("#{k}Entry") }.each do |klass|
        klass.send(:element, element_tag, options)
      end
    end
Adds a new feed parsing class that will be used for parsing.
The class/constant that you want to register.
An updated array of feed parser class names.
    # File lib/feedzirra/feed.rb, line 40
    def self.add_feed_class(klass)
      feed_classes.unshift klass
    end
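Because add_feed_class uses unshift, a newly registered class is consulted before the built-in parsers. The following is a minimal sketch of that ordering, not Feedzirra itself: GenericRSS, PodcastRSS and ParserRegistry are hypothetical stand-ins that only mimic the able_to_parse? check and the memoised array.

```ruby
# Stand-in parsers for illustration only; real parsers live under Feedzirra::Parser.
class GenericRSS
  def self.able_to_parse?(xml)
    xml.include?("<rss")
  end
end

# Hypothetical, more specific parser: only claims RSS that declares the iTunes namespace.
class PodcastRSS
  def self.able_to_parse?(xml)
    xml.include?("<rss") && xml.include?("xmlns:itunes")
  end
end

# Minimal registry mirroring the memoised array plus unshift.
module ParserRegistry
  def self.feed_classes
    @feed_classes ||= [GenericRSS]
  end

  # Prepends, so the newest registration is tried first.
  def self.add_feed_class(klass)
    feed_classes.unshift(klass)
  end

  # First class that claims the document wins.
  def self.parser_for(xml)
    feed_classes.detect { |klass| klass.able_to_parse?(xml) }
  end
end

ParserRegistry.add_feed_class(PodcastRSS)
```

With this ordering a specialised parser can take over the documents it recognises, while everything else still falls through to the more general class.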
An abstraction for adding a feed, given as a Feed object, to the passed Curl::Multi stack.
The Curl::Multi object that the request should be added to.
A feed object that you would like to be fetched.
An array of feed objects that are queued for request.
Existing responses that you want the response from the request added to.
A single feed object, or an array of feed objects.
Valid keys for this argument are as follows:
:user_agent - String that overrides the default user agent.
:on_success - Block that gets executed after a successful request.
:on_failure - Block that gets executed after a failed request.
The updated Curl::Multi object with the request details added to its stack.
    # File lib/feedzirra/feed.rb, line 261
    def self.add_feed_to_multi(multi, feed, feed_queue, responses, options)
      easy = Curl::Easy.new(feed.feed_url) do |curl|
        curl.headers["User-Agent"] = (options[:user_agent] || USER_AGENT)
        curl.headers["If-Modified-Since"] = feed.last_modified.httpdate if feed.last_modified
        curl.headers["If-None-Match"] = feed.etag if feed.etag
        curl.userpwd = options[:http_authentication].join(':') if options.has_key?(:http_authentication)
        curl.follow_location = true

        curl.max_redirects = options[:max_redirects] if options[:max_redirects]
        curl.timeout = options[:timeout] if options[:timeout]

        curl.on_success do |c|
          begin
            add_feed_to_multi(multi, feed_queue.shift, feed_queue, responses, options) unless feed_queue.empty?
            updated_feed = Feed.parse(c.body_str)
            updated_feed.feed_url = c.last_effective_url
            updated_feed.etag = etag_from_header(c.header_str)
            updated_feed.last_modified = last_modified_from_header(c.header_str)
            feed.update_from_feed(updated_feed)
            responses[feed.feed_url] = feed
            options[:on_success].call(feed) if options.has_key?(:on_success)
          rescue Exception => e
            options[:on_failure].call(feed, c.response_code, c.header_str, c.body_str) if options.has_key?(:on_failure)
          end
        end

        curl.on_failure do |c, err|
          add_feed_to_multi(multi, feed_queue.shift, feed_queue, responses, options) unless feed_queue.empty?
          response_code = c.response_code
          if response_code == 304 # it's not modified. this isn't an error condition
            responses[feed.feed_url] = feed
            options[:on_success].call(feed) if options.has_key?(:on_success)
          else
            responses[feed.url] = c.response_code
            options[:on_failure].call(feed, c.response_code, c.header_str, c.body_str) if options.has_key?(:on_failure)
          end
        end
      end
      multi.add(easy)
    end
An abstraction for adding a feed by URL to the passed Curl::Multi stack.
The Curl::Multi object that the request should be added to.
The URL of the feed that you would like to be fetched.
An array of URLs that are queued for request.
Existing responses that you want the response from the request added to.
A single feed object, or an array of feed objects.
Valid keys for this argument are as follows:
:user_agent - String that overrides the default user agent.
:on_success - Block that gets executed after a successful request.
:on_failure - Block that gets executed after a failed request.
The updated Curl::Multi object with the request details added to its stack.
    # File lib/feedzirra/feed.rb, line 203
    def self.add_url_to_multi(multi, url, url_queue, responses, options)
      easy = Curl::Easy.new(url) do |curl|
        curl.headers["User-Agent"] = (options[:user_agent] || USER_AGENT)
        curl.headers["If-Modified-Since"] = options[:if_modified_since].httpdate if options.has_key?(:if_modified_since)
        curl.headers["If-None-Match"] = options[:if_none_match] if options.has_key?(:if_none_match)
        curl.headers["Accept-encoding"] = 'gzip, deflate' if options.has_key?(:compress)
        curl.follow_location = true
        curl.userpwd = options[:http_authentication].join(':') if options.has_key?(:http_authentication)

        curl.max_redirects = options[:max_redirects] if options[:max_redirects]
        curl.timeout = options[:timeout] if options[:timeout]

        curl.on_success do |c|
          add_url_to_multi(multi, url_queue.shift, url_queue, responses, options) unless url_queue.empty?
          xml = decode_content(c)
          klass = determine_feed_parser_for_xml(xml)

          if klass
            begin
              feed = klass.parse(xml)
              feed.feed_url = c.last_effective_url
              feed.etag = etag_from_header(c.header_str)
              feed.last_modified = last_modified_from_header(c.header_str)
              responses[url] = feed
              options[:on_success].call(url, feed) if options.has_key?(:on_success)
            rescue Exception => e
              options[:on_failure].call(url, c.response_code, c.header_str, c.body_str) if options.has_key?(:on_failure)
            end
          else
            # puts "Error determining parser for #{url} - #{c.last_effective_url}"
            # raise NoParserAvailable.new("no valid parser for content.") (this would unfortunately fail the whole 'multi', so it's not really usable)
            options[:on_failure].call(url, c.response_code, c.header_str, c.body_str) if options.has_key?(:on_failure)
          end
        end

        curl.on_failure do |c, err|
          add_url_to_multi(multi, url_queue.shift, url_queue, responses, options) unless url_queue.empty?
          responses[url] = c.response_code
          options[:on_failure].call(url, c.response_code, c.header_str, c.body_str) if options.has_key?(:on_failure)
        end
      end
      multi.add(easy)
    end
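The listing reads more option keys than the parameter list names (:compress, :http_authentication, :max_redirects, :timeout), and it checks some of them with has_key? rather than by value. Below is a hedged sketch of building such an options hash. build_fetch_options and every value in it are illustrative assumptions, not library defaults; only the key names and the callback argument lists are taken from the listing.

```ruby
# Builds an options hash in the shape add_url_to_multi reads.
# successes and failures are plain arrays the callbacks append to.
def build_fetch_options(successes, failures, last_seen: nil, etag: nil)
  options = {
    :user_agent    => "ExampleReader/1.0", # hypothetical agent string
    :compress      => true,                # the key's presence is what matters (has_key?)
    :max_redirects => 3,                   # illustrative value
    :timeout       => 10,                  # illustrative value
    # The URL-based path calls on_success with (url, feed).
    :on_success    => lambda { |url, feed| successes << [url, feed] },
    # The URL-based path calls on_failure with (url, response_code, header, body).
    :on_failure    => lambda { |url, code, header, body| failures << [url, code] }
  }
  # Only add the conditional-request keys when there is a value: the listing
  # tests has_key?(:if_modified_since) and then calls .httpdate on the value,
  # so a key holding nil would raise NoMethodError.
  options[:if_modified_since] = last_seen if last_seen
  options[:if_none_match] = etag if etag
  options
end
```

Note that the Feed-based path (add_feed_to_multi) passes the feed object instead of the URL as the first callback argument, so callbacks shared between the two paths need to accept either.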
Decodes the XML document if it was compressed.
The Curl::Easy response object from the request.
A decoded string of XML.
    # File lib/feedzirra/feed.rb, line 145
    def self.decode_content(c)
      if c.header_str.match(/Content-Encoding: gzip/)
        begin
          gz = Zlib::GzipReader.new(StringIO.new(c.body_str))
          xml = gz.read
          gz.close
        rescue Zlib::GzipFile::Error
          # Maybe this is not gzipped?
          xml = c.body_str
        end
      elsif c.header_str.match(/Content-Encoding: deflate/)
        xml = Zlib::Inflate.inflate(c.body_str)
      else
        xml = c.body_str
      end

      xml
    end
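The three branches above (gzip, deflate, plain) can be exercised without a network request. This is a sketch using only Ruby's standard library; FakeResponse is a hypothetical stand-in for the Curl::Easy object that exposes just header_str and body_str, and decode_body is a local copy of the same logic.

```ruby
require "zlib"
require "stringio"

# Stand-in for the Curl::Easy response; only the two readers decode_content uses.
FakeResponse = Struct.new(:header_str, :body_str)

# Same branching as decode_content, written against the stand-in.
def decode_body(response)
  if response.header_str.match(/Content-Encoding: gzip/)
    begin
      gz = Zlib::GzipReader.new(StringIO.new(response.body_str))
      xml = gz.read
      gz.close
      xml
    rescue Zlib::GzipFile::Error
      # The header claimed gzip but the body is not; fall back to the raw body.
      response.body_str
    end
  elsif response.header_str.match(/Content-Encoding: deflate/)
    Zlib::Inflate.inflate(response.body_str)
  else
    response.body_str
  end
end

SAMPLE_XML = "<rss version=\"2.0\"></rss>"

gzipped    = FakeResponse.new("Content-Encoding: gzip\r\n", Zlib.gzip(SAMPLE_XML))
deflated   = FakeResponse.new("Content-Encoding: deflate\r\n", Zlib::Deflate.deflate(SAMPLE_XML))
mislabeled = FakeResponse.new("Content-Encoding: gzip\r\n", SAMPLE_XML)

[gzipped, deflated, mislabeled].each { |r| decode_body(r) }
```

The mislabeled case is the one the rescue clause exists for: a server that announces gzip but sends an uncompressed body still yields usable XML.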
Determines the correct parser class to use for parsing the feed.
The XML that you would like to determine the parser for.
The class name of the parser that can handle the XML.
    # File lib/feedzirra/feed.rb, line 29
    def self.determine_feed_parser_for_xml(xml)
      start_of_doc = xml.slice(0, 2000)
      feed_classes.detect {|klass| klass.able_to_parse?(start_of_doc)}
    end
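One consequence of slicing the first 2000 characters is that a parser only sees markers near the top of the document. The sketch below illustrates this with a hypothetical MarkerParser and a local parser_for helper; neither is part of Feedzirra, and the matching rule is deliberately simplistic.

```ruby
# Stand-in parser for illustration; not a Feedzirra class.
class MarkerParser
  def self.able_to_parse?(xml)
    xml.include?("<feed")
  end
end

# Same shape as determine_feed_parser_for_xml: only the first 2000
# characters are handed to each candidate class.
def parser_for(xml, parsers = [MarkerParser])
  start_of_doc = xml.slice(0, 2000)
  parsers.detect { |klass| klass.able_to_parse?(start_of_doc) }
end

# Root element near the top: detected.
EARLY_DOC = "<?xml version=\"1.0\"?><feed></feed>"
# Root element pushed past the window by a long leading comment: not detected.
LATE_DOC  = "<!--" + ("x" * 2000) + "--><feed></feed>"

parser_for(EARLY_DOC)
parser_for(LATE_DOC)
```

A feed with an unusually long preamble can therefore fall through to "no parser" even though a registered class would accept the full document.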
Determines the etag from the response headers.
Raw header string returned from the request.
A string of the etag or nil if it cannot be found in the headers.
    # File lib/feedzirra/feed.rb, line 308
    def self.etag_from_header(header)
      header =~ /.*ETag:\s(.*)\r/
      $1
    end
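To see what the pattern captures, here is a small sketch that applies the same regular expression to a hand-written header block. etag_from and SAMPLE_HEADER are illustrative names; the String#[] form with a capture index is an equivalent way to read the first group without the $1 global.

```ruby
# Returns the captured ETag value, or nil when the header has no ETag line.
def etag_from(header)
  header[/.*ETag:\s(.*)\r/, 1]
end

# Hand-written sample; real header_str values come from the HTTP response.
SAMPLE_HEADER = "HTTP/1.1 200 OK\r\nETag: \"abc123\"\r\nContent-Type: text/xml\r\n\r\n"

etag_from(SAMPLE_HEADER)          # the quoted value, quotes included
etag_from("HTTP/1.1 200 OK\r\n")  # nil: no ETag line present
```

The captured value keeps its surrounding quotes, which is what If-None-Match expects when the value is sent back. The pattern is case-sensitive, so a header spelled `etag:` would not match as written.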
Provides a list of registered feed parsing classes.
An array of class names.
    # File lib/feedzirra/feed.rb, line 48
    def self.feed_classes
      @feed_classes ||= [Feedzirra::Parser::RSS, Feedzirra::Parser::AtomFeedBurner, Feedzirra::Parser::Atom]
    end
Fetches and returns the parsed XML for each URL provided.
A single feed URL, or an array of feed URLs.
Valid keys for this argument are as follows:
:user_agent - String that overrides the default user agent.
:if_modified_since - Time object representing when the feed was last updated.
:if_none_match - String, an etag for the request that was stored previously.
:on_success - Block that gets executed after a successful request.
:on_failure - Block that gets executed after a failed request.
A Feed object if a single URL is passed.
A Hash if multiple URLs are passed. The key will be the URL, and the value the Feed object (or the HTTP response code if the request failed).
    # File lib/feedzirra/feed.rb, line 124
    def self.fetch_and_parse(urls, options = {})
      url_queue = [*urls]
      multi = Curl::Multi.new
      responses = {}

      # I broke these down so I would only try to do 30 simultaneously because
      # I was getting weird errors when doing a lot. As one finishes it pops another off the queue.
      url_queue.slice!(0, 30).each do |url|
        add_url_to_multi(multi, url, url_queue, responses, options)
      end

      multi.perform
      return urls.is_a?(String) ? responses.values.first : responses
    end
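The "30 at a time, refill as each one finishes" pattern is easier to follow in isolation. The sketch below simulates it with a hypothetical FakeMulti that runs jobs sequentially and records the largest number pending at once. It is not Curl::Multi and performs no HTTP; fetch_all and add_to_multi are illustrative names.

```ruby
# Simulated stand-in for Curl::Multi: runs queued jobs one at a time and
# tracks the peak number of pending jobs. Not part of Curb or Feedzirra.
class FakeMulti
  attr_reader :max_pending

  def initialize
    @pending = []
    @max_pending = 0
  end

  def add(job)
    @pending << job
    @max_pending = [@max_pending, @pending.size].max
  end

  def perform
    @pending.shift.call until @pending.empty?
  end
end

def add_to_multi(multi, url, url_queue, responses)
  job = lambda do
    # On completion, pull the next URL off the queue before recording the result.
    add_to_multi(multi, url_queue.shift, url_queue, responses) unless url_queue.empty?
    responses[url] = "fetched #{url}"
  end
  multi.add(job)
end

# Returns the responses hash and the peak number of pending jobs.
def fetch_all(urls, window = 30)
  url_queue = [*urls]
  multi = FakeMulti.new
  responses = {}
  # Seed the first window; every completion then tops the window back up.
  url_queue.slice!(0, window).each { |url| add_to_multi(multi, url, url_queue, responses) }
  multi.perform
  [responses, multi.max_pending]
end
```

Every URL is still processed, but the number in flight never exceeds the window, because each finished job adds at most one replacement.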
Fetches and returns the raw XML for each URL provided.
A single feed URL, or an array of feed URLs.
Valid keys for this argument are as follows:
:user_agent - String that overrides the default user agent.
:if_modified_since - Time object representing when the feed was last updated.
:if_none_match - String that's normally an etag for the request that was stored previously.
:on_success - Block that gets executed after a successful request.
:on_failure - Block that gets executed after a failed request.
A String of XML if a single URL is passed.
A Hash if multiple URLs are passed. The key will be the URL, and the value the XML (or the HTTP response code if the request failed).
    # File lib/feedzirra/feed.rb, line 80
    def self.fetch_raw(urls, options = {})
      url_queue = [*urls]
      multi = Curl::Multi.new
      responses = {}
      url_queue.each do |url|
        easy = Curl::Easy.new(url) do |curl|
          curl.headers["User-Agent"] = (options[:user_agent] || USER_AGENT)
          curl.headers["If-Modified-Since"] = options[:if_modified_since].httpdate if options.has_key?(:if_modified_since)
          curl.headers["If-None-Match"] = options[:if_none_match] if options.has_key?(:if_none_match)
          curl.headers["Accept-encoding"] = 'gzip, deflate' if options.has_key?(:compress)
          curl.follow_location = true
          curl.userpwd = options[:http_authentication].join(':') if options.has_key?(:http_authentication)

          curl.max_redirects = options[:max_redirects] if options[:max_redirects]
          curl.timeout = options[:timeout] if options[:timeout]

          curl.on_success do |c|
            responses[url] = decode_content(c)
          end
          curl.on_failure do |c, err|
            responses[url] = c.response_code
          end
        end
        multi.add(easy)
      end

      multi.perform
      urls.is_a?(String) ? responses.values.first : responses
    end
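In the listing, a successful request stores the decoded body (a String) under its URL, while a failed one stores the numeric response code. A caller therefore has to tell the two apart. The helper below is a hypothetical sketch of one way to do that; partition_raw_results is not part of Feedzirra, and the sample hash is hand-written.

```ruby
# Splits a fetch_raw-style result hash into successful bodies and failed codes.
def partition_raw_results(responses)
  bodies = {}
  failures = {}
  responses.each do |url, value|
    if value.is_a?(String)
      bodies[url] = value      # decoded XML from on_success
    else
      failures[url] = value    # response code stored by on_failure
    end
  end
  [bodies, failures]
end

# Hand-written example of the hash shape; no request is made here.
SAMPLE_RESULTS = {
  "http://example.com/a.xml" => "<rss></rss>",
  "http://example.com/b.xml" => 404
}

bodies, failures = partition_raw_results(SAMPLE_RESULTS)
```

Since a bare String argument makes the method return the single value rather than a Hash, passing a one-element Array is a simple way to always get the Hash shape back.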
Determines the last modified date from the response headers.
Raw header string returned from the request.
A Time object of the last modified date or nil if it cannot be found in the headers.
    # File lib/feedzirra/feed.rb, line 319
    def self.last_modified_from_header(header)
      header =~ /.*Last-Modified:\s(.*)\r/
      Time.parse($1) if $1
    end
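The parsed Time is what later feeds the If-Modified-Since header through httpdate. The following sketch shows that round trip with the standard library only. last_modified_from and conditional_headers are illustrative helpers, and the header string is hand-written.

```ruby
require "time" # provides Time.parse and Time#httpdate

# Same pattern as last_modified_from_header, reading the capture via String#[].
def last_modified_from(header)
  value = header[/.*Last-Modified:\s(.*)\r/, 1]
  Time.parse(value) if value
end

# Builds the conditional-request headers from previously stored values,
# skipping whichever one is missing.
def conditional_headers(last_modified, etag)
  headers = {}
  headers["If-Modified-Since"] = last_modified.httpdate if last_modified
  headers["If-None-Match"] = etag if etag
  headers
end

# Hand-written sample; real header_str values come from the HTTP response.
SAMPLE_LM_HEADER = "HTTP/1.1 200 OK\r\nLast-Modified: Wed, 21 Oct 2015 07:28:00 GMT\r\n\r\n"

stored = last_modified_from(SAMPLE_LM_HEADER)
conditional_headers(stored, "\"abc123\"")
```

Storing the Time object (rather than the raw string) and formatting it with httpdate on the way out keeps the outgoing header in the HTTP date format regardless of how the value was held in between.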
Takes a raw XML feed and attempts to parse it. If no parser is available a Feedzirra::NoParserAvailable exception is raised.
The XML that you would like parsed.
An instance of the determined feed type. By default a Feedzirra::Parser::RSS, Feedzirra::Parser::AtomFeedBurner, or Feedzirra::Parser::Atom object.
Feedzirra::NoParserAvailable : If no valid parser classes could be found for the feed.
    # File lib/feedzirra/feed.rb, line 15
    def self.parse(xml)
      if parser = determine_feed_parser_for_xml(xml)
        parser.parse(xml)
      else
        raise NoParserAvailable.new("No valid parser for XML.")
      end
    end
Updates each feed for each Feed object provided.
A single feed object, or an array of feed objects.
Valid keys for this argument are as follows:
:user_agent - String that overrides the default user agent.
:on_success - Block that gets executed after a successful request.
:on_failure - Block that gets executed after a failed request.
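An updated Feed object if a single feed is passed (more precisely, whenever exactly one response was recorded).
An Array if multiple feeds are passed. Each element is an updated Feed object, or the HTTP response code for a feed whose request failed.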
    # File lib/feedzirra/feed.rb, line 176
    def self.update(feeds, options = {})
      feed_queue = [*feeds]
      multi = Curl::Multi.new
      responses = {}

      feed_queue.slice!(0, 30).each do |feed|
        add_feed_to_multi(multi, feed, feed_queue, responses, options)
      end

      multi.perform
      responses.size == 1 ? responses.values.first : responses.values
    end