Object
Some things cannot be carried out during evaluation, for example the ensure_presence_of_pattern constraint (since evaluation proceeds top to bottom, at a given point we don't yet know whether the currently evaluated pattern will have a child pattern) or removing unneeded results caused by evaluating multiple filters.
The sole purpose of this class is to execute these post-processing tasks.
This is just a convenience method to call all the post-processing functionality and checks.
# File lib/scrubyt/output/post_processor.rb, line 18
def self.apply_post_processing(root_pattern)
  ensure_presence_of_pattern_full(root_pattern)
  remove_multiple_filter_duplicates(root_pattern) if root_pattern.children[0].filters.size > 1
  report_if_no_results(root_pattern) if root_pattern.evaluation_context.extractor.get_mode != :production
end
Apply the ensure_presence_of_pattern constraint to the full extractor, i.e. to the given pattern and, recursively, to all of its descendants.
# File lib/scrubyt/output/post_processor.rb, line 27
def self.ensure_presence_of_pattern_full(pattern)
  ensure_presence_of_pattern(pattern)
  pattern.children.each {|child| ensure_presence_of_pattern_full(child)}
end
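The recursion used by ensure_presence_of_pattern_full is a plain pre-order tree walk: handle the current pattern, then descend into each child. The following is a minimal sketch of that shape only. Node and visit_full are hypothetical stand-ins, not part of scRUBYt!, and the block takes the place of the real ensure_presence_of_pattern check, whose body is not shown here.

```ruby
# Hypothetical stand-in for a pattern: just a name and a list of children.
Node = Struct.new(:name, :children)

# Handle the node first, then recurse into each child (pre-order),
# mirroring the structure of ensure_presence_of_pattern_full.
def visit_full(node, &check)
  check.call(node)
  node.children.each { |child| visit_full(child, &check) }
end

tree = Node.new("root", [
  Node.new("book", [Node.new("title", []), Node.new("price", [])]),
  Node.new("author", [])
])

# Record the visiting order instead of applying a real constraint.
visited = []
visit_full(tree) { |node| visited << node.name }
```

Parents are always handled before their children, so a check applied this way sees every pattern in the tree exactly once.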
Remove unneeded results of a pattern (caused by evaluating multiple filters). See, for example, the B&N scenario: the book titles are extracted twice for every pattern (since both examples generate the same XPath for them), but since only one of the results ever has a price, the other is discarded.
# File lib/scrubyt/output/post_processor.rb, line 37
def self.remove_multiple_filter_duplicates(pattern)
  remove_multiple_filter_duplicates_intern(pattern) if pattern.parent_of_leaf
  pattern.children.each {|child| remove_multiple_filter_duplicates(child)}
end
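The idea behind the B&N scenario can be illustrated with plain hashes. This is only a sketch of the principle (a duplicated title survives only where it has a price); it is not the implementation of remove_multiple_filter_duplicates_intern, which is not shown here. The record layout and the drop_incomplete_duplicates helper are assumptions made for the example.

```ruby
# Hypothetical result records; the real library works on result nodes,
# not on hashes like these.
results = [
  { title: "Book A", price: nil },
  { title: "Book A", price: "$10.00" },
  { title: "Book B", price: "$7.50" },
  { title: "Book B", price: nil }
]

# For each title, keep only the candidates that have a price.
def drop_incomplete_duplicates(results)
  results.group_by { |r| r[:title] }.flat_map do |_title, group|
    complete = group.reject { |r| r[:price].nil? }
    # If no candidate has a price, keep the group as-is rather than lose data.
    complete.empty? ? group : complete
  end
end

cleaned = drop_incomplete_duplicates(results)
```

Each title appears once in cleaned, paired with its price; the priceless duplicates produced by the second filter are gone.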
Log a warning if the extractor did not extract anything from the document. This probably happens because the structure of the page changed or because of some rather nasty bug. In either case something is going wrong, and we need to inform the user about it!
# File lib/scrubyt/output/post_processor.rb, line 47
def self.report_if_no_results(root_pattern)
  results_found = false
  root_pattern.children.each {|child| return if (child.result.childmap.size > 0)}
  Scrubyt.log :WARNING, [
    "The extractor did not find any result instances. Most probably this is wrong.",
    "Check your extractor and if you are sure it should work, report a bug!"
  ]
end
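The check above returns early as soon as any child has at least one result and only reaches the log call when none does. A minimal sketch of the same logic, assuming a hypothetical Child struct in place of real patterns and returning the message instead of calling Scrubyt.log, could look like this:

```ruby
# Hypothetical stand-in for a child pattern: it only carries its results.
Child = Struct.new(:results)

# Return nil when at least one child produced a result,
# otherwise return the warning text the caller should log.
def no_results_warning(children)
  return nil if children.any? { |child| !child.results.empty? }
  "The extractor did not find any result instances."
end

empty_run  = no_results_warning([Child.new([]), Child.new([])])
normal_run = no_results_warning([Child.new([]), Child.new(["Book A"])])
```

Here empty_run holds the warning text and normal_run is nil. Note that a root pattern with no children at all also counts as "no results" in this sketch, just as the original loop would fall through to the log call.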