Class | Bio::PhyloXML::Parser |
In: | lib/bio/db/phyloxml/phyloxml_parser.rb |
Parent: | Object |
Bio::PhyloXML::Parser parses files in phyloXML format.
The libxml2 XML parser is required. Install the libxml-ruby bindings from libxml.rubyforge.org, or run:
    gem install -r libxml-ruby
    require 'bio'

    # Create new phyloxml parser
    phyloxml = Bio::PhyloXML::Parser.open('example.xml')

    # Print the names of all trees in the file
    phyloxml.each do |tree|
      puts tree.name
    end
www.phyloxml.org/documentation/version_100/phyloxml.xsd.html
other | [R] | Array of PhyloXML::Other objects. Once all trees have been parsed, any remaining content in the file that is not a phylogeny is stored here. |
Initializes LibXML::Reader and reads from the IO until it reaches the first phylogeny element.
Create a new Bio::PhyloXML::Parser object.
p = Bio::PhyloXML::Parser.for_io($stdin)
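Arguments:
    io (required): IO object to read the phyloXML document from
    validate (optional, default true): passed on to new, but the listing below performs no validation for IO input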
Returns: | Bio::PhyloXML::Parser object |
    # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 171
    def self.for_io(io, validate=true)
      obj = new(nil, validate)
      obj.instance_eval {
        @reader = XML::Reader.io(io,
                                 { :options =>
                                   LibXML::XML::Parser::Options::NONET })
        _skip_leader
      }
      obj
    end
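The listing above builds the parser in two steps: it calls new(nil, validate), so that initialize returns early, and then uses instance_eval to set the internal reader from inside the object. The following is a minimal sketch of that construction pattern only. LineSource is a hypothetical class (not part of BioRuby), and StringIO stands in for the LibXML reader.

```ruby
require 'stringio'

# Hypothetical stand-in for the parser; NOT part of BioRuby.
class LineSource
  def initialize(str)
    @lines = nil
    return unless str          # nil means "a factory method will set things up"
    @lines = str.each_line
  end

  # Alternative constructor, analogous in shape to Parser.for_io.
  def self.for_io(io)
    obj = new(nil)             # skip the string-handling path
    obj.instance_eval do
      @lines = io.each_line    # set internal state from inside the object
    end
    obj
  end

  # Returns the next line without its newline, or nil at the end.
  def next_line
    @lines.next.chomp
  rescue StopIteration
    nil
  end
end

src = LineSource.for_io(StringIO.new("first\nsecond\n"))
puts src.next_line   # first
puts src.next_line   # second
p src.next_line      # nil
```

Keeping all setup variants behind class methods lets initialize stay responsible for the string case alone.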
Initializes LibXML::Reader and reads the PhyloXML-formatted string until it reaches the first phylogeny element.
Create a new Bio::PhyloXML::Parser object.
    str = File.read("./phyloxml_examples.xml")
    p = Bio::PhyloXML::Parser.new(str)
Deprecated usage: reads data from a file, where str is a filename.
    p = Bio::PhyloXML::Parser.new("./phyloxml_examples.xml")
Passing a filename is deprecated. Use Bio::PhyloXML::Parser.open(filename) instead.
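Arguments:
    str (required): phyloXML-formatted string (or, deprecated, a filename)
    validate (optional, default true): used only in the deprecated filename case; string input is not validated in the listing below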
Returns: | Bio::PhyloXML::Parser object |
    # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 271
    def initialize(str, validate=true)

      @other = []

      return unless str

      # For compatibility, if filename-like string is given,
      # treat it as a filename.
      if /[\<\>\r\n]/ !~ str and File.exist?(str) then
        # assume that str is filename
        warn "Bio::PhyloXML::Parser.new(filename) is deprecated. Use Bio::PhyloXML::Parser.open(filename)."
        filename = _secure_filename(str)
        _validate(:file, filename) if validate
        @reader = XML::Reader.file(filename)
        _skip_leader
        return
      end

      # initialize for string
      @reader = XML::Reader.string(str,
                                   { :options =>
                                     LibXML::XML::Parser::Options::NONET })
      _skip_leader
    end
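The compatibility branch in the listing above decides between "filename" and "XML string" with a simple heuristic: the argument is treated as a filename only if it contains none of the characters <, >, CR, or LF and names an existing file. A small sketch of that test in isolation follows. The helper name looks_like_filename? is made up for illustration and does not exist in BioRuby.

```ruby
require 'tempfile'

# Hypothetical helper reproducing the condition from the listing:
# no markup or newline characters, and the path exists on disk.
def looks_like_filename?(str)
  (/[<>\r\n]/ !~ str) && File.exist?(str)
end

xml = "<phyloxml>\n</phyloxml>"

Tempfile.create('example') do |f|
  puts looks_like_filename?(f.path)   # true: existing path, no markup characters
end
puts looks_like_filename?(xml)        # false: contains "<" and a newline
```

Because the check depends on what happens to exist on disk, Bio::PhyloXML::Parser.open is the unambiguous way to read from a file.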
Initializes LibXML::Reader and reads the file until it reaches the first phylogeny element.
Create a new Bio::PhyloXML::Parser object.
p = Bio::PhyloXML::Parser.open("./phyloxml_examples.xml")
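Arguments:
    filename (required): path to the phyloXML file
    validate (optional, default true): whether to run validation on the file before parsing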
Returns: | Bio::PhyloXML::Parser object |
    # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 90
    def self.open(filename, validate=true)
      obj = new(nil, validate)
      obj.instance_eval {
        filename = _secure_filename(filename)
        _validate(:file, filename) if validate
        # XML::Parser::Options::NONET for security reason
        @reader = XML::Reader.file(filename,
                                   { :options =>
                                     LibXML::XML::Parser::Options::NONET })
        _skip_leader
      }
      obj
    end
Initializes LibXML::Reader and reads the document at the given URI until it reaches the first phylogeny element.
Create a new Bio::PhyloXML::Parser object.
p = Bio::PhyloXML::Parser.open_uri("http://www.phyloxml.org/examples/apaf.xml")
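Arguments:
    uri (required): String or URI object
    validate (optional, default true): passed on to new, but the listing below performs no validation for URI input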
Returns: | Bio::PhyloXML::Parser object |
    # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 116
    def self.open_uri(uri, validate=true)
      case uri
      when URI
        uri = uri.to_s
      else
        # raises error if not a String
        uri = uri.to_str
        # raises error if invalid URI
        URI.parse(uri)
      end

      obj = new(nil, validate)
      obj.instance_eval {
        @reader = XML::Reader.file(uri)
        _skip_leader
      }
      obj
    end
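The case statement at the top of the listing above normalizes the argument to a String and rejects bad input before any reader is created. The sketch below isolates that step using only Ruby's standard uri library. The helper name normalize_uri is hypothetical and is not a BioRuby method.

```ruby
require 'uri'

# Hypothetical helper mirroring the case statement in the listing.
def normalize_uri(uri)
  case uri
  when URI
    uri.to_s
  else
    str = uri.to_str   # NoMethodError for objects that are not string-like
    URI.parse(str)     # URI::InvalidURIError if the string is not a valid URI
    str
  end
end

puts normalize_uri(URI.parse('http://www.phyloxml.org/examples/apaf.xml'))
puts normalize_uri('http://www.phyloxml.org/examples/apaf.xml')

begin
  normalize_uri(42)
rescue NoMethodError => e
  puts "rejected: #{e.class}"
end

begin
  normalize_uri('http://bad host/')
rescue URI::InvalidURIError => e
  puts "rejected: #{e.class}"
end
```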
Access the specified tree in the file. It parses trees until the specified tree is reached.
    # Get 3rd tree in the file (starts counting from 0).
    parser = Bio::PhyloXML::Parser.open('phyloxml_examples.xml')
    tree = parser[2]
    # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 317
    def [](i)
      tree = nil
      (i+1).times do
        tree = self.next_tree
      end
      return tree
    end
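Because [] in the listing above is implemented as i+1 calls to next_tree, the index is relative to the current read position, not to the start of the file, and each call consumes trees. The sketch below demonstrates the consequence with a hypothetical stand-in class (ForwardOnly, not part of BioRuby) that serves strings from an array instead of parsing XML.

```ruby
# Hypothetical stand-in with the same [] logic as the listing; NOT BioRuby.
class ForwardOnly
  def initialize(items)
    @items = items.dup
  end

  def next_tree
    @items.shift          # nil once everything has been consumed
  end

  def [](i)
    tree = nil
    (i + 1).times { tree = next_tree }
    tree
  end
end

parser = ForwardOnly.new(%w[t0 t1 t2 t3])
p parser[2]   # "t2": three trees were consumed to get here
p parser[0]   # "t3": the next unread tree, not the first tree in the file
p parser[0]   # nil: nothing left to read
```

To index from the start of the file again, a fresh parser object would have to be opened.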
Closes the LibXML::Reader inside the object. It also closes the underlying file if the parser was created with Bio::PhyloXML::Parser.open.
Closing an already closed parser, or using a parser after it has been closed, raises LibXML::XML::Error.
Returns: | nil |
    # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 153
    def close
      @reader.close
      @reader = ClosedPhyloXMLParser.new
      nil
    end
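The listing above replaces the reader with a ClosedPhyloXMLParser instance after closing it. That class is not shown here; the sketch below assumes it is a sentinel object that raises on every method call, which would explain why both a second close and any later use fail. ClosedSource and Wrapper are hypothetical names, and IOError stands in for LibXML::XML::Error.

```ruby
require 'stringio'

# Hypothetical sentinel: any method call on it raises. The real
# ClosedPhyloXMLParser is not shown in this document.
class ClosedSource
  def method_missing(name, *args)
    raise IOError, "closed object (called #{name})"
  end
end

# Hypothetical wrapper with the same close logic as the listing.
class Wrapper
  def initialize(io)
    @io = io
  end

  def read_all
    @io.read
  end

  def close
    @io.close
    @io = ClosedSource.new   # later calls, including close, now raise
    nil
  end
end

w = Wrapper.new(StringIO.new("abc"))
puts w.read_all         # abc
p w.close               # nil

begin
  w.close
rescue IOError => e
  puts e.message        # closed object (called close)
end
```

Swapping in a sentinel avoids an "am I closed?" check in every method that touches the reader.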
Iterate through all trees in the file.
    phyloxml = Bio::PhyloXML::Parser.open('example.xml')
    phyloxml.each do |tree|
      puts tree.name
    end
    # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 304
    def each
      while tree = next_tree
        yield tree
      end
    end
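Since each in the listing above simply loops over next_tree until it returns nil, iteration is single-pass: a second each on the same object yields nothing. The sketch below shows this with a hypothetical stand-in class (TreeSource, not part of BioRuby). It uses Object#to_enum to get map and to_a without assuming anything about which modules the real parser includes.

```ruby
# Hypothetical stand-in with the same each/next_tree shape as the listing.
class TreeSource
  def initialize(names)
    @names = names.dup
  end

  def next_tree
    @names.shift            # nil when no names are left
  end

  def each
    while tree = next_tree
      yield tree
    end
  end
end

src = TreeSource.new(%w[alpha beta])
first_pass  = src.to_enum(:each).map(&:upcase)
second_pass = src.to_enum(:each).to_a

p first_pass    # ["ALPHA", "BETA"]
p second_pass   # []
```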
Parse and return the next phylogeny tree. If there are no more phylogeny elements, nil is returned. Anything other than a phylogeny element is stored in PhyloXML::Parser#other; as the listing below shows, next_tree returns nil in that case as well.
    p = Bio::PhyloXML::Parser.open("./phyloxml_examples.xml")
    tree = p.next_tree
Returns: | Bio::PhyloXML::Tree, or nil |
    # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 334
    def next_tree()

      if not is_element?('phylogeny')
        if @reader.node_type == XML::Reader::TYPE_END_ELEMENT
          if is_end_element?('phyloxml')
            return nil
          else
            @reader.read
            @reader.read
            if is_end_element?('phyloxml')
              return nil
            end
          end
        end
        # phyloxml can hold only phylogeny and "other" elements. If this is not
        # phylogeny element then it is other. Also, "other" always comes after
        # all phylogenies
        @other << parse_other
        #return nil for tree, since this is not valid phyloxml tree.
        return nil
      end

      tree = Bio::PhyloXML::Tree.new

      # keep track of current node in clades array/stack. Current node is the
      # last element in the clades array
      clades = []
      clades.push tree

      #keep track of current edge to be able to parse branch_length tag
      current_edge = nil

      # we are going to parse clade iteratively by pointing (and changing) to
      # the current node in the tree. Since the property element is both in
      # clade and in the phylogeny, we need some boolean to know if we are
      # parsing the clade (there can be only max 1 clade in phylogeny) or
      # parsing phylogeny
      parsing_clade = false

      while not is_end_element?('phylogeny') do
        break if is_end_element?('phyloxml')

        # parse phylogeny elements, except clade
        if not parsing_clade

          if is_element?('phylogeny')
            @reader["rooted"] == "true" ? tree.rooted = true : tree.rooted = false
            @reader["rerootable"] == "true" ? tree.rerootable = true : tree.rerootable = false
            parse_attributes(tree, ["branch_length_unit", 'type'])
          end

          parse_simple_elements(tree, [ "name", 'description', "date"])

          if is_element?('confidence')
            tree.confidences << parse_confidence
          end

        end

        if @reader.node_type == XML::Reader::TYPE_ELEMENT
          case @reader.name
          when 'clade'
            #parse clade element

            parsing_clade = true

            node= Bio::PhyloXML::Node.new

            branch_length = @reader['branch_length']

            parse_attributes(node, ["id_source"])

            #add new node to the tree
            tree.add_node(node)
            # The first clade will always be root since by xsd schema phyloxml can
            # have 0 to 1 clades in it.
            if tree.root == nil
              tree.root = node
            else
              current_edge = tree.add_edge(clades[-1], node,
                                           Bio::Tree::Edge.new(branch_length))
            end
            clades.push node
            #end if clade element
          else
            parse_clade_elements(clades[-1], current_edge) if parsing_clade
          end
        end

        #end clade element, go one parent up
        if is_end_element?('clade')

          #if we have reached the closing tag of the top-most clade, then our
          # curent node should point to the root, If thats the case, we are done
          # parsing the clade element
          if clades[-1] == tree.root
            parsing_clade = false
          else
            # set current node (clades[-1) to the previous clade in the array
            clades.pop
          end
        end

        #parsing phylogeny elements
        if not parsing_clade

          if @reader.node_type == XML::Reader::TYPE_ELEMENT
            case @reader.name
            when 'property'
              tree.properties << parse_property

            when 'clade_relation'
              clade_relation = CladeRelation.new
              parse_attributes(clade_relation, ["id_ref_0", "id_ref_1", "distance", "type"])

              #@ add unit test for this
              if not @reader.empty_element?
                @reader.read
                if is_element?('confidence')
                  clade_relation.confidence = parse_confidence
                end
              end
              tree.clade_relations << clade_relation

            when 'sequence_relation'
              sequence_relation = SequenceRelation.new
              parse_attributes(sequence_relation, ["id_ref_0", "id_ref_1", "distance", "type"])
              if not @reader.empty_element?
                @reader.read
                if is_element?('confidence')
                  sequence_relation.confidence = parse_confidence
                end
              end
              tree.sequence_relations << sequence_relation
            when 'phylogeny'
              #do nothing
            else
              tree.other << parse_other
              #puts "Not recognized element. #{@reader.name}"
            end
          end
        end
        # go to next element
        @reader.read
      end #end while not </phylogeny>
      #move on to the next tag after /phylogeny which is text, since phylogeny
      #end tag is empty element, which value is nil, therefore need to move to
      #the next meaningful element (therefore @reader.read twice)
      @reader.read
      @reader.read

      return tree
    end
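The core of the listing above is a stack (the clades array): each opening clade element is attached to the clade on top of the stack and then pushed, and each closing clade element pops the stack unless the root has been reached. The sketch below shows only that bookkeeping. The [type, name] event format is made up for illustration and is not what LibXML::XML::Reader emits; names stand in for Bio::PhyloXML::Node objects.

```ruby
# Events a streaming reader might produce for:
#   <clade name="root">
#     <clade name="A"/>
#     <clade name="B"><clade name="C"/></clade>
#   </clade>
# The event format is an assumption made for this sketch.
events = [
  [:start, 'root'],
  [:start, 'A'], [:end, 'A'],
  [:start, 'B'],
  [:start, 'C'], [:end, 'C'],
  [:end, 'B'],
  [:end, 'root']
]

root  = nil
stack = []     # stack[-1] is the clade currently being parsed
edges = []     # [parent, child] pairs

events.each do |type, name|
  case type
  when :start
    if root.nil?
      root = name                   # first clade seen becomes the root
    else
      edges << [stack[-1], name]    # attach to the innermost open clade
    end
    stack.push(name)
  when :end
    stack.pop unless stack[-1] == root   # keep the root, as the listing does
  end
end

p root    # "root"
edges.each { |parent, child| puts "#{parent} -> #{child}" }
```

Run as written, this prints the edges root -> A, root -> B, and B -> C. An explicit stack keeps the parser iterative, so deeply nested clades do not deepen the Ruby call stack.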