Class | Bio::Tree |
In: | lib/bio/db/newick.rb, lib/bio/tree.rb |
Parent: | Object |
This class represents and stores a phylogenetic tree.
Internally, it is based on the Bio::Pathway class. However, users cannot handle the Bio::Pathway object directly.
This is an alpha version. Incompatible changes may be made frequently.
DEFAULT_OPTIONS | = | { :indent => ' ' } | default options |
options | [RW] | tree options; mainly used for tree output |
root | [RW] | root node of this tree (some methods use it even if the tree is unrooted) |
Creates a new phylogenetic tree. When no arguments are given, it creates a new empty tree. When a Tree object is given, it copies the tree. Note that the new tree shares Node and Edge objects with the given tree.
# File lib/bio/tree.rb, line 259
def initialize(tree = nil)
  # creates an undirected adjacency list graph
  @pathway = Bio::Pathway.new([], true)
  @root = nil
  @options = {}
  _init_cache
  self.concat(tree) if tree
end
Adds a new edge to the tree. Returns the newly added edge. If the edge already exists, it is overwritten with the new one.
# File lib/bio/tree.rb, line 381
def add_edge(source, target, edge = Edge.new)
  _clear_cache
  @pathway.append(Bio::Relation.new(source, target, edge))
  edge
end
Adds a node to the tree. Returns self. If the node already exists, it does nothing.
# File lib/bio/tree.rb, line 403
def add_node(node)
  _clear_cache
  @pathway.graph[node] ||= {}
  self
end
Shows the adjacency matrix representation of the tree. The matrix covers only the given nodes. If nodes is nil or is omitted, it acts the same as tree.adjacency_matrix(tree.nodes). If a block is given, it yields source, target, and edge for each edge and stores the value returned by the block. Without a block, it stores the edge itself. Returns a matrix object.
# File lib/bio/tree.rb, line 823
def adjacency_matrix(nodes = nil,
                     default_value = nil,
                     diagonal_value = nil) #:yields: source, target, edge
  nodes ||= self.nodes
  size = nodes.size
  hash = {}
  nodes.each_with_index { |x, i| hash[x] = i }
  # prepares a matrix
  matrix = Array.new(size, nil)
  matrix.collect! { |x| Array.new(size, default_value) }
  (0...size).each { |i| matrix[i][i] = diagonal_value }
  # fills the matrix from each edge
  self.each_edge do |source, target, edge|
    i_source = hash[source]
    i_target = hash[target]
    if i_source and i_target then
      val = block_given? ? (yield source, target, edge) : edge
      matrix[i_source][i_target] = val
      matrix[i_target][i_source] = val
    end
  end
  Matrix.rows(matrix, false)
end
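The matrix-filling step can be tried without BioRuby installed. The following is only a sketch under assumptions: a plain Hash (`ADJ_GRAPH`, a hypothetical name) stands in for the tree's internal adjacency list, and `adjacency_matrix_sketch` is an illustrative helper, not part of Bio::Tree.

```ruby
require 'matrix'

# Hypothetical stand-in for the tree: node => { neighbor => edge value }.
ADJ_GRAPH = {
  'a' => { 'n' => 1.0 },
  'b' => { 'n' => 2.0 },
  'n' => { 'a' => 1.0, 'b' => 2.0 }
}

# Builds an adjacency matrix restricted to the given nodes.
# Cells without an edge get default_value; the diagonal gets diagonal_value.
def adjacency_matrix_sketch(graph, nodes, default_value = nil, diagonal_value = nil)
  index = {}
  nodes.each_with_index { |node, i| index[node] = i }
  rows = Array.new(nodes.size) { Array.new(nodes.size, default_value) }
  nodes.each_index { |i| rows[i][i] = diagonal_value }
  graph.each do |source, neighbors|
    neighbors.each do |target, edge|
      i = index[source]
      j = index[target]
      next unless i && j # skip edges that touch a node outside the selection
      rows[i][j] = block_given? ? yield(source, target, edge) : edge
    end
  end
  Matrix.rows(rows)
end

matrix = adjacency_matrix_sketch(ADJ_GRAPH, %w[a b n], 0, 0)
puts matrix
```

Because the stand-in Hash stores every edge in both directions, the result is symmetric, which mirrors the two assignments per edge in the original method.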
Gets all ancestral nodes of the node. If root isn't specified or root is nil, @root is used. Returns an array of Nodes. The result is unspecified for cyclic trees.
# File lib/bio/tree.rb, line 758
def ancestors(node, root = nil)
  root ||= @root
  (self.path(root, node) - [ node ]).reverse
end
Gets the adjacent child nodes of the node. If root isn't specified or root is nil, @root is used. Returns an array of Nodes. The result is unspecified for cyclic trees.
# File lib/bio/tree.rb, line 702
def children(node, root = nil)
  root ||= @root
  c = self.adjacent_nodes(node)
  c.delete(self.parent(node, root))
  c
end
Removes all edges connected with the node. Returns self. If the node does not exist, raises IndexError.
# File lib/bio/tree.rb, line 418
def clear_node(node)
  unless self.include?(node)
    raise IndexError, 'the node does not exist'
  end
  _clear_cache
  @pathway.relations.delete_if do |rel|
    rel.node.include?(node)
  end
  @pathway.graph[node].each_key do |k|
    @pathway.graph[k].delete(node)
  end
  @pathway.graph[node].clear
  self
end
Replaces each edge by each block's return value. Returns self.
# File lib/bio/tree.rb, line 528
def collect_edge! #:yields: source, target, edge
  _clear_cache
  @pathway.relations.each do |rel|
    newedge = yield rel.node[0], rel.node[1], rel.relation
    rel.edge = newedge
    @pathway.append(rel, false)
  end
  self
end
Replaces each node by each block's return value. Returns self.
# File lib/bio/tree.rb, line 507
def collect_node! #:yields: node
  _clear_cache
  tr = {}
  self.each_node do |node|
    tr[node] = yield node
  end
  # replaces nodes in @pathway.relations
  @pathway.relations.each do |rel|
    rel.node.collect! { |node| tr[node] }
  end
  # re-generates @pathway from relations
  @pathway.to_list
  # adds orphan nodes
  tr.each_value do |newnode|
    @pathway.graph[newnode] ||= {}
  end
  self
end
Concatenates the other tree. If the same edge exists, the edge in other is used. Returns self. The result is unspecified if other isn't a Tree object. Note that the Node and Edge objects in the other tree are shared with the concatenated tree.
# File lib/bio/tree.rb, line 596
def concat(other)
  #raise TypeError unless other.kind_of?(self.class)
  _clear_cache
  other.each_node do |node|
    self.add_node(node)
  end
  other.each_edge do |node1, node2, edge|
    self.add_edge(node1, node2, edge)
  end
  self
end
Gets all descendent nodes of the node. If root isn't specified or root is nil, @root is used. Returns an array of Nodes. The result is unspecified for cyclic trees.
# File lib/bio/tree.rb, line 713
def descendents(node, root = nil)
  root ||= @root
  distance, route = @pathway.breadth_first_search(root)
  d = distance[node]
  result = []
  distance.each do |key, val|
    if val > d then
      x = key
      while x = route[x]
        if x == node then
          result << key
          break
        end
        break if distance[x] <= d
      end
    end
  end
  result
end
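The idea behind this method (search breadth-first from the root, then keep every deeper node whose chain of predecessors passes through the given node) can be sketched on a plain Hash. Everything named here (`DESC_GRAPH`, `bfs_sketch`, `descendents_sketch`) is hypothetical and only imitates the distance and predecessor tables the method obtains from Bio::Pathway.

```ruby
# Hypothetical stand-in for the tree: node => { neighbor => edge value }.
DESC_GRAPH = {
  'root' => { 'x' => 1, 'c' => 1 },
  'x'    => { 'root' => 1, 'a' => 1, 'b' => 1 },
  'a'    => { 'x' => 1 },
  'b'    => { 'x' => 1 },
  'c'    => { 'root' => 1 }
}

# Breadth-first search returning [depth of each node, predecessor of each node].
def bfs_sketch(graph, root)
  distance = { root => 0 }
  route = {}
  queue = [root]
  until queue.empty?
    current = queue.shift
    graph[current].each_key do |neighbor|
      next if distance.key?(neighbor)
      distance[neighbor] = distance[current] + 1
      route[neighbor] = current
      queue << neighbor
    end
  end
  [distance, route]
end

# Collects nodes that lie deeper than `node` and reach it by walking
# predecessor links toward the root.
def descendents_sketch(graph, node, root)
  distance, route = bfs_sketch(graph, root)
  depth = distance[node]
  distance.keys.select do |candidate|
    next false unless distance[candidate] > depth
    x = candidate
    x = route[x] while x && x != node
    x == node
  end
end

p descendents_sketch(DESC_GRAPH, 'x', 'root')
```

The sketch walks each candidate all the way to the root, whereas the original stops early once it climbs above the depth of the given node; the result is the same for acyclic trees.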
Returns the distance between node1 and node2. It raises an error if the edges do not contain distance values. The result is unspecified for cyclic trees.
# File lib/bio/tree.rb, line 641
def distance(node1, node2)
  distance = 0
  self.each_edge_in_path(node1, node2) do |source, target, edge|
    distance += get_edge_distance(edge)
  end
  distance
end
Calculates the distance matrix of the given nodes. If nodes is nil or is omitted, it acts the same as tree.distance_matrix(tree.leaves). Returns a matrix object. The result is unspecified for cyclic trees. Note 1: The diagonal values of the matrix are 0. Note 2: If the distance cannot be calculated, nil will be set.
# File lib/bio/tree.rb, line 794
def distance_matrix(nodes = nil)
  nodes ||= self.leaves
  matrix = []
  nodes.each_index do |i|
    row = []
    nodes.each_index do |j|
      if i == j then
        distance = 0
      elsif r = matrix[j] and val = r[i] then
        distance = val
      else
        distance = (self.distance(nodes[i], nodes[j]) rescue nil)
      end
      row << distance
    end
    matrix << row
  end
  Matrix.rows(matrix, false)
end
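As a rough illustration of how such a matrix comes together, the sketch below sums edge lengths along the unique path between two nodes and reuses already computed cells for the mirrored half. It assumes a plain Hash (`DIST_GRAPH`) whose values are the edge distances; `distance_sketch` and `distance_matrix_sketch` are hypothetical helpers, and the path is found by a depth-first walk rather than by the library's own path method.

```ruby
require 'matrix'

# Hypothetical stand-in for the tree: node => { neighbor => distance }.
DIST_GRAPH = {
  'root' => { 'x' => 0.5, 'c' => 3.0 },
  'x'    => { 'a' => 1.0, 'b' => 2.5, 'root' => 0.5 },
  'a'    => { 'x' => 1.0 },
  'b'    => { 'x' => 2.5 },
  'c'    => { 'root' => 3.0 }
}

# Sum of edge distances on the path from `from` to `to`, or nil when the
# two nodes are not connected. Relies on the graph being acyclic.
def distance_sketch(graph, from, to, visited = {})
  return 0 if from == to
  visited[from] = true
  graph[from].each do |neighbor, length|
    next if visited[neighbor]
    rest = distance_sketch(graph, neighbor, to, visited)
    return length + rest if rest
  end
  nil
end

# Pairwise distances; the diagonal is 0 and cell [i][j] reuses cell [j][i]
# when that one has already been filled in.
def distance_matrix_sketch(graph, nodes)
  rows = []
  nodes.each_index do |i|
    rows << nodes.each_index.map do |j|
      if i == j
        0
      elsif rows[j] && rows[j][i]
        rows[j][i]
      else
        distance_sketch(graph, nodes[i], nodes[j])
      end
    end
  end
  Matrix.rows(rows)
end

puts distance_matrix_sketch(DIST_GRAPH, %w[a b c])
```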
Iterates over each edge from node1 to node2. The result is unspecified for cyclic trees.
# File lib/bio/tree.rb, line 627
def each_edge_in_path(node1, node2)
  path = self.path(node1, node2)
  source = path.shift
  path.each do |target|
    edge = self.get_edge(source, target)
    yield source, target, edge
    source = target
  end
  self
end
Iterates over each node of this tree.
# File lib/bio/tree.rb, line 306
def each_node(&x) #:yields: node
  @pathway.graph.each_key(&x)
  self
end
Iterates over each edge connected to the given node. Returns self.
The reason why the method name is "each_out_edge" is that it comes from the Boost Graph Library.
# File lib/bio/tree.rb, line 356
def each_out_edge(source) #:yields: source, target, edge
  h = @pathway.graph[source]
  h.each { |key, val| yield source, key, val } if h
  self
end
Returns edge1 + edge2
# File lib/bio/tree.rb, line 122
def get_edge_merged(edge1, edge2)
  dist1 = get_edge_distance(edge1)
  dist2 = get_edge_distance(edge2)
  if dist1 and dist2 then
    Edge.new(dist1 + dist2)
  elsif dist1 then
    Edge.new(dist1)
  elsif dist2 then
    Edge.new(dist2)
  else
    Edge.new
  end
end
# File lib/bio/tree.rb, line 238
def get_node_bootstrap(node)
  begin
    node.bootstrap
  rescue NoMethodError
    nil
  end
end
# File lib/bio/tree.rb, line 246
def get_node_bootstrap_string(node)
  begin
    node.bootstrap_string
  rescue NoMethodError
    nil
  end
end
Finds a node in the tree by the given name and returns the node. If the node is not found, returns nil. If multiple nodes with the same name exist, one of them is returned (which one is unspecified).
# File lib/bio/tree.rb, line 391
def get_node_by_name(str)
  self.each_node do |node|
    if get_node_name(node) == str
      return node
    end
  end
  nil
end
Gets node name
# File lib/bio/tree.rb, line 230
def get_node_name(node)
  begin
    node.name
  rescue NoMethodError
    node.to_s
  end
end
If the node exists, returns true. Otherwise, returns false.
# File lib/bio/tree.rb, line 411
def include?(node)
  @pathway.graph[node] ? true : false
end
Inserts a new node between adjacent nodes node1 and node2. The old edge between node1 and node2 is changed to the edge between new_node and node2. The edge between node1 and new_node is newly created.
If new_distance is specified, the distance between node1 and new_node is set to new_distance, and distance between new_node and node2 is set to tree.get_edge(node1, node2).distance - new_distance.
Returns self. If node1 and node2 are not adjacent, raises IndexError.
If new_node already exists in the tree, the tree would become circular. In addition, if the edge between new_node and node1 (or node2) already exists, it will be erased.
# File lib/bio/tree.rb, line 891
def insert_node(node1, node2, new_node, new_distance = nil)
  unless edge = self.get_edge(node1, node2) then
    raise IndexError, 'nodes not found or two nodes are not adjacent'
  end
  _clear_cache
  new_edge = Edge.new(new_distance)
  self.remove_edge(node1, node2)
  self.add_edge(node1, new_node, new_edge)
  if new_distance and old_distance = get_edge_distance(edge) then
    old_distance -= new_distance
    begin
      edge.distance = old_distance
    rescue NoMethodError
      edge = old_distance
    end
  end
  self.add_edge(new_node, node2, edge)
  self
end
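The distance bookkeeping of this operation is easiest to see on a tiny example. The sketch below is an assumption-laden stand-in, not the library code: edges are plain numbers stored in a nested Hash, and `insert_node_sketch` and `two_node_graph` are hypothetical names.

```ruby
# Hypothetical helper returning a fresh two-node graph: a <-- 5.0 --> b.
def two_node_graph
  { 'a' => { 'b' => 5.0 }, 'b' => { 'a' => 5.0 } }
end

# Splits the edge node1-node2 by inserting new_node. When new_distance is
# given, node1-new_node gets new_distance and new_node-node2 gets the rest.
def insert_node_sketch(graph, node1, node2, new_node, new_distance = nil)
  unless graph.key?(node1) && graph[node1].key?(node2)
    raise IndexError, 'nodes not found or two nodes are not adjacent'
  end
  old_distance = graph[node1][node2]
  graph[node1].delete(node2)
  graph[node2].delete(node1)
  graph[new_node] ||= {}
  remaining = new_distance && old_distance ? old_distance - new_distance : old_distance
  # the stand-in graph is undirected, so each edge is stored in both directions
  graph[node1][new_node] = graph[new_node][node1] = new_distance
  graph[new_node][node2] = graph[node2][new_node] = remaining
  graph
end

p insert_node_sketch(two_node_graph, 'a', 'b', 'm', 2.0)
```

With an original length of 5.0 and new_distance of 2.0, the two resulting edges are 2.0 and 3.0, so the total distance between a and b is preserved.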
If node is nil, returns an array of all leaves (nodes connected with one edge). Otherwise, gets all descendent leaf nodes of the node. If root isn't specified or root is nil, @root is used. Returns an array of Nodes. The result is unspecified for cyclic trees.
# File lib/bio/tree.rb, line 739
def leaves(node = nil, root = nil)
  unless node then
    nodes = []
    self.each_node do |x|
      nodes << x if self.out_degree(x) == 1
    end
    return nodes
  else
    root ||= @root
    self.descendents(node, root).find_all do |x|
      self.adjacent_nodes(x).size == 1
    end
  end
end
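The "connected with one edge" rule for the no-argument case can be illustrated with a short sketch over a plain Hash. `LEAF_GRAPH` and `leaves_sketch` are hypothetical names used only for this example.

```ruby
# Hypothetical stand-in for the tree: node => { neighbor => edge value }.
LEAF_GRAPH = {
  'root' => { 'x' => 1, 'c' => 1 },
  'x'    => { 'root' => 1, 'a' => 1, 'b' => 1 },
  'a'    => { 'x' => 1 },
  'b'    => { 'x' => 1 },
  'c'    => { 'root' => 1 }
}

# A leaf is any node with exactly one connected edge.
def leaves_sketch(graph)
  graph.keys.select { |node| graph[node].size == 1 }
end

p leaves_sketch(LEAF_GRAPH)
```

Note that under this rule a root that has only one edge is also reported as a leaf, and an orphan node with no edges is not.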
Gets the lowest common ancestor of the two nodes. If root isn't specified or root is nil, @root is used. Returns a Node object or nil. The result is unspecified for cyclic trees.
# File lib/bio/tree.rb, line 767
def lowest_common_ancestor(node1, node2, root = nil)
  root ||= @root
  distance, route = @pathway.breadth_first_search(root)
  x = node1; r1 = []
  begin; r1 << x; end while x = route[x]
  x = node2; r2 = []
  begin; r2 << x; end while x = route[x]
  return (r1 & r2).first
end
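The technique used here (collect each node's chain of predecessors up to the root, then take the first element the two chains share) can be shown in isolation. In this sketch the predecessor table `LCA_ROUTE` is written out by hand as an assumption; in the method above it comes from the breadth-first search. `root_chain` and `lca_sketch` are hypothetical names.

```ruby
# Hypothetical predecessor table for a tree rooted at 'root':
# each node maps to the node one step closer to the root.
LCA_ROUTE = { 'x' => 'root', 'c' => 'root', 'a' => 'x', 'b' => 'x' }

# Returns [node, parent, grandparent, ..., root].
def root_chain(route, node)
  chain = [node]
  chain << route[chain.last] while route[chain.last]
  chain
end

# Array#& keeps the order of the first array, so the first shared element
# is the common ancestor that is closest to node1 (and to node2).
def lca_sketch(route, node1, node2)
  (root_chain(route, node1) & root_chain(route, node2)).first
end

p lca_sketch(LCA_ROUTE, 'a', 'b')
p lca_sketch(LCA_ROUTE, 'a', 'c')
```

As in the original, a node counts as its own ancestor for this purpose, so the result for a node and one of its descendants is the node itself.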
Returns the number of edges connected to the given node.
The reason why the method name is "out_degree" is that it comes from the Boost Graph Library.
# File lib/bio/tree.rb, line 366
def out_degree(source)
  h = @pathway.graph[source]
  h ? h.size : 0
end
Returns all edges connected to the given node, together with the adjacent nodes, as an array of [ source, target, edge ] arrays.
The reason why the method name is "out_edges" is that it comes from the Boost Graph Library.
# File lib/bio/tree.rb, line 342
def out_edges(source)
  h = @pathway.graph[source]
  if h
    h.collect { |key, val| [ source, key, val ] }
  else
    []
  end
end
Returns formatted text (or something) of the tree. Currently supported formats are :newick, :nhx, and :phylip_distance_matrix.
# File lib/bio/db/newick.rb, line 235
def output(format, *arg, &block)
  case format
  when :newick
    output_newick(*arg, &block)
  when :nhx
    output_nhx(*arg, &block)
  when :phylip_distance_matrix
    output_phylip_distance_matrix(*arg, &block)
  else
    raise 'Unknown format'
  end
end
Returns a Newick formatted string. If a block is given, the nodes are sorted by it (in the same manner as Enumerable#sort).
Available options:
:indent: | indent string; set false to disable (default: ' ') |
:bootstrap_style: | :disabled disables bootstrap representations. :traditional for traditional style. :molphy for Molphy style (default). |
# File lib/bio/db/newick.rb, line 203
def output_newick(options = {}, &block) #:yields: node1, node2
  root = @root
  root ||= self.nodes.first
  return '();' unless root
  __to_newick([], root, 0, :__to_newick_format_leaf, options, &block) +
    __to_newick_format_leaf(root, Edge.new, options) +
    ";\n"
end
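For readers unfamiliar with the Newick notation, the following sketch writes a small tree recursively. It is a simplified illustration under stated assumptions: the tree is a plain Hash (`NEWICK_GRAPH`) whose values are branch lengths, and `newick_sketch` is a hypothetical helper that ignores the :indent and :bootstrap_style options and the node-sorting block described above.

```ruby
# Hypothetical stand-in for the tree: node => { neighbor => branch length }.
NEWICK_GRAPH = {
  'root' => { 'A' => 0.1, 'x' => 0.2 },
  'A'    => { 'root' => 0.1 },
  'x'    => { 'root' => 0.2, 'B' => 0.3, 'C' => 0.4 },
  'B'    => { 'x' => 0.3 },
  'C'    => { 'x' => 0.4 }
}

# Writes the subtree below `node`; `parent` is the node we arrived from.
# Children are listed in parentheses, each followed by ":" and its branch length.
def newick_sketch(graph, node, parent = nil)
  children = graph[node].keys - [parent]
  return node.to_s if children.empty?
  inner = children.map do |child|
    "#{newick_sketch(graph, child, node)}:#{graph[node][child]}"
  end
  "(#{inner.join(',')})#{node}"
end

puts newick_sketch(NEWICK_GRAPH, 'root') + ';'
# prints (A:0.1,(B:0.3,C:0.4)x:0.2)root;
```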
Returns an NHX (New Hampshire eXtended) formatted string. If a block is given, the nodes are sorted by it (in the same manner as Enumerable#sort).
Available options:
:indent: | indent string; set false to disable (default: ' ') |
# File lib/bio/db/newick.rb, line 223
def output_nhx(options = {}, &block) #:yields: node1, node2
  root = @root
  root ||= self.nodes.first
  return '();' unless root
  __to_newick([], root, 0,
              :__to_newick_format_leaf_NHX, options, &block) +
    __to_newick_format_leaf_NHX(root, Edge.new, options) +
    ";\n"
end
Generates a PHYLIP-style distance matrix as a string. If nodes is not given, all leaves in the tree are used. If the names of some of the given (or default) nodes are not defined or are empty, the names are automatically generated.
# File lib/bio/db/newick.rb, line 256
def output_phylip_distance_matrix(nodes = nil, options = {})
  nodes = self.leaves unless nodes
  names = nodes.collect do |x|
    y = get_node_name(x)
    y = sprintf("%x", x.__id__.abs) if y.empty?
    y
  end
  m = self.distance_matrix(nodes)
  Bio::Phylip::DistanceMatrix.generate(m, names, options)
end
Gets the parent node of the node. If root isn't specified or root is nil, @root is used. Returns a Node object or nil. Raises IndexError if no root is available. The result is unspecified for cyclic trees.
# File lib/bio/tree.rb, line 688
def parent(node, root = nil)
  root ||= @root
  raise IndexError, 'can not get parent for unrooted tree' unless root
  unless ret = _get_cached_parent(node, root) then
    ret = self.path(root, node)[-2]
    _cache_parent(node, ret, root)
  end
  ret
end
Gets the path from node1 to node2. Returns an array of nodes, including node1 and node2. If node1 and/or node2 do not exist, IndexError is raised. If node1 and node2 are not connected, NoPathError is raised. The result is unspecified for cyclic trees.
# File lib/bio/tree.rb, line 613
def path(node1, node2)
  raise IndexError, 'node1 not found' unless @pathway.graph[node1]
  raise IndexError, 'node2 not found' unless @pathway.graph[node2]
  return [ node1 ] if node1 == node2
  return [ node1, node2 ] if @pathway.graph[node1][node2]
  step, path = @pathway.bfs_shortest_path(node1, node2)
  unless path[0] == node1 and path[-1] == node2 then
    raise NoPathError, 'node1 and node2 are not connected'
  end
  path
end
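The contract described above (an array from node1 to node2, IndexError for unknown nodes, a separate error for disconnected nodes) can be sketched with a hand-written breadth-first search. This is an illustration only: `PATH_GRAPH`, `NoPathSketchError`, and `path_sketch` are hypothetical names, and the search below is a stand-in for the Bio::Pathway call used by the real method.

```ruby
# Hypothetical stand-in for the tree: node => { neighbor => edge value }.
PATH_GRAPH = {
  'root'   => { 'x' => 1 },
  'x'      => { 'root' => 1, 'a' => 1, 'b' => 1 },
  'a'      => { 'x' => 1 },
  'b'      => { 'x' => 1 },
  'lonely' => {}
}

# Hypothetical error class playing the role of NoPathError.
class NoPathSketchError < RuntimeError; end

def path_sketch(graph, node1, node2)
  raise IndexError, 'node1 not found' unless graph[node1]
  raise IndexError, 'node2 not found' unless graph[node2]
  previous = { node1 => nil } # also serves as the "already visited" set
  queue = [node1]
  until queue.empty?
    current = queue.shift
    break if current == node2
    graph[current].each_key do |neighbor|
      next if previous.key?(neighbor)
      previous[neighbor] = current
      queue << neighbor
    end
  end
  unless previous.key?(node2)
    raise NoPathSketchError, 'node1 and node2 are not connected'
  end
  # walk back from node2 to node1 through the recorded predecessors
  path = [node2]
  path.unshift(previous[path.first]) while previous[path.first]
  path
end

p path_sketch(PATH_GRAPH, 'a', 'b')
```

Given such a path, the ancestors of a node are the path from the root to the node, minus the node itself, in reverse order, and the parent is the second-to-last element of that path.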
Removes an edge between source and target. Returns self. If the edge does not exist, raises IndexError.
# File lib/bio/tree.rb, line 466
def remove_edge(source, target)
  unless self.get_edge(source, target) then
    raise IndexError, 'edge not found'
  end
  _clear_cache
  fwd = [ source, target ]
  rev = [ target, source ]
  @pathway.relations.delete_if do |rel|
    rel.node == fwd or rel.node == rev
  end
  h = @pathway.graph[source]
  h.delete(target) if h
  h = @pathway.graph[target]
  h.delete(source) if h
  self
end
Removes each edge for which the block returns a value other than nil or false. Returns self.
# File lib/bio/tree.rb, line 485
def remove_edge_if #:yields: source, target, edge
  _clear_cache
  removed_rel = []
  @pathway.relations.delete_if do |rel|
    if yield rel.node[0], rel.node[1], rel.edge then
      removed_rel << rel
      true
    end
  end
  removed_rel.each do |rel|
    source = rel.node[0]
    target = rel.node[1]
    h = @pathway.graph[source]
    h.delete(target) if h
    h = @pathway.graph[target]
    h.delete(source) if h
  end
  self
end
Removes the given node from the tree. All edges connected with the node are also removed. Returns self. If the node does not exist, raises IndexError.
# File lib/bio/tree.rb, line 437
def remove_node(node)
  #_clear_cache #done in clear_node(node)
  self.clear_node(node)
  @pathway.graph.delete(node)
  self
end
Removes each node for which the block returns a value other than nil or false. All edges connected with the removed nodes are also removed. Returns self.
# File lib/bio/tree.rb, line 447
def remove_node_if
  #_clear_cache #done in clear_node(node)
  all = self.nodes
  all.each do |node|
    if yield node then
      self.clear_node(node)
      @pathway.graph.delete(node)
    end
  end
  self
end
Removes all nodes that are neither branches nor leaves. That is, removes nodes connected with exactly two edges. For each removed node, the two adjacent edges are merged and a new edge is created. Returns the removed nodes. Note that orphan nodes are kept unchanged.
# File lib/bio/tree.rb, line 853
def remove_nonsense_nodes
  _clear_cache
  hash = {}
  self.each_node do |node|
    hash[node] = true if @pathway.graph[node].size == 2
  end
  hash.each_key do |node|
    adjs = @pathway.graph[node].keys
    edges = @pathway.graph[node].values
    new_edge = get_edge_merged(edges[0], edges[1])
    @pathway.graph[adjs[0]].delete(node)
    @pathway.graph[adjs[1]].delete(node)
    @pathway.graph.delete(node)
    @pathway.append(Bio::Relation.new(adjs[0], adjs[1], new_edge))
  end
  #@pathway.to_relations
  @pathway.relations.reject! do |rel|
    hash[rel.node[0]] or hash[rel.node[1]]
  end
  return hash.keys
end
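The merge rule (drop a node with exactly two edges and join its neighbors with an edge whose length is the sum of the two old lengths) is sketched below on a plain Hash of distances. `chain_graph` and `remove_degree_two_sketch` are hypothetical names, and the nil handling imitates get_edge_merged only loosely: missing distances are skipped, and the merged distance is nil when both are missing.

```ruby
# Hypothetical helper returning a fresh graph: a - n1 - hub, hub - b, hub - c.
def chain_graph
  {
    'a'   => { 'n1' => 1.0 },
    'n1'  => { 'a' => 1.0, 'hub' => 2.0 },
    'hub' => { 'n1' => 2.0, 'b' => 3.0, 'c' => 4.0 },
    'b'   => { 'hub' => 3.0 },
    'c'   => { 'hub' => 4.0 }
  }
end

# Removes every node that has exactly two edges and reconnects its two
# neighbors directly. Returns the removed nodes.
def remove_degree_two_sketch(graph)
  targets = graph.keys.select { |node| graph[node].size == 2 }
  targets.each do |node|
    (left, dist1), (right, dist2) = graph[node].to_a
    merged = [dist1, dist2].compact.inject(:+) # nil when both are nil
    graph[left].delete(node)
    graph[right].delete(node)
    graph.delete(node)
    graph[left][right] = merged
    graph[right][left] = merged
  end
  targets
end

graph = chain_graph
p remove_degree_two_sketch(graph)
p graph['a']
```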
Gets the sub-tree consisting of the given nodes. nodes must be an array of nodes. Nodes that do not exist in the original tree are ignored. Returns a Tree object. Note that the sub-tree shares Node and Edge objects with the original tree.
# File lib/bio/tree.rb, line 544
def subtree(nodes)
  nodes = nodes.find_all do |x|
    @pathway.graph[x]
  end
  return self.class.new if nodes.empty?
  # creates subtree
  new_tree = self.class.new
  nodes.each do |x|
    new_tree.add_node(x)
  end
  self.each_edge do |node1, node2, edge|
    if new_tree.include?(node1) and new_tree.include?(node2) then
      new_tree.add_edge(node1, node2, edge)
    end
  end
  return new_tree
end
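In graph terms this is an induced subgraph: keep the requested nodes that exist, and keep an edge only when both of its endpoints were kept. A minimal sketch on a plain Hash follows; `SUB_GRAPH` and `subtree_sketch` are hypothetical names.

```ruby
# Hypothetical stand-in for the tree: node => { neighbor => edge value }.
SUB_GRAPH = {
  'root' => { 'x' => 1.0 },
  'x'    => { 'root' => 1.0, 'a' => 2.0, 'b' => 3.0 },
  'a'    => { 'x' => 2.0 },
  'b'    => { 'x' => 3.0 }
}

# Keeps only known nodes, and only edges whose endpoints are both kept.
def subtree_sketch(graph, nodes)
  kept = nodes.select { |node| graph.key?(node) }
  kept.each_with_object({}) do |node, sub|
    sub[node] = graph[node].select { |neighbor, _edge| kept.include?(neighbor) }
  end
end

p subtree_sketch(SUB_GRAPH, %w[a x unknown])
```

Nodes that are kept but lose all of their edges remain in the result as orphans, which matches the add_node-then-add_edge order of the original method.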
Gets the sub-tree consisting of the given nodes and all internal nodes on the paths between them. nodes must be an array of nodes. Nodes that do not exist in the original tree are ignored. Returns a Tree object. The result is unspecified for cyclic trees. Note that the sub-tree shares Node and Edge objects with the original tree.
# File lib/bio/tree.rb, line 570
def subtree_with_all_paths(nodes)
  hash = {}
  nodes.each { |x| hash[x] = true }
  nodes.each_index do |i|
    node1 = nodes[i]
    (0...i).each do |j|
      node2 = nodes[j]
      unless node1 == node2 then
        begin
          path = self.path(node1, node2)
        rescue IndexError, NoPathError
          path = []
        end
        path.each { |x| hash[x] = true }
      end
    end
  end
  self.subtree(hash.keys)
end