Module Bio::Alignment::EnumerableExtension
In: lib/bio/alignment.rb
The module Bio::Alignment::EnumerableExtension is a set of useful methods for multiple sequence alignment. It can be included in any class or used to extend any object. The class or object must provide the methods defined in Enumerable, and must have an each method that iterates over the sequences (or strings) and yields each one.
Optionally, if an each_seq method is defined that iterates over the sequences (or strings) and yields each one, it is used instead of each.
Note that the each or each_seq method may be called multiple times, so the module is not suitable for IO objects. In addition, break may be used inside the given block, and destructive methods may be applied to the sequences.
For Array or Hash objects, use the ArrayExtension or HashExtension module, respectively. They provide a built-in each_seq method and/or redefine some methods.
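The contract described above can be illustrated with a small plain-Ruby sketch. It does not load BioRuby: ToyAlignment and its methods are hypothetical stand-ins written for this example, and only imitate the shape of an object the module expects (Enumerable, an each method, and an optional each_seq).

```ruby
# Plain-Ruby sketch; ToyAlignment is a hypothetical stand-in, not a BioRuby class.
class ToyAlignment
  include Enumerable

  def initialize(*seqs)
    @seqs = seqs
  end

  # Required: yield each sequence (or string) in turn.
  def each(&block)
    @seqs.each(&block)
    self
  end

  # Optional hook: here it simply delegates to each.
  def each_seq(&block)
    each(&block)
  end

  # Imitates the idea of alignment_length: the longest sequence wins.
  def alignment_length
    maxlen = 0
    each_seq { |s| maxlen = s.length if s.length > maxlen }
    maxlen
  end
end

aln = ToyAlignment.new('ATGC--', 'ATG', 'AT-CGA')
puts aln.alignment_length  # first pass over the sequences
puts aln.count             # second pass, via Enumerable
```

Because the sequences are traversed more than once, a one-shot source such as an IO stream would be exhausted after the first pass, which is why the note above rules out IO objects.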
Iterates over each sequence, collects the results of running the block, and returns them as a new alignment (a Bio::Alignment::SequenceArray object).
Redefine this method if you want to change the class of the return value.
# File lib/bio/alignment.rb, line 445
def alignment_collect
  a = SequenceArray.new
  a.set_all_property(get_all_property)
  each_seq do |str|
    a << yield(str)
  end
  a
end
Concatenates the given alignment to this one, sequence by sequence. align must have an each_seq or each method.
Returns self.
Note that it is a destructive method.
For Hash, use it carefully, because the order of the sequences is not guaranteed and key information is completely ignored.
# File lib/bio/alignment.rb, line 849
def alignment_concat(align)
  flag = nil
  a = []
  each_seq { |s| a << s }
  i = 0
  begin
    align.each_seq do |seq|
      flag = true
      a[i].concat(seq) if a[i] and seq
      i += 1
    end
    return self
  rescue NoMethodError, ArgumentError => evar
    raise evar if flag
  end
  align.each do |seq|
    a[i].concat(seq) if a[i] and seq
    i += 1
  end
  self
end
Returns the alignment length, that is, the length of the longest sequence in the alignment.
# File lib/bio/alignment.rb, line 366
def alignment_length
  maxlen = 0
  each_seq do |s|
    x = s.length
    maxlen = x if x > maxlen
  end
  maxlen
end
Removes excess gaps at the head of the sequences. Returns nil if nothing was removed; otherwise returns self.
Note that it is a destructive method.
# File lib/bio/alignment.rb, line 752
def alignment_lstrip!
  #(String-like)
  pos = 0
  each_site do |a|
    a.remove_gaps!
    if a.empty?
      pos += 1
    else
      break
    end
  end
  return nil if pos <= 0
  each_seq { |s| s[0, pos] = '' }
  self
end
Pads the tail of each sequence with gaps if the sequence is shorter than the alignment length.
Note that it is a destructive method.
# File lib/bio/alignment.rb, line 712
def alignment_normalize!
  #(original)
  len = alignment_length
  each_seq do |s|
    s << (gap_char * (len - s.length)) if s.length < len
  end
  self
end
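The padding step can be tried on plain strings without loading BioRuby. The sketch below is only an illustration; the gap character '-' is an assumption made for the example.

```ruby
# Plain-Ruby sketch of tail padding; '-' as the gap character is an assumption.
gap_char = '-'
seqs = %w[ATGCA AT ATG-].map(&:dup)  # dup so the strings can be modified in place

len = seqs.map(&:length).max
seqs.each do |s|
  # Append just enough gap characters to reach the longest length.
  s << (gap_char * (len - s.length)) if s.length < len
end

p seqs
```

After the loop every string has the same length, so column-wise operations such as site iteration see a rectangular alignment.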
Removes excess gaps at the tail of the sequences. Returns nil if nothing was removed; otherwise returns self.
Note that it is a destructive method.
# File lib/bio/alignment.rb, line 727
def alignment_rstrip!
  #(String-like)
  len = alignment_length
  newlen = len
  each_site_step(len - 1, 0, -1) do |a|
    a.remove_gaps!
    if a.empty? then
      newlen -= 1
    else
      break
    end
  end
  return nil if newlen >= len
  each_seq do |s|
    s[newlen..-1] = '' if s.length > newlen
  end
  self
end
Gets the site at the given position. Returns a Bio::Alignment::Site object.
If the position is out of range, it returns a site consisting entirely of gaps.
# File lib/bio/alignment.rb, line 403
def alignment_site(position)
  site = _alignment_site(position)
  site.set_all_property(get_all_property)
  site
end
Returns the specified range of the alignment. For each sequence, the slice method (it may be String#slice, which is the same as String#[]) is executed, and the results are returned as a new alignment (a Bio::Alignment::SequenceArray object).
Unlike the alignment_window method, the resulting alignment might contain nil.
If you want to change the class of the return value, redefine the alignment_collect method.
# File lib/bio/alignment.rb, line 807
def alignment_slice(*arg)
  #(String-like)
  #(BioPerl) AlignI::slice like method
  alignment_collect do |s|
    s.slice(*arg)
  end
end
Removes excess gaps at both ends of the sequences. Returns nil if nothing was removed; otherwise returns self.
Note that it is a destructive method.
# File lib/bio/alignment.rb, line 774
def alignment_strip!
  #(String-like)
  r = alignment_rstrip!
  l = alignment_lstrip!
  (r or l)
end
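The idea of stripping all-gap columns from both ends can be sketched on plain strings. This is not the library code: strip_gap_columns! is a hypothetical helper, the gap pattern is an assumption, and positions past the end of a short sequence are treated as gaps.

```ruby
# Plain-Ruby sketch; strip_gap_columns! and GAP are hypothetical names.
GAP = /[-.]/  # assumed gap pattern

def strip_gap_columns!(seqs)
  len = seqs.map(&:length).max || 0
  # A column counts as "all gaps" if every sequence has a gap or nothing there.
  all_gap = ->(i) { seqs.all? { |s| s[i].nil? || GAP.match?(s[i]) } }

  head = 0
  head += 1 while head < len && all_gap.call(head)
  tail = len
  tail -= 1 while tail > head && all_gap.call(tail - 1)

  return nil if head.zero? && tail == len  # nothing to remove

  seqs.each do |s|
    s[tail..-1] = '' if s.length > tail  # trim the tail first
    s[0, head] = ''                      # then the head
  end
  seqs
end

seqs = %w[--ATG- -CATG- --AT--].map(&:dup)
p strip_gap_columns!(seqs)
p strip_gap_columns!(%w[ATG CTG].map(&:dup))
```

Only columns that are gaps in every sequence are dropped, so internal gaps and gaps that align with residues in another sequence are kept.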
For each sequence, the subseq method (Bio::Sequence::Common#subseq is expected) is executed, and the results are returned as a new alignment (a Bio::Alignment::SequenceArray object).
All sequences in the alignment are expected to be Bio::Sequence::NA or Bio::Sequence::AA objects.
Unlike the alignment_window method, the resulting alignment might contain nil.
If you want to change the class of the return value, redefine the alignment_collect method.
# File lib/bio/alignment.rb, line 829
def alignment_subseq(*arg)
  #(original)
  alignment_collect do |s|
    s.subseq(*arg)
  end
end
Returns the specified range of the alignment. For each sequence, the [] method (it may be String#[]) is executed, and the results are returned as a new alignment (a Bio::Alignment::SequenceArray object).
Unlike the alignment_slice method, the resulting alignment is guaranteed to contain String objects even if the specified range is out of range.
If you want to change the class of the return value, redefine the alignment_collect method.
# File lib/bio/alignment.rb, line 466
def alignment_window(*arg)
  alignment_collect do |s|
    s[*arg] or seqclass.new('')
  end
end
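The difference between the slice-style and window-style results comes from how String#[] treats an out-of-range start. A minimal plain-Ruby sketch, using ordinary strings in place of an alignment:

```ruby
# Plain-Ruby sketch: String#slice returns nil when the start is past the end.
seqs = %w[ATGCATGC ATG]

sliced   = seqs.map { |s| s.slice(4, 3) }  # may contain nil
windowed = seqs.map { |s| s[4, 3] || '' }  # nil replaced by an empty string

p sliced
p windowed
```

The second sequence is too short for the requested range, so the slice-style result holds nil at that position while the window-style result holds an empty string.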
Iterates over each site of the alignment, yielding a Bio::Alignment::Site object, and returns an array of the results of running the block.
# File lib/bio/alignment.rb, line 503
def collect_each_site
  ary = []
  each_site do |site|
    ary << yield(site)
  end
  ary
end
Helper method for calculating a consensus sequence. It iterates over each site of the alignment, removing gaps from the site if so specified with opt, and yields a Bio::Alignment::Site object. The results of running the block (String objects are expected) are joined into a single string, which is returned.
opt[:gap_mode] ==>
   0 -- gaps are regarded as normal characters
   1 -- a site containing gaps is regarded as a gap
  -1 -- gaps are eliminated from consensus calculation
  default: 0
# File lib/bio/alignment.rb, line 523
def consensus_each_site(opt = {})
  mchar = (opt[:missing_char] or self.missing_char)
  gap_mode = opt[:gap_mode]
  case gap_mode
  when 0, nil
    collect_each_site do |a|
      yield(a) or mchar
    end.join('')
  when 1
    collect_each_site do |a|
      a.has_gap? ? gap_char : (yield(a) or mchar)
    end.join('')
  when -1
    collect_each_site do |a|
      a.remove_gaps!
      a.empty? ? gap_char : (yield(a) or mchar)
    end.join('')
  else
    raise ':gap_mode must be 0, 1 or -1'
  end
end
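The effect of the three gap modes can be seen with a plain-Ruby sketch that works on columns of ordinary strings. Everything here is illustrative: consensus_by_site, GAP_CHAR, MISSING and the "unanimous" block are assumptions made for the example, not the library's API or defaults.

```ruby
# Plain-Ruby sketch of the three gap modes; all names here are hypothetical.
GAP_CHAR = '-'  # assumed gap character
MISSING  = '?'  # assumed placeholder used when the block returns nil

# Columns of the alignment as arrays of one-character strings.
def columns(seqs)
  len = seqs.map(&:length).max || 0
  (0...len).map { |i| seqs.map { |s| s[i] || GAP_CHAR } }
end

def consensus_by_site(seqs, gap_mode = 0)
  columns(seqs).map do |col|
    case gap_mode
    when 0, nil
      yield(col) || MISSING
    when 1
      col.include?(GAP_CHAR) ? GAP_CHAR : (yield(col) || MISSING)
    when -1
      rest = col.reject { |c| c == GAP_CHAR }
      rest.empty? ? GAP_CHAR : (yield(rest) || MISSING)
    else
      raise ArgumentError, ':gap_mode must be 0, 1 or -1'
    end
  end.join
end

# Example block: a column has a consensus only when all members agree.
unanimous = ->(col) { col.uniq.size == 1 ? col.first : nil }

seqs = %w[AT-C ATGC A-GG]
puts consensus_by_site(seqs, 0, &unanimous)
puts consensus_by_site(seqs, 1, &unanimous)
puts consensus_by_site(seqs, -1, &unanimous)
```

With mode 0 a gap simply counts as one more character in the column, with mode 1 any gap turns the whole column into a gap, and with mode -1 the block only sees the non-gap residues.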
Returns the IUPAC consensus string of the alignment of nucleic-acid sequences.
It resembles BioPerl's AlignI::consensus_iupac method.
Please refer to the consensus_each_site method for opt.
# File lib/bio/alignment.rb, line 565
def consensus_iupac(opt = {})
  consensus_each_site(opt) do |a|
    a.consensus_iupac
  end
end
Returns the consensus string of the alignment. threshold is expected to satisfy 0.0 <= threshold <= 1.0.
It resembles BioPerl's AlignI::consensus_string method.
Please refer to the consensus_each_site method for opt.
# File lib/bio/alignment.rb, line 552
def consensus_string(threshold = 1.0, opt = {})
  consensus_each_site(opt) do |a|
    a.consensus_string(threshold)
  end
end
This method is similar to BioPerl's AlignI::match.
Changes sites in the second through last sequences to match_char (default: '.') when the site is equal to the corresponding site of the first sequence.
Note that it is a destructive method.
For Hash, use it carefully, because the order of the sequences is not guaranteed.
# File lib/bio/alignment.rb, line 662
def convert_match(match_char = '.')
  #(BioPerl) AlignI::match like method
  len = alignment_length
  firstseq = nil
  each_seq do |s|
    unless firstseq then
      firstseq = s
    else
      (0...len).each do |i|
        if s[i] and firstseq[i] == s[i] and !is_gap?(firstseq[i..i])
          s[i..i] = match_char
        end
      end
    end
  end
  self
end
This method is similar to BioPerl's AlignI::unmatch.
Restores sites in the second through last sequences that hold match_char (default: '.') to the corresponding characters of the first sequence.
Note that it is a destructive method.
For Hash, use it carefully, because the order of the sequences is not guaranteed.
# File lib/bio/alignment.rb, line 690
def convert_unmatch(match_char = '.')
  #(BioPerl) AlignI::unmatch like method
  len = alignment_length
  firstseq = nil
  each_seq do |s|
    unless firstseq then
      firstseq = s
    else
      (0...len).each do |i|
        if s[i..i] == match_char then
          s[i..i] = (firstseq[i..i] or match_char)
        end
      end
    end
  end
  self
end
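The match and unmatch conversions form a round trip, which a plain-Ruby sketch on ordinary strings can demonstrate. to_match and from_match are hypothetical helpers written for this example, and treating only '-' as a gap is an assumption.

```ruby
# Plain-Ruby sketch of the match/unmatch round trip; helper names are hypothetical.
def to_match(seqs, match_char = '.')
  first, *rest = seqs
  rest.each do |s|
    (0...s.length).each do |i|
      # Replace a residue that equals the first sequence's residue,
      # but leave gap positions of the first sequence untouched.
      s[i] = match_char if first[i] && first[i] == s[i] && first[i] != '-'
    end
  end
  seqs
end

def from_match(seqs, match_char = '.')
  first, *rest = seqs
  rest.each do |s|
    (0...s.length).each do |i|
      s[i] = (first[i] || match_char) if s[i] == match_char
    end
  end
  seqs
end

seqs = %w[ATG-C ATC-C TTG-A].map(&:dup)
p to_match(seqs)
p from_match(seqs)
```

The round trip is only lossless when match_char does not already occur in the sequences, for example as an alternative gap symbol.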
Iterates over each sequence, yielding it to the block. By default it behaves the same as each.
Redefine this method to suit the class or object.
# File lib/bio/alignment.rb, line 340
def each_seq(&block) #:yields: seq
  each(&block)
end
Iterates over each site of the alignment. It yields a Bio::Alignment::Site object (which inherits from Array) and returns self.
# File lib/bio/alignment.rb, line 412
def each_site
  cp = get_all_property
  (0...alignment_length).each do |i|
    site = _alignment_site(i)
    site.set_all_property(cp)
    yield(site)
  end
  self
end
Iterates over the sites of the alignment from start to stop in increments of step. It yields a Bio::Alignment::Site object (which inherits from Array) and returns self. It is the same as start.step(stop, step) { |i| yield alignment_site(i) }.
# File lib/bio/alignment.rb, line 428
def each_site_step(start, stop, step = 1)
  cp = get_all_property
  start.step(stop, step) do |i|
    site = _alignment_site(i)
    site.set_all_property(cp)
    yield(site)
  end
  self
end
Iterates over each sliding window of the alignment. window_size is the size of the sliding window, and step_size is the distance moved at each step. It yields a Bio::Alignment::SequenceArray object containing each window, and returns a Bio::Alignment::SequenceArray object containing the remainder of the alignment at the terminal end. If window_size is smaller than 0, it returns nil.
# File lib/bio/alignment.rb, line 481
def each_window(window_size, step_size = 1)
  return nil if window_size < 0
  if step_size >= 0 then
    last_step = nil
    0.step(alignment_length - window_size, step_size) do |i|
      yield alignment_window(i, window_size)
      last_step = i
    end
    alignment_window((last_step + window_size)..-1)
  else
    i = alignment_length - window_size
    while i >= 0
      yield alignment_window(i, window_size)
      i += step_size
    end
    alignment_window(0...(i-step_size))
  end
end
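A forward sliding window and its remainder can be sketched on plain strings. The windows helper below is hypothetical and covers only a positive step; the guard for the case where no full window fits is an addition made for this sketch.

```ruby
# Plain-Ruby sketch of a forward sliding window; windows is a hypothetical helper.
def windows(seqs, window_size, step = 1)
  len = seqs.map(&:length).max || 0
  last = nil
  0.step(len - window_size, step) do |i|
    # Each window is the same column range taken from every sequence.
    yield seqs.map { |s| s[i, window_size] || '' }
    last = i
  end
  # Remainder after the last full window; the whole input if none fitted.
  start = last ? last + window_size : 0
  seqs.map { |s| s[start..-1] || '' }
end

seqs = %w[ATGCATG ATGCAT-]
rest = windows(seqs, 3, 3) { |w| p w }
p rest
```

With a window of 3 and a step of 3 over seven columns, two full windows are yielded and the final column is returned as the remainder.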
Returns the match line string of the alignment of nucleic- or amino-acid sequences. The type of the sequences is determined automatically, or you can specify it with opt[:type].
It resembles BioPerl's AlignI::match_line method.
opt[:type] ==> :na or :aa (or determined by sequence class)
opt[:match_line_char]   ==> 100% equal    default: '*'
opt[:strong_match_char] ==> strong match  default: ':'
opt[:weak_match_char]   ==> weak match    default: '.'
opt[:mismatch_char]     ==> mismatch      default: ' '
:strong_ and :weak_match_char are used only in amino mode (:aa)
More options are accepted. Please refer to the consensus_each_site method for opt.
# File lib/bio/alignment.rb, line 624
def match_line(opt = {})
  case opt[:type]
  when :aa
    amino = true
  when :na, :dna, :rna
    amino = false
  else
    if seqclass == Bio::Sequence::AA then
      amino = true
    elsif seqclass == Bio::Sequence::NA then
      amino = false
    else
      amino = nil
      self.each_seq do |x|
        if /[EFILPQ]/i =~ x
          amino = true
          break
        end
      end
    end
  end
  if amino then
    match_line_amino(opt)
  else
    match_line_nuc(opt)
  end
end
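When neither opt[:type] nor the sequence class settles the question, the listing falls back to looking for the letters E, F, I, L, P and Q, none of which is an IUPAC nucleotide code. A plain-Ruby sketch of that heuristic, with guess_type as a hypothetical name:

```ruby
# Plain-Ruby sketch of the fallback heuristic; guess_type is a hypothetical name.
def guess_type(seqs)
  seqs.any? { |s| /[EFILPQ]/i.match?(s) } ? :aa : :na
end

p guess_type(%w[ATGCATGC atgcnnry])  # no E, F, I, L, P or Q present
p guess_type(%w[MKVLAAGE MKILAAGE])  # contains L and E
```

The heuristic is one-sided: a protein alignment that happens to contain none of these six letters would be treated as nucleic acid, so passing the type explicitly is the safer choice when it is known.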
Returns the match line string of the alignment of amino-acid sequences.
It resembles BioPerl's AlignI::match_line method.
opt[:match_line_char]   ==> 100% equal    default: '*'
opt[:strong_match_char] ==> strong match  default: ':'
opt[:weak_match_char]   ==> weak match    default: '.'
opt[:mismatch_char]     ==> mismatch      default: ' '
More options are accepted. Please refer to the consensus_each_site method for opt.
# File lib/bio/alignment.rb, line 584
def match_line_amino(opt = {})
  collect_each_site do |a|
    a.match_line_amino(opt)
  end.join('')
end
Returns the match line string of the alignment of nucleic-acid sequences.
It resembles BioPerl's AlignI::match_line method.
opt[:match_line_char] ==> 100% equal    default: '*'
opt[:mismatch_char]   ==> mismatch      default: ' '
More options are accepted. Please refer to the consensus_each_site method for opt.
# File lib/bio/alignment.rb, line 601
def match_line_nuc(opt = {})
  collect_each_site do |a|
    a.match_line_nuc(opt)
  end.join('')
end
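The per-site logic lives in Bio::Alignment::Site and is not shown in this listing, so the following plain-Ruby sketch is only a guess at the general idea of a nucleotide match line: mark a column when every sequence carries the same residue. toy_match_line is a hypothetical helper, and treating any column with a gap or a missing position as a mismatch is an assumption.

```ruby
# Plain-Ruby sketch of a nucleotide match line; toy_match_line is hypothetical.
def toy_match_line(seqs, match_char = '*', mismatch_char = ' ')
  len = seqs.map(&:length).max || 0
  (0...len).map do |i|
    col = seqs.map { |s| s[i] }
    # A column matches only if every sequence has the same non-gap residue.
    identical = col.uniq.size == 1 && !col.first.nil? && col.first != '-'
    identical ? match_char : mismatch_char
  end.join
end

seqs = %w[ATG-CA ATGACA ATC-CA]
seqs.each { |s| puts s }
puts toy_match_line(seqs)
```

Printed under the alignment, the resulting line marks the conserved columns in the same way as the asterisk row of a CLUSTAL-style report.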
Returns the number of sequences in this alignment.
# File lib/bio/alignment.rb, line 1315
def number_of_sequences
  i = 0
  self.each_seq { |s| i += 1 }
  i
end
Completely removes ALL gaps in the sequences. Returns nil if nothing was removed; otherwise returns self.
Note that it is a destructive method.
# File lib/bio/alignment.rb, line 787
def remove_all_gaps!
  ret = nil
  each_seq do |s|
    x = s.gsub!(gap_regexp, '')
    ret ||= x
  end
  ret ? self : nil
end
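The nil-or-self return value follows from String#gsub!, which returns nil when it makes no substitution. A plain-Ruby sketch on ordinary strings, where strip_all_gaps! is a hypothetical helper and the gap pattern is an assumption:

```ruby
# Plain-Ruby sketch; strip_all_gaps! is a hypothetical helper.
def strip_all_gaps!(seqs, gap_regexp = /[-.]/)
  changed = nil
  seqs.each do |s|
    # Run gsub! first and fold the result in afterwards: writing
    # "changed ||= s.gsub!(...)" would skip gsub! once changed is set.
    x = s.gsub!(gap_regexp, '')
    changed ||= x
  end
  changed ? seqs : nil
end

seqs = %w[A-T.G AT-G ATG].map(&:dup)
p strip_all_gaps!(seqs)  # gaps removed from every sequence
p strip_all_gaps!(seqs)  # nothing left to remove
```

Keeping the gsub! call on its own line is what guarantees that every sequence is processed, not only the sequences up to the first one that changed.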
Returns the class of the sequences. If the instance variable @seqclass (which can be set by the seqclass= method) is set, simply returns its value. Otherwise, returns the class of the first sequence. If no sequences are found, returns String.
# File lib/bio/alignment.rb, line 349
def seqclass
  if (defined? @seqclass) and @seqclass then
    @seqclass
  else
    klass = nil
    each_seq do |s|
      if s then
        klass = s.class
        break if klass
      end
    end
    (klass or String)
  end
end