Class | Bio::FastaFormat |
In: | lib/bio/db/fasta.rb |
Parent: | DB |
Treats a FASTA formatted entry, such as:
    >id and/or some comments                    <== comment line
    ATGCATGCATGCATGCATGCATGCATGCATGCATGC        <== sequence lines
    ATGCATGCATGCATGCATGCATGCATGCATGCATGC
    ATGCATGCATGC

The leading '>' can be omitted, and a trailing '>' is removed automatically.
    require 'bio'

    f_str = <<END_OF_STRING
    >sce:YBR160W CDC28, SRM5; cyclin-dependent protein kinase catalytic subunit [EC:2.7.1.-] [SP:CC28_YEAST]
    MSGELANYKRLEKVGEGTYGVVYKALDLRPGQGQRVVALKKIRLESEDEG
    VPSTAIREISLLKELKDDNIVRLYDIVHSDAHKLYLVFEFLDLDLKRYME
    GIPKDQPLGADIVKKFMMQLCKGIAYCHSHRILHRDLKPQNLLINKDGNL
    KLGDFGLARAFGVPLRAYTHEIVTLWYRAPEVLLGGKQYSTGVDTWSIGC
    IFAEMCNRKPIFSGDSEIDQIFKIFRVLGTPNEAIWPDIVYLPDFKPSFP
    QWRRKDLSQVVPSLDPRGIDLLDKLLAYDPINRISARRAAIHPYFQES
    >sce:YBR274W CHK1; probable serine/threonine-protein kinase [EC:2.7.1.-] [SP:KB9S_YEAST]
    MSLSQVSPLPHIKDVVLGDTVGQGAFACVKNAHLQMDPSIILAVKFIHVP
    TCKKMGLSDKDITKEVVLQSKCSKHPNVLRLIDCNVSKEYMWIILEMADG
    GDLFDKIEPDVGVDSDVAQFYFQQLVSAINYLHVECGVAHRDIKPENILL
    DKNGNLKLADFGLASQFRRKDGTLRVSMDQRGSPPYMAPEVLYSEEGYYA
    DRTDIWSIGILLFVLLTGQTPWELPSLENEDFVFFIENDGNLNWGPWSKI
    EFTHLNLLRKILQPDPNKRVTLKALKLHPWVLRRASFSGDDGLCNDPELL
    AKKLFSHLKVSLSNENYLKFTQDTNSNNRYISTQPIGNELAELEHDSMHF
    QTVSNTQRAFTSYDSNTNYNSGTGMTQEAKWTQFISYDIAALQFHSDEND
    CNELVKRHLQFNPNKLTKFYTLQPMDVLLPILEKALNLSQIRVKPDLFAN
    FERLCELLGYDNVFPLIINIKTKSNGGYQLCGSISIIKIEEELKSVGFER
    KTGDPLEWRRLFKKISTICRDIILIPN
    END_OF_STRING

    f = Bio::FastaFormat.new(f_str)
    puts "### FastaFormat"
    puts "# entry"
    puts f.entry
    puts "# entry_id"
    p f.entry_id
    puts "# definition"
    p f.definition
    puts "# data"
    p f.data
    puts "# seq"
    p f.seq
    puts "# seq.class"
    p f.seq.class          # Object#type no longer exists in current Ruby
    puts "# length"
    p f.length
    puts "# aaseq"
    p f.aaseq
    puts "# aaseq.class"
    p f.aaseq.class
    puts "# aaseq.composition"
    p f.aaseq.composition
    puts "# aalen"
    p f.aalen
DELIMITER | = | RS = "\n>" | Entry delimiter in flatfile text. |
DELIMITER_OVERRUN | = | 1 | (Integer) excess read size included in DELIMITER. |
data | [RW] | The sequence lines in text. |
definition | [RW] | The comment line of the FASTA formatted data. |
entry_overrun | [R] |
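The DELIMITER value hints at why an entry may arrive without its leading '>' or with a trailing one. The following is a minimal sketch using only Ruby's StringIO, not the Bio::FlatFile reader itself; `split_on_delimiter` is a hypothetical helper introduced for illustration.

```ruby
require 'stringio'

# Hypothetical helper (not part of BioRuby): read chunks from flat-file text,
# using "\n>" as the record separator, the same value as DELIMITER.
def split_on_delimiter(text, delimiter = "\n>")
  io = StringIO.new(text)
  chunks = []
  # IO#gets with a separator returns text up to and including that separator.
  while (chunk = io.gets(delimiter))
    chunks << chunk
  end
  chunks
end

chunks = split_on_delimiter(">id1 first\nATGC\n>id2 second\nGGCC\n")
chunks.each { |c| p c }
# The first chunk ends with one extra '>' that belongs to the next entry,
# and the second chunk starts without its '>'.
```

The single extra character at the end of the first chunk corresponds to the one-character excess described by DELIMITER_OVERRUN.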
Stores the comment and sequence information from one entry of the FASTA format string. If the argument contains more than one entry, only the first entry is used.
    # File lib/bio/db/fasta.rb, line 119
    def initialize(str)
      @definition = str[/.*/].sub(/^>/, '').strip  # 1st line
      @data = str.sub(/.*/, '')                    # rests
      @data.sub!(/^>.*/m, '')  # remove trailing entries for sure
      @entry_overrun = $&
    end
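To see what the three regular-expression steps in the constructor do, here is a stand-alone sketch that needs no BioRuby installation. `split_fasta_entry` is a hypothetical name, and it uses `String#slice!` in place of `sub!` plus `$&`, which should give the same pieces.

```ruby
# Hypothetical stand-alone version of the steps in the constructor listing.
def split_fasta_entry(str)
  # First line, minus an optional leading '>' and surrounding whitespace.
  definition = str[/.*/].sub(/^>/, '').strip
  # Everything after the first line.
  data = str.sub(/.*/, '')
  # Cut from the next line that starts with '>' to the end of the string;
  # slice! returns the removed text, or nil when nothing matched.
  overrun = data.slice!(/^>.*/m)
  { definition: definition, data: data, entry_overrun: overrun }
end

# Two entries in one string: only the first is kept.
p split_fasta_entry(">id1 first entry\nATGC\nATGC\n>id2 second entry\nGGCC\n")

# No leading '>' and a stray trailing '>': both are tolerated.
p split_fasta_entry("id2 second\nGGCC\n>")
```

In the first call the whole second entry ends up in `:entry_overrun`; in the second call only the lone '>' does.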
Returns the sequence as a Bio::Sequence::AA object.

    # File lib/bio/db/fasta.rb, line 204
    def aaseq
      Sequence::AA.new(seq)
    end
Parses the FASTA defline (using the identifiers method) and returns the accession numbers as an array of strings.

    # File lib/bio/db/fasta.rb, line 260
    def accessions
      identifiers.accessions
    end
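Returns the comments found on '#' lines in the sequence data, as a Hash keyed by the sequence offset at which each comment appears, or nil when the data does not begin with a comment line. It calls seq first so that the comments are parsed.

    # File lib/bio/db/fasta.rb, line 183
    def comment
      seq
      @comment
    end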
Parses the FASTA defline (using the identifiers method) and returns a possibly unique identifier as a string.

    # File lib/bio/db/fasta.rb, line 239
    def entry_id
      identifiers.entry_id
    end
Parses the FASTA defline (using the identifiers method) and returns the GI/locus/accession/accession with version number. If an entry has more than one such ID, only the first is returned. It returns a string or nil.

    # File lib/bio/db/fasta.rb, line 248
    def gi
      identifiers.gi
    end
Parses the FASTA defline and extracts IDs. IDs are NSIDs (NCBI standard FASTA sequence identifiers) or ":"-separated IDs. It returns a Bio::FastaDefline instance.

    # File lib/bio/db/fasta.rb, line 229
    def identifiers
      unless defined?(@ids) then
        @ids = FastaDefline.new(@definition)
      end
      @ids
    end
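The `unless defined?(@ids)` guard in the listing means the defline is parsed once and the result is reused by entry_id, gi and accessions. A minimal sketch of that caching pattern follows; `DeflineHolder` and its simple split on ":" are hypothetical stand-ins, not the real Bio::FastaDefline parser.

```ruby
# Hypothetical class showing the same lazy-caching pattern as identifiers.
class DeflineHolder
  attr_reader :parse_count

  def initialize(definition)
    @definition = definition
    @parse_count = 0
  end

  def identifiers
    # defined? is true once @ids has been assigned, even if the value is nil,
    # so the parse below runs at most once per object.
    unless defined?(@ids)
      @parse_count += 1
      # Stand-in for FastaDefline.new(@definition): split the first word on ':'.
      @ids = @definition.split.first.split(':')
    end
    @ids
  end
end

holder = DeflineHolder.new('sce:YBR160W CDC28, SRM5')
p holder.identifiers
p holder.identifiers.equal?(holder.identifiers)  # same cached object
p holder.parse_count
```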
Returns the length of the sequence as a Bio::Sequence::NA.

    # File lib/bio/db/fasta.rb, line 199
    def nalen
      self.naseq.length
    end
Returns the sequence as a Bio::Sequence::NA object.

    # File lib/bio/db/fasta.rb, line 194
    def naseq
      Sequence::NA.new(seq)
    end
Executes a FASTA/BLAST search using a Bio::Fasta or Bio::Blast factory object.

    #!/usr/bin/env ruby
    require 'bio'

    factory = Bio::Fasta.local('fasta34', 'db/swissprot.f')
    flatfile = Bio::FlatFile.open(Bio::FastaFormat, 'queries.f')
    flatfile.each do |entry|
      p entry.definition
      result = entry.fasta(factory)
      result.each do |hit|
        print "#{hit.query_id} : #{hit.evalue}\t#{hit.target_id} at "
        p hit.lap_at
      end
    end

    # File lib/bio/db/fasta.rb, line 150
    def query(factory)
      factory.query(entry)
    end
Returns the sequence lines joined into a single string (a Bio::Sequence::Generic), with whitespace and digits removed.

    # File lib/bio/db/fasta.rb, line 157
    def seq
      unless defined?(@seq)
        unless /\A\s*^\#/ =~ @data then
          @seq = Sequence::Generic.new(@data.tr(" \t\r\n0-9", '')) # lazy clean up
        else
          a = @data.split(/(^\#.*$)/)
          i = 0
          cmnt = {}
          s = []
          a.each do |x|
            if /^# ?(.*)$/ =~ x then
              cmnt[i] ? cmnt[i] << "\n" << $1 : cmnt[i] = $1
            else
              x.tr!(" \t\r\n0-9", '') # lazy clean up
              i += x.length
              s << x
            end
          end
          @comment = cmnt
          @seq = Bio::Sequence::Generic.new(s.join(''))
        end
      end
      @seq
    end
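The two branches of the listing can be tried without BioRuby. The sketch below mirrors them with plain strings; `clean_fasta_data` is a hypothetical helper that returns an ordinary String rather than a Bio::Sequence::Generic.

```ruby
# Hypothetical stand-alone version of the clean-up logic in the seq listing.
# Returns [sequence, comments]; comments is nil when the data has no leading
# '#' line, otherwise a Hash of sequence offset => comment text.
def clean_fasta_data(data)
  # Fast path: no comment line at the top, just drop whitespace and digits.
  return [data.tr(" \t\r\n0-9", ''), nil] unless data =~ /\A\s*^\#/

  comments = {}
  pieces = []
  offset = 0
  # The capture group makes split keep the '#' lines as separate elements.
  data.split(/(^\#.*$)/).each do |chunk|
    if chunk =~ /^# ?(.*)$/
      text = Regexp.last_match(1)
      if comments[offset]
        comments[offset] << "\n" << text   # several comments at one offset
      else
        comments[offset] = text
      end
    else
      cleaned = chunk.tr(" \t\r\n0-9", '')
      offset += cleaned.length             # offset counts cleaned residues only
      pieces << cleaned
    end
  end
  [pieces.join, comments]
end

p clean_fasta_data("\nATGC 10\nATGC\r\n")
p clean_fasta_data("\n# signal peptide\nMKT 10\nLLV\n# mature chain\nAAG\n")
```

Because digits are stripped along with whitespace, position numbers in the sequence lines are discarded; so would any digit that is genuinely part of the data.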
Returns sequence as a Bio::Sequence object.
Note: For efficiency, the returned Bio::Sequence object may share data with this FastaFormat object, so modifying it might (but will not always) change the sequence or definition here as well.

    # File lib/bio/db/fasta.rb, line 220
    def to_biosequence
      Bio::Sequence.adapter(self, Bio::Sequence::Adapter::FastaFormat)
    end