Class Bio::GCG::Seq
In: lib/bio/appl/gcg/seq.rb
Parent: Object

Bio::GCG::Seq

This is a parser class for the GCG sequence file format (.seq or .pep).

References

  • Information about the GCG Wisconsin Package(R)

www.accelrys.com/products/gcg_wisconsin_package

www.hgmp.mrc.ac.uk/Software/EMBOSS/Themes/SequenceFormats.html

  • BioPerl documentation

docs.bioperl.org/releases/bioperl-1.2.3/Bio/SeqIO/gcg.html

Methods

aaseq   calc_checksum   naseq   new   seq   to_gcg   validate_checksum  

Constants

DELIMITER = RS = nil   record delimiter used by Bio::FlatFile (nil: the whole input is read as a single entry)
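In core Ruby, a nil record separator makes gets read to the end of the input, which is presumably why a nil DELIMITER turns the whole file into one entry. A minimal StringIO illustration follows; the sample text is hypothetical, and how Bio::FlatFile passes DELIMITER internally is an assumption, not shown on this page.

```ruby
require 'stringio'

# Hypothetical two-line input. With a nil separator, gets does not stop at
# "\n" but returns everything up to the end of the input.
io = StringIO.new("!!NA_SEQUENCE 1.0\nH.sapiens DNA\n")
whole = io.gets(nil)
rest = io.gets(nil)   # already at end of input, so this is nil
p whole
p rest
```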

Attributes

checksum  [R]  "Check:" field (Integer), the checksum of the current sequence.
date  [R]  Date field of this entry.
definition  [R]  Description field.
entry_id  [R]  ID field.
heading  [R]  Heading line, e.g. '!!NA_SEQUENCE 1.0' or '!!AA_SEQUENCE 1.0'.
length  [R]  "Length:" field (Integer). Note that this may differ from the actual sequence length.
seq_type  [R]  "Type:" field, which indicates the sequence type: "N" means a nucleic acid sequence, "P" means a protein sequence.

Public Class methods

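Calculates the GCG checksum of the given string. The string is upcased and every character other than A-Z, '.' and '~' is discarded; each remaining character code is multiplied by its position (1 to 57, then restarting at 1), and the sum of these products modulo 10000 is returned.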

     # File lib/bio/appl/gcg/seq.rb, line 141
     def self.calc_checksum(str)
       # Reference: Bio::SeqIO::gcg of BioPerl-1.2.3
       idx = 0
       sum = 0
       # Only A-Z, '.' and '~' count (after upcasing); everything else is dropped.
       str.upcase.tr('^A-Z.~', '').each_byte do |c|
         idx += 1
         sum += idx * c
         idx = 0 if idx >= 57   # position weights cycle 1..57
       end
       (sum % 10000)
     end
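calc_checksum uses only core Ruby, so its arithmetic can be tried without BioRuby. The sketch below is a standalone copy of the listing under a hypothetical name, gcg_checksum. For "ACGT" the sum is 1*65 + 2*67 + 3*71 + 4*84 = 748, and in a run of 58 'A's the 58th residue is weighted 1 again, giving 65*(1+2+...+57) + 65 = 107510, i.e. 7510.

```ruby
# Standalone copy of the checksum arithmetic (hypothetical name; BioRuby
# itself provides Bio::GCG::Seq.calc_checksum).
def gcg_checksum(str)
  idx = 0
  sum = 0
  str.upcase.tr('^A-Z.~', '').each_byte do |c|
    idx += 1
    sum += idx * c          # weight = position within the current 57-residue cycle
    idx = 0 if idx >= 57
  end
  sum % 10000
end

p gcg_checksum('ACGT')        # 1*65 + 2*67 + 3*71 + 4*84
p gcg_checksum('ac-gt 12')    # case, '-', spaces and digits are ignored
p gcg_checksum('A' * 58)      # the 58th residue restarts at weight 1
```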

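Creates a new instance of this class. str must be a string in GCG sequence format: the first line is taken as the heading, the text before the line ending in ".." as the definition, and the ".." line itself is parsed for the ID, Length, date, Type and Check fields. The remaining text is kept as the sequence data.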

     # File lib/bio/appl/gcg/seq.rb, line 38
     def initialize(str)
       @heading = str[/.*/] # e.g. '!!NA_SEQUENCE 1.0'
       str = str.sub(/.*/, '')
       # Remove everything up to the line ending in ".."; the removed text is
       # the definition followed by the ID/Length/date/Type/Check line.
       str.sub!(/.*\.\.$/m, '')
       @definition = $&.to_s.sub(/^.*\.\.$/, '').to_s
       desc = $&.to_s   # the ".." line itself
       if m = /(.+)\s+Length\:\s+(\d+)\s+(.+)\s+Type\:\s+(\w)\s+Check\:\s+(\d+)/.match(desc) then
         @entry_id = m[1].to_s.strip
         @length   = (m[2] ? m[2].to_i : nil)
         @date     = m[3].to_s.strip
         @seq_type = m[4]
         @checksum = (m[5] ? m[5].to_i : nil)
       end
       @data = str   # raw sequence body, converted lazily by #seq
       @seq = nil
       @definition.strip!
     end
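A minimal sketch of the parsing steps above, written as a standalone method that returns a Hash instead of setting instance variables. The method name parse_gcg_entry and the sample entry are hypothetical; the regular expressions are those of the listing. The Check value 5220 in the sample is the checksum of its 20-residue sequence.

```ruby
# Hypothetical sample entry (not real data).
SAMPLE = <<~GCG
  !!NA_SEQUENCE 1.0
  H.sapiens DNA

  gi-1234567  Length: 20  May 01, 2006 12:00  Type: N  Check: 5220  ..

         1  ACGTACGTAC GTACGTACGT
GCG

# Mirrors the steps of #initialize (hypothetical helper, not BioRuby API).
def parse_gcg_entry(str)
  entry = { heading: str[/.*/] }              # first line
  rest  = str.sub(/.*/, '')                   # drop the heading line
  block = rest[/.*\.\.$/m].to_s               # definition + the ".." line
  desc  = block[/^.*\.\.$/].to_s              # the ".." line alone
  entry[:definition] = block.sub(/^.*\.\.$/, '').strip
  entry[:data] = rest.sub(/.*\.\.$/m, '')     # raw sequence body
  m = /(.+)\s+Length\:\s+(\d+)\s+(.+)\s+Type\:\s+(\w)\s+Check\:\s+(\d+)/.match(desc)
  if m
    entry[:entry_id] = m[1].strip
    entry[:length]   = m[2].to_i
    entry[:date]     = m[3].strip
    entry[:seq_type] = m[4]
    entry[:checksum] = m[5].to_i
  end
  entry
end

e = parse_gcg_entry(SAMPLE)
p e[:entry_id], e[:length], e[:seq_type], e[:checksum]
```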

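Creates a text in GCG sequence format from the given hash. Only :seq is required; :definition, :entry_id, :seq_type and :date can be omitted. If :seq is a Bio::Sequence::NA or Bio::Sequence::AA object, the sequence type is taken from its class; otherwise :seq_type is used, defaulting to 'P'. If :date is omitted, the current time is used.

Example: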

 Bio::GCG::Seq.to_gcg(:definition=>'H.sapiens DNA',
                      :seq_type=>'N', :entry_id=>'gi-1234567',
                      :seq=>seq, :date=>date)

     # File lib/bio/appl/gcg/seq.rb, line 161
     def self.to_gcg(hash)
       seq = hash[:seq]
       # The class of :seq takes precedence over :seq_type.
       if seq.is_a?(Bio::Sequence::NA) then
         seq_type = 'N'
       elsif seq.is_a?(Bio::Sequence::AA) then
         seq_type = 'P'
       else
         seq_type = (hash[:seq_type] or 'P')
       end
       if seq_type == 'N' then
         head = '!!NA_SEQUENCE 1.0'
       else
         head = '!!AA_SEQUENCE 1.0'
       end
       date = (hash[:date] or Time.now.strftime('%B %d, %Y %H:%M'))
       entry_id = hash[:entry_id].to_s.strip
       len = seq.length
       checksum = self.calc_checksum(seq)
       definition = hash[:definition].to_s.strip
       seq = seq.upcase.gsub(/.{1,50}/, "\\0\n")   # 50 residues per line
       seq.gsub!(/.{10}/, "\\0 ")                   # blocks of 10 residues
       w = len.to_s.size + 1
       i = 1
       # Prefix each line with the right-aligned position of its first residue.
       seq.gsub!(/^/) { |x| s = sprintf("\n%*d ", w, i); i += 50; s }

       [ head, "\n", definition, "\n\n",
         "#{entry_id}  Length: #{len}  #{date}  " \
         "Type: #{seq_type}  Check: #{checksum}  ..\n",
         seq, "\n" ].join('')
     end

Public Instance methods

Returns the sequence as a Bio::Sequence::AA object. Use this method when you know the entry is a protein sequence.

Calling aaseq on a sequence that is not a protein sequence (or naseq on one that is not a nucleic acid sequence) raises RuntimeError.

     # File lib/bio/appl/gcg/seq.rb, line 108
     def aaseq
       if seq.is_a?(Bio::Sequence::AA) then
         @seq
       else
         raise 'seq_type != \'P\''
       end
     end

Returns the sequence as a Bio::Sequence::NA object. Use this method when you know the entry is a nucleic acid sequence.

Calling naseq on a sequence that is not a nucleic acid sequence (or aaseq on one that is not a protein sequence) raises RuntimeError.

     # File lib/bio/appl/gcg/seq.rb, line 121
     def naseq
       if seq.is_a?(Bio::Sequence::NA) then
         @seq
       else
         raise 'seq_type != \'N\''
       end
     end

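Sequence data, created from the sequence body on first access. Its class depends on the "Type:" field: Bio::Sequence::NA for 'N', Bio::Sequence::AA for 'P', and Bio::Sequence otherwise. Only letters, '-', '.' and '~' are kept; position numbers and whitespace are removed.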

     # File lib/bio/appl/gcg/seq.rb, line 88
     def seq
       unless @seq then   # build the sequence object on first access only
         case @seq_type
         when 'N', 'n'
           k = Bio::Sequence::NA
         when 'P', 'p'
           k = Bio::Sequence::AA
         else
           k = Bio::Sequence
         end
         # Keep letters, '-', '.' and '~'; drop position numbers and whitespace.
         @seq = k.new(@data.tr('^-a-zA-Z.~', ''))
       end
       @seq
     end
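The lazy, type-dependent construction in seq, together with the checks in naseq and aaseq, is sketched below with plain String subclasses standing in for the BioRuby classes. MiniGCGEntry, NASeq, AASeq and GenericSeq are hypothetical names, and the stand-ins do not reproduce any normalization the real Bio::Sequence classes may apply.

```ruby
# Hypothetical stand-ins for Bio::Sequence::NA, Bio::Sequence::AA and
# Bio::Sequence, so the sketch runs without BioRuby.
class NASeq < String; end
class AASeq < String; end
class GenericSeq < String; end

# Hypothetical miniature of the lazy #seq accessor and the #naseq/#aaseq guards.
class MiniGCGEntry
  def initialize(seq_type, data)
    @seq_type = seq_type
    @data = data
    @seq = nil
  end

  def seq
    @seq ||= begin
      klass = case @seq_type
              when 'N', 'n' then NASeq
              when 'P', 'p' then AASeq
              else GenericSeq
              end
      # Keep letters, '-', '.' and '~'; drop position numbers and whitespace.
      klass.new(@data.tr('^-a-zA-Z.~', ''))
    end
  end

  def naseq
    seq.is_a?(NASeq) ? seq : raise("seq_type != 'N'")
  end

  def aaseq
    seq.is_a?(AASeq) ? seq : raise("seq_type != 'P'")
  end
end

entry = MiniGCGEntry.new('N', "\n       1  acgtacgtac GTACGT-.~\n")
p entry.seq.class
p entry.seq
```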

Validates the checksum. Returns true if the "Check:" value of the entry equals the checksum calculated from the sequence (see calc_checksum), and false otherwise.

     # File lib/bio/appl/gcg/seq.rb, line 132
     def validate_checksum
       checksum == self.class.calc_checksum(seq)
     end
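validate_checksum amounts to comparing the stored "Check:" value with a freshly computed checksum. In the sketch below, gcg_checksum and checksum_valid? are hypothetical helpers, not BioRuby methods. Changing the first residue of "ACGT" * 5 from A to C raises the weighted sum by 1*(67 - 65) = 2, so the checksum becomes 5222 instead of 5220 and validation fails.

```ruby
# Hypothetical helper: same arithmetic as calc_checksum.
def gcg_checksum(str)
  idx = 0
  sum = 0
  str.upcase.tr('^A-Z.~', '').each_byte do |c|
    idx += 1
    sum += idx * c
    idx = 0 if idx >= 57
  end
  sum % 10000
end

# Compares a stored "Check:" value with the checksum recomputed from the sequence.
def checksum_valid?(stored, sequence)
  stored == gcg_checksum(sequence)
end

stored = 5220                                  # "Check:" of a hypothetical entry
p checksum_valid?(stored, 'ACGT' * 5)          # sequence unchanged
p checksum_valid?(stored, 'CCGT' + 'ACGT' * 4) # first residue altered
```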
