Class Bio::PAML::Codeml::Report
In: lib/bio/appl/paml/codeml/report.rb
Parent: Bio::PAML::Common::Report

Description

Run PAML codeml and get the results from the output file. The Codeml::Report object is returned by Bio::PAML::Codeml.query. For example

  codeml = Bio::PAML::Codeml.new('codeml', :runmode => 0,
      :RateAncestor => 1, :alpha => 0.5, :fix_alpha => 0)
  result = codeml.query(alignment, tree)

where alignment and tree are Bioruby objects. This class assumes we have a buffer containing the output of codeml.

References

Phylogenetic Analysis by Maximum Likelihood (PAML) is a package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood. It is maintained and distributed for academic use free of charge by Ziheng Yang. Suggestion citation

  Yang, Z. 1997
  PAML: a program package for phylogenetic analysis by maximum likelihood
  CABIOS 13:555-556

abacus.gene.ucl.ac.uk/software/paml.html

Examples

Invoke Bioruby‘s PAML codeml parser, after having read the contents of the codeml result file into buf (for example using File.read)

  >> c = Bio::PAML::Codeml::Report.new(buf)

Do we have two models?

  >> c.models.size
  => 2
  >> c.models[0].name
  => "M0"
  >> c.models[1].name
  => "M3"

Check the general information

  >> c.num_sequences
  => 6
  >> c.num_codons
  => 134
  >> c.descr
  => "M0-3"

Test whether the second model M3 is significant over M0

  >> c.significant
  => true

Now fetch the results of the first model M0, and check its values

  >> m0 = c.models[0]
  >> m0.tree_length
  => 1.90227
  >> m0.lnL
  => -1125.800375
  >> m0.omega
  => 0.58589
  >> m0.dN_dS
  => 0.58589
  >> m0.kappa
  => 2.14311
  >> m0.alpha
  => nil

We also have a tree (as a string)

  >> m0.tree
  => "((((PITG_23265T0: 0.000004, PITG_23253T0: 0.400074): 0.000004, PITG_23257T0: 0.952614): 0.000004, PITG_23264T0: 0.445507): 0.000004, PITG_23267T0: 0.011814, PITG_23293T0: 0.092242);"

Check the M3 and its specific values

  >> m3 = c.models[1]
  >> m3.lnL
  => -1070.964046
  >> m3.classes.size
  => 3
  >> m3.classes[0]
  => {:w=>0.00928, :p=>0.56413}

And the tree

  >> m3.tree
  => "((((PITG_23265T0: 0.000004, PITG_23253T0: 0.762597): 0.000004, PITG_23257T0: 2.721710): 0.000004, PITG_23264T0: 0.924326): 0.014562, PITG_23267T0: 0.000004, PITG_23293T0: 0.237433);"

Next take the overall posterior analysis

  >> c.nb_sites.size
  => 44
  >> c.nb_sites[0].to_a
  => [17, "I", 0.988, 3.293]

or by field

  >> codon = c.nb_sites[0]
  >> codon.position
  => 17
  >> codon.probability
  => 0.988
  >> codon.dN_dS
  => 3.293

with aliases

  >> codon.p
  => 0.988
  >> codon.w
  => 3.293

Now we generate special string ‘graph’ for positive selection. The following returns a string the length of the input alignment and shows the locations of positive selection:

  >> c.nb_sites.graph[0..32]
  => "                **    *       * *"

And with dN/dS (high values are still an asterisk *)

  >> c.nb_sites.graph_omega[0..32]
  => "                3*    6       6 2"

We also provide the raw buffers to adhere to the principle of unexpected use. Test the raw buffers for content:

  >> c.header.to_s =~ /seed/
  => 1
  >> m0.to_s =~ /one-ratio/
  => 3
  >> m3.to_s =~ /discrete/
  => 3
  >> c.footer.to_s =~ /Bayes/
  => 16

Finally we do a test on an M7+M8 run. Again, after loading the results file into buf

Invoke Bioruby‘s PAML codeml parser

  >> c = Bio::PAML::Codeml::Report.new(buf78)

Do we have two models?

  >> c.models.size
  => 2
  >> c.models[0].name
  => "M7"
  >> c.models[1].name
  => "M8"

Assert the results are significant

  >> c.significant
  => true

Compared to M0/M3 there are some differences. The important ones are the parameters and the full Bayesian result available for M7/M8. This is the naive Bayesian result:

  >> c.nb_sites.size
  => 10

And this is the full Bayesian result:

  >> c.sites.size
  => 30
  >> c.sites[0].to_a
  => [17, "I", 0.672, 2.847]
  >> c.sites.graph[0..32]
  => "                **    *       * *"

Note the differences of omega with earlier M0-M3 naive Bayesian analysis:

  >> c.sites.graph_omega[0..32]
  => "                24    3       3 2"

The locations are the same, but the omega differs.

Methods

Attributes

footer  [R] 
header  [R] 
models  [R] 

Public Class methods

Parse codeml output file passed with buf, where buf contains the content of a codeml result file

[Source]

     # File lib/bio/appl/paml/codeml/report.rb, line 222
222:       def initialize buf
223:         # split the main buffer into sections for each model, header and footer.
224:         sections = buf.split("\nModel ")
225:         model_num = sections.size-1
226:         raise ReportError,"Incorrect codeml data models=#{model_num}" if model_num > 2
227:         foot2 = sections[model_num].split("\nNaive ")
228:         if foot2.size == 2
229:           # We have a dual model
230:           sections[model_num] = foot2[0]
231:           @footer = 'Naive '+foot2[1]
232:           @models = []
233:           sections[1..-1].each do | model_buf |
234:             @models.push Model.new(model_buf)
235:           end
236:         else
237:           # A single model is run
238:           sections = buf.split("\nTREE #")
239:           model_num = sections.size-1
240:           raise ReportError,"Can not parse single model file" if model_num != 1
241:           @models = []
242:           @models.push sections[1]
243:           @footer = sections[1][/Time used/,1]
244:           @single = ReportSingle.new(buf)
245:         end
246:         @header = sections[0]
247:       end

Public Instance methods

Give a short description of the models, for example ‘M0-3‘

[Source]

     # File lib/bio/appl/paml/codeml/report.rb, line 250
250:       def descr
251:         num = @models.size
252:         case num
253:           when 0 
254:             'No model'
255:           when 1 
256:             @models[0].name
257:           else 
258:             @models[0].name + '-' + @models[1].modelnum.to_s
259:         end
260:       end

Return a PositiveSites (naive empirical bayesian) object

[Source]

     # File lib/bio/appl/paml/codeml/report.rb, line 273
273:       def nb_sites
274:         PositiveSites.new("Naive Empirical Bayes (NEB)",@footer,num_codons)
275:       end

Return the number of condons in the codeml alignment

[Source]

     # File lib/bio/appl/paml/codeml/report.rb, line 263
263:       def num_codons
264:         @header.scan(/seed used = \d+\n\s+\d+\s+\d+/).to_s.split[5].to_i/3
265:       end

Return the number of sequences in the codeml alignment

[Source]

     # File lib/bio/appl/paml/codeml/report.rb, line 268
268:       def num_sequences
269:         @header.scan(/seed used = \d+\n\s+\d+\s+\d+/).to_s.split[4].to_i
270:       end

If the number of models is two we can calculate whether the result is statistically significant, or not, at the 1% significance level. For example, for M7-8 the LRT statistic, or twice the log likelihood difference between the two compared models, may be compared against chi-square, with critical value 9.21 at the 1% significance level.

Here we support a few likely combinations, M0-3, M1-2 and M7-8, used most often in literature. For other combinations, or a different significance level, you‘ll have to calculate chi-square yourself.

Returns true or false. If no result is calculated this method raises an error

[Source]

     # File lib/bio/appl/paml/codeml/report.rb, line 294
294:       def significant
295:         raise ReportError,"Wrong number of models #{@models.size}" if @models.size != 2
296:         lnL1 = @models[0].lnL
297:         model1 = @models[0].modelnum
298:         lnL2 = @models[1].lnL
299:         model2 = @models[1].modelnum
300:         case [model1, model2]
301:           when [0,3]
302:             2*(lnL2-lnL1) > 13.2767   # chi2: p=0.01, df=4
303:           when [1,2]
304:             2*(lnL2-lnL1) >  9.2103   # chi2: p=0.01, df=2
305:           when [7,8]
306:             2*(lnL2-lnL1) >  9.2103   # chi2: p=0.01, df=2
307:           else
308:             raise ReportError,"Significance calculation for #{descr} not supported"
309:         end
310:       end

Return a PositiveSites Bayes Empirical Bayes (BEB) analysis

[Source]

     # File lib/bio/appl/paml/codeml/report.rb, line 278
278:       def sites
279:         PositiveSites.new("Bayes Empirical Bayes (BEB)",@footer,num_codons)
280:       end

[Validate]