Class Bio::NCBI::REST
In: lib/bio/io/ncbirest.rb
Parent: Object

Description

The Bio::NCBI::REST class provides a REST client for the NCBI E-Utilities.

Entrez utilities index:

Methods

Classes and Modules

Class Bio::NCBI::REST::EFetch
Class Bio::NCBI::REST::ESearch

Constants

NCBI_INTERVAL = 1.0 / 3.0

Run retrieval scripts on weekends or between 9 pm and 5 am Eastern Time on weekdays for any series of more than 100 requests. (Not implemented yet in BioRuby.)

Wait for 1/3 of a second between requests. NCBI's restriction is: "Make no more than 3 requests every 1 second."
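The interval rule can be enforced by remembering when the last request went out and sleeping for whatever time remains. The following is only a minimal sketch of that idea, not the BioRuby implementation; the `Throttle` class and its `wait` method are hypothetical names introduced for illustration.

```ruby
# Hypothetical helper (not part of BioRuby): enforces a minimum interval
# between successive calls, mirroring the "3 requests per second" rule.
class Throttle
  def initialize(interval = 1.0 / 3.0)
    @interval = interval
    @last = nil
  end

  # Sleeps just long enough that consecutive calls are at least
  # @interval seconds apart, then yields to the caller's block (if any).
  def wait
    if @last
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - @last
      sleep(@interval - elapsed) if elapsed < @interval
    end
    @last = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield if block_given?
  end
end

throttle = Throttle.new
3.times { |i| throttle.wait { puts "request #{i}" } }
```

The first call goes through immediately; each later call is delayed only by the part of the interval that has not already elapsed.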

Public Class methods

[Source]

     # File lib/bio/io/ncbirest.rb, line 352
     def self.efetch(*args)
       self.new.efetch(*args)
     end

[Source]

     # File lib/bio/io/ncbirest.rb, line 340
     def self.einfo
       self.new.einfo
     end

[Source]

     # File lib/bio/io/ncbirest.rb, line 344
     def self.esearch(*args)
       self.new.esearch(*args)
     end

[Source]

     # File lib/bio/io/ncbirest.rb, line 348
     def self.esearch_count(*args)
       self.new.esearch_count(*args)
     end

Public Instance methods

Retrieves database entries for the given IDs using the E-Utils (efetch) service.

For information on the possible arguments, see

Usage

 ncbi = Bio::NCBI::REST.new
 ncbi.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
 ncbi.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb", "retmode"=>"xml"})
 ncbi.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})

 Bio::NCBI::REST.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
 Bio::NCBI::REST.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb"})
 Bio::NCBI::REST.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})

Arguments:

  • ids: list of NCBI entry IDs (required)
  • hash: hash of E-Utils options, e.g. {"db" => "nuccore", "rettype" => "gb"}
    • db: "sequences", "nucleotide", "protein", "pubmed", "omim", …
    • retmode: "text", "xml", "html", …
    • rettype: "gb", "gbc", "medline", "count", …
  • step: maximum number of entries retrieved at a time

Returns: String

[Source]

     # File lib/bio/io/ncbirest.rb, line 316
     def efetch(ids, hash = {}, step = 100)
       serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
       opts = default_parameters.merge({ "retmode"  => "text" })
       opts.update(hash)

       case ids
       when Array
         list = ids
       else
         list = ids.to_s.split(/\s*,\s*/)
       end

       result = ""
       0.step(list.size, step) do |i|
         opts["id"] = list[i, step].join(',')
         unless opts["id"].empty?
           response = ncbi_post_form(serv, opts)
           result += response.body
         end
       end
       return result.strip
       #return result.strip.split(/\n\n+/)
     end
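The way efetch turns its ids argument into batches of at most step IDs can be tried out without any network access. The sketch below follows the same rules as the listing (an Array is used as-is, anything else is converted to a String and split on commas), but it is written with each_slice rather than the listing's 0.step loop, and `id_batches` is a hypothetical helper that does not exist in BioRuby.

```ruby
# Hypothetical helper (not part of BioRuby): returns the comma-joined
# "id" parameter value that each request would carry.
def id_batches(ids, step = 100)
  list = ids.is_a?(Array) ? ids : ids.to_s.split(/\s*,\s*/)
  list.each_slice(step).map { |slice| slice.join(",") }
end

p id_batches("185041, J00231,AAA52805", 2)
# prints ["185041,J00231", "AAA52805"]
p id_batches(%w[1 2 3 4 5], 2)
# prints ["1,2", "3,4", "5"]
```

Each element of the returned array corresponds to one request; an empty ID list yields no requests at all.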

Lists the NCBI database names using the E-Utils (einfo) service.

 pubmed protein nucleotide nuccore nucgss nucest structure genome
 books cancerchromosomes cdd gap domains gene genomeprj gensat geo
 gds homologene journals mesh ncbisearch nlmcatalog omia omim pmc
 popset probe proteinclusters pcassay pccompound pcsubstance snp
 taxonomy toolkit unigene unists

Usage

 ncbi = Bio::NCBI::REST.new
 ncbi.einfo

 Bio::NCBI::REST.einfo

Returns: array of strings (database names)

[Source]

     # File lib/bio/io/ncbirest.rb, line 180
     def einfo
       serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"
       opts = default_parameters.merge({})
       response = ncbi_post_form(serv, opts)
       result = response.body
       list = result.scan(/<DbName>(.*?)<\/DbName>/m).flatten
       return list
     end
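The database names are pulled out of the response body with a regular expression rather than an XML parser. The snippet below applies the same expression to a small hand-written sample; the sample document is an illustrative assumption and is much shorter than a real einfo response.

```ruby
# Hand-written sample for illustration only; not an actual server response.
xml = <<~XML
  <eInfoResult>
    <DbList>
      <DbName>pubmed</DbName>
      <DbName>protein</DbName>
      <DbName>nuccore</DbName>
    </DbList>
  </eInfoResult>
XML

# Same regular expression as in the einfo listing: scan returns one
# single-element array per match, and flatten turns that into a flat list.
db_names = xml.scan(/<DbName>(.*?)<\/DbName>/m).flatten
p db_names
# prints ["pubmed", "protein", "nuccore"]
```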

Searches the NCBI database for the given keywords using the E-Utils (esearch) service and returns an array of entry IDs.

For information on the possible arguments, see

Usage

 ncbi = Bio::NCBI::REST.new
 ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"})
 ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"})
 ncbi.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5})

 Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"})
 Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"})
 Bio::NCBI::REST.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5})

Arguments:

  • str: query string (required)
  • hash: hash of E-Utils options, e.g. {"db" => "nuccore", "rettype" => "gb"}
    • db: "sequences", "nucleotide", "protein", "pubmed", "taxonomy", …
    • retmode: "text", "xml", "html", …
    • rettype: "gb", "medline", "count", …
    • retmax: integer (default 100)
    • retstart: integer
    • field:
      • "titl": Title [TI]
      • "tiab": Title/Abstract [TIAB]
      • "word": Text words [TW]
      • "auth": Author [AU]
      • "affl": Affiliation [AD]
      • "jour": Journal [TA]
      • "vol": Volume [VI]
      • "iss": Issue [IP]
      • "page": First page [PG]
      • "pdat": Publication date [DP]
      • "ptyp": Publication type [PT]
      • "lang": Language [LA]
      • "mesh": MeSH term [MH]
      • "majr": MeSH major topic [MAJR]
      • "subh": MeSH subheadings [SH]
      • "mhda": MeSH date [MHDA]
      • "ecno": EC/RN Number [rn]
      • "si": Secondary source ID [SI]
      • "uid": PubMed ID (PMID) [UI]
      • "fltr": Filter [FILTER] [SB]
      • "subs": Subset [SB]
    • reldate: 365
    • mindate: 2001
    • maxdate: 2002/01/01
    • datetype: "edat"
  • limit: maximum number of entries to be returned (0 for unlimited; nil to use the "retmax" value in the hash or, if that is absent, the internal default of 100)
  • step: maximum number of entries retrieved at a time

Returns: array of entry IDs, or the number of results when "rettype" is "count"
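How limit, step, and retstart combine is easiest to see by listing the retstart/retmax pair that each request would carry. The sketch below is an illustration under stated assumptions, not the library code: `search_windows` is a hypothetical helper, and it uses a plain while loop, so it stops as soon as limit entries are covered.

```ruby
# Hypothetical helper (not part of BioRuby): lists the [retstart, retmax]
# pair of each request needed to collect `limit` IDs in pages of `step`,
# starting from the caller-supplied `retstart` offset.
def search_windows(limit, step = 10000, retstart = 0)
  windows = []
  offset = 0
  while offset < limit
    windows << [retstart + offset, [step, limit - offset].min]
    offset += step
  end
  windows
end

p search_windows(100)
# prints [[0, 100]]
p search_windows(25, 10)
# prints [[0, 10], [10, 10], [20, 5]]
p search_windows(25, 10, 50)
# prints [[50, 10], [60, 10], [70, 5]]
```

With the default step of 10000, the default limit of 100 is covered by a single request; smaller step values split the same range into several pages.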

[Source]

     # File lib/bio/io/ncbirest.rb, line 247
     def esearch(str, hash = {}, limit = nil, step = 10000)
       serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
       opts = default_parameters.merge({ "term" => str })
       opts.update(hash)

       case opts["rettype"]
       when "count"
         count = esearch_count(str, opts)
         return count
       else
         retstart = 0
         retstart = hash["retstart"].to_i if hash["retstart"]

         limit ||= hash["retmax"].to_i if hash["retmax"]
         limit ||= 100 # default limit is 100
         limit = esearch_count(str, opts) if limit == 0   # unlimit

         list = []
         0.step(limit, step) do |i|
           retmax = [step, limit - i].min
           opts.update("retmax" => retmax, "retstart" => i + retstart)
           response = ncbi_post_form(serv, opts)
           result = response.body
           list += result.scan(/<Id>(.*?)<\/Id>/m).flatten
         end
         return list
       end
     end

Arguments: same as the esearch method
Returns: number of matching entries (Integer)

[Source]

     # File lib/bio/io/ncbirest.rb, line 278
     def esearch_count(str, hash = {})
       serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
       opts = default_parameters.merge({ "term" => str })
       opts.update(hash)
       opts.update("rettype" => "count")
       response = ncbi_post_form(serv, opts)
       result = response.body
       count = result.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
       return count
     end
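The count is extracted with the same scan-based approach used for IDs and database names. The snippet below runs that expression on a hand-written sample (an illustrative assumption, not a captured server response) and also shows what happens when no Count element is present.

```ruby
# Hand-written sample for illustration only.
xml = "<eSearchResult><Count>42</Count></eSearchResult>"

# Same expression as in the esearch_count listing: take the first match
# and convert it to an Integer.
count = xml.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
p count
# prints 42

# With no <Count> element, first returns nil, and nil.to_i is 0.
missing = "<eSearchResult/>".scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
p missing
# prints 0
```

One consequence of this design is that a response without a Count element is indistinguishable from a search with zero hits.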
