EMBOSS: dbiflat


Program dbiflat

Function

Index a flat file database

Description

dbiflat indexes a flat file database of one or more files and builds EMBL CD-ROM format index files. This format is used by the software on the EMBL database CD-ROM distribution and by the Staden package, as well as by EMBOSS, and appears to be the most widely used publicly available index file format for these databases.

Usage

Here is a sample session with dbiflat, using the data in the test/embl directory of the distribution, which is normally indexed as "tembl":

% dbiflat
      EMBL : EMBL
     SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
        GB : Genbank, DDBJ
Entry format [SWISS]: EMBL
Database name: tembl
Database directory [.]: /nfs/disk42/pmr/emboss/osf/emboss/test/embl/
Wildcard database filename [*.dat]: 
Release number [0.0]: 1.0
Index date [00/00/00]: 04/02/00

Command line arguments

   Mandatory qualifiers (* if not always prompted):
   -idformat           list       Entry format
  [-dbname]            string     Database name
*  -directory          string     Database directory
*  -filenames          string     Wildcard database filename
   -release            string     Release number
   -date               string     Index date

   Optional qualifiers: (none)
   Advanced qualifiers:
   -staden             bool       Use staden index file names
   -exclude            string     wildcard filename(s) to exclude
   -indexdirectory     string     Index directory
   -sortoptions        string     Sort options, typically '-T .' to use
                                  current directory for work files and '-k
                                  1,1' to force GNU sort to use the first
                                  field
   -[no]systemsort     bool       Use system sort utility
   -[no]cleanup        bool       Clean up temporary files


Mandatory qualifiers
Qualifier         Description                       Allowed values                            Default
-idformat         Entry format                      EMBL (EMBL)                               SWISS
                                                    SWISS (Swiss-Prot, SpTrEMBL, TrEMBLnew)
                                                    GB (Genbank, DDBJ)
[-dbname]         Database name                     A string from 1 to 19 characters          Required
(Parameter 1)
-directory        Database directory                Any string is accepted                    .
-filenames        Wildcard database filename        Any string is accepted                    *.dat
-release          Release number                    A string up to 9 characters               0.0
-date             Index date                        Date string dd/mm/yy                      00/00/00

Optional qualifiers
(none)

Advanced qualifiers
Qualifier         Description                       Allowed values                            Default
-staden           Use staden index file names       Yes/No                                    No
-exclude          wildcard filename(s) to exclude   Any string is accepted                    An empty string is accepted
-indexdirectory   Index directory                   Any string is accepted                    .
-sortoptions      Sort options, typically '-T .' to Any string is accepted                    -T . -k 1,1
                  use current directory for work
                  files and '-k 1,1' to force GNU
                  sort to use the first field
-[no]systemsort   Use system sort utility           Yes/No                                    Yes
-[no]cleanup      Clean up temporary files          Yes/No                                    Yes
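
The limits listed for the mandatory qualifiers can be checked before dbiflat is started. The following is a minimal Python sketch, not part of EMBOSS: the function name build_dbiflat_args is hypothetical, and it assumes that each prompted value can also be supplied on the command line with the qualifier name shown in this section.

```python
import re

# Entry formats listed for -idformat above.
ID_FORMATS = {"EMBL", "SWISS", "GB"}


def build_dbiflat_args(dbname, directory=".", filenames="*.dat",
                       idformat="SWISS", release="0.0", date="00/00/00"):
    """Return an argument list for dbiflat after checking the documented limits.

    Hypothetical helper: the defaults mirror the defaults listed above.
    """
    if idformat not in ID_FORMATS:
        raise ValueError(f"idformat must be one of {sorted(ID_FORMATS)}")
    if not 1 <= len(dbname) <= 19:
        raise ValueError("dbname must be a string from 1 to 19 characters")
    if len(release) > 9:
        raise ValueError("release must be a string of up to 9 characters")
    if not re.fullmatch(r"\d{2}/\d{2}/\d{2}", date):
        raise ValueError("date must be a dd/mm/yy string")
    return [
        "dbiflat",
        "-idformat", idformat,
        "-dbname", dbname,
        "-directory", directory,
        "-filenames", filenames,
        "-release", release,
        "-date", date,
    ]


if __name__ == "__main__":
    # Values taken from the sample session; the directory is a placeholder.
    print(build_dbiflat_args("tembl", directory="test/embl", idformat="EMBL",
                             release="1.0", date="04/02/00"))
```

Passing the returned list to subprocess.run keeps the shell from expanding the *.dat wildcard, so the pattern would reach dbiflat unchanged.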

Input file format

Output file format

dbiflat creates four index files. All are binary but with a simple format.

Data files

Notes

The indexing method depends on each entry having a unique entry name. No allowance is made for two entries with the same name, so it is not possible to index EMBL and EMBLNEW together.
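
Because entry names must be unique, it can be useful to look for clashes before indexing a set of files together. The following Python sketch is not part of EMBOSS and makes some assumptions: the function names are hypothetical, and only formats whose entries begin with an "ID" line (EMBL and SWISS) are handled, so GenBank files would need a different rule.

```python
from collections import defaultdict


def entry_names(lines):
    """Yield the entry name from each ID line of an EMBL-style flat file."""
    for line in lines:
        if line.startswith("ID "):
            fields = line[2:].split()
            if fields:
                # The name may be followed directly by ';' on the ID line.
                yield fields[0].rstrip(";")


def find_duplicates(named_sources):
    """Map each entry name seen more than once to the sources containing it.

    named_sources maps a label (for example a file name) to an iterable of
    lines, such as an open file.
    """
    seen = defaultdict(list)
    for source, lines in named_sources.items():
        for name in entry_names(lines):
            seen[name].append(source)
    return {name: sources for name, sources in seen.items() if len(sources) > 1}


if __name__ == "__main__":
    # Made-up two-line entries, for illustration only.
    embl = ["ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.", "//"]
    emblnew = ["ID   X56734; SV 2; linear; mRNA; STD; PLN; 1859 BP.", "//"]
    print(find_duplicates({"embl.dat": embl, "emblnew.dat": emblnew}))
```

Any name reported here appears in more than one entry, so the files involved could not be indexed together as a single database.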

References

Warnings

Diagnostic Error Messages

Exit status

Known bugs

See also

Program name   Description
dbiblast       Index a BLAST database
dbifasta       Index a fasta database
dbigcg         Index a GCG formatted database

Author(s)

This application was written by Peter Rice (pmr@sanger.ac.uk), Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

History

Completed December 1999

Target users

This program is intended to be used by administrators responsible for software and database installation and maintenance.

Comments