SEQIO -- A Package for Sequence File I/O

QUICKREF.DOC - A Quick Reference Guide to the SEQIO Package

Defined Functions

Defined Structures, Variables and Constants


Opening and Closing Files/Database-Searches

SEQFILE *seqfopen(char *filename, char *mode, char *format)

Open a file for reading or writing.

SEQFILE *seqfopendb(char *dbspec)

Open a database (or part of a database) to be read.

SEQFILE *seqfopen2(char *string)

Open a file for reading or start a database search.

void seqfclose(SEQFILE *sfp)

Close a file or database search.

Reading Sequences/Entries

int seqfread(SEQFILE *sfp, int flag)

Read the next sequence or sequence entry.

char *seqfgetseq(SEQFILE *sfp, int *length_out, int newbuffer)
char *seqfgetrawseq(SEQFILE *sfp, int *length_out, int newbuffer)
char *seqfgetentry(SEQFILE *sfp, int *length_out, int newbuffer)
SEQINFO *seqfgetinfo(SEQFILE *sfp, int newbuffer)

Read the next sequence or entry and return the sequence, raw sequence, entry or sequence information.
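
A minimal sketch of the read loop these functions suggest, not taken from the package itself. It assumes that seqfgetseq returns NULL once no sequences remain and that a newbuffer value of 0 asks for the package's own buffer; both are assumptions, as is the helper name total_residues. The stub section only imitates the prototypes listed above so that the sketch compiles without the library, and its data is made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* ---- Stand-in stubs (NOT the SEQIO implementation) -------------------
 * They imitate only the prototypes listed in this guide so the sketch
 * compiles on its own. With the real package, delete this part and
 * include the package's header instead. */
typedef struct { int next; } SEQFILE;

static char *demo_seqs[] = { "ACGT", "GGATCC" };   /* made-up data */
static SEQFILE demo_file;

SEQFILE *seqfopen2(char *string)
{
    (void)string;             /* the stub ignores the file/database name */
    demo_file.next = 0;
    return &demo_file;
}

char *seqfgetseq(SEQFILE *sfp, int *length_out, int newbuffer)
{
    (void)newbuffer;
    if (sfp->next >= 2)
        return NULL;          /* assumed convention: NULL when nothing is left */
    *length_out = (int)strlen(demo_seqs[sfp->next]);
    return demo_seqs[sfp->next++];
}

void seqfclose(SEQFILE *sfp) { (void)sfp; }
/* ---- End of stubs ---------------------------------------------------- */

/* Hypothetical helper: read every sequence from `source` and return the
 * total number of residues, or -1 if the source cannot be opened. */
long total_residues(char *source)
{
    SEQFILE *sfp;
    char *seq;
    int len;
    long total = 0;

    assert(source != NULL);
    if ((sfp = seqfopen2(source)) == NULL)
        return -1;
    while ((seq = seqfgetseq(sfp, &len, 0)) != NULL) {
        printf("%d residues: %s\n", len, seq);
        total += len;
    }
    seqfclose(sfp);
    return total;
}
```

The same loop shape would apply to seqfgetrawseq and seqfgetentry, which share the prototype of seqfgetseq.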

Access Functions for the Current Sequence, Entry and Information

char *seqfsequence(SEQFILE *sfp, int *length_out, int newbuffer)
char *seqfrawseq(SEQFILE *sfp, int *length_out, int newbuffer)
char *seqfentry(SEQFILE *sfp, int *length_out, int newbuffer)
SEQINFO *seqfinfo(SEQFILE *sfp, int newbuffer)

Return the sequence, raw sequence, entry or sequence information for the current sequence.

typedef struct {
  char *dbname, *filename, *format;
  int entryno, seqno, numseqs;

  char *date, *idlist, *description;
  char *comment, *organism, *history;
  int isfragment, iscircular, alphabet;
  int fragstart, truelen, rawlen;
} SEQINFO;

char *seqfdbname(SEQFILE *sfp, int newbuffer)
char *seqffilename(SEQFILE *sfp, int newbuffer)
char *seqfformat(SEQFILE *sfp, int newbuffer)
int seqfentryno(SEQFILE *sfp)
int seqfseqno(SEQFILE *sfp)
int seqfnumseqs(SEQFILE *sfp)
char *seqfdate(SEQFILE *sfp, int newbuffer)
char *seqfidlist(SEQFILE *sfp, int newbuffer)
char *seqfdescription(SEQFILE *sfp, int newbuffer)
char *seqfcomment(SEQFILE *sfp, int newbuffer)
char *seqforganism(SEQFILE *sfp, int newbuffer)
int seqfiscircular(SEQFILE *sfp)
int seqfisfragment(SEQFILE *sfp)
int seqffragstart(SEQFILE *sfp)
int seqfalphabet(SEQFILE *sfp)
int seqftruelen(SEQFILE *sfp)
int seqfrawlen(SEQFILE *sfp)

Access functions for information about the current sequence.

char *seqfmainid(SEQFILE *sfp, int newbuffer)
char *seqfmainacc(SEQFILE *sfp, int newbuffer)

Access functions for the main identifiers of the current sequence.

int seqfoneline(SEQINFO *info, char *buffer, int buflen, int idonly)

Construct a "oneline" description of a sequence and store it in the buffer.
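
The buffer-and-length calling pattern can be sketched with a stand-in. The text that seqfoneline actually produces and the meaning of its return value are not described in this guide, so demo_oneline below is hypothetical: it joins only the identifier list and the description, and returns the stored length. The field values are made up.

```c
#include <stdio.h>
#include <string.h>

/* SEQINFO as listed in this guide. */
typedef struct {
  char *dbname, *filename, *format;
  int entryno, seqno, numseqs;

  char *date, *idlist, *description;
  char *comment, *organism, *history;
  int isfragment, iscircular, alphabet;
  int fragstart, truelen, rawlen;
} SEQINFO;

/* Hypothetical stand-in for a one-line formatter (NOT seqfoneline itself).
 * Stores "<idlist> <description>", or just "<idlist>" when idonly is
 * nonzero or there is no description, truncated to fit buflen bytes
 * including the terminating NUL. Returns the stored length, or -1 on
 * bad arguments. */
int demo_oneline(SEQINFO *info, char *buffer, int buflen, int idonly)
{
    const char *ids, *desc;

    if (info == NULL || buffer == NULL || buflen <= 0)
        return -1;
    ids = info->idlist ? info->idlist : "";
    desc = info->description ? info->description : "";

    if (idonly || desc[0] == '\0')
        snprintf(buffer, (size_t)buflen, "%s", ids);
    else
        snprintf(buffer, (size_t)buflen, "%s %s", ids, desc);
    return (int)strlen(buffer);
}

/* Fill a zero-initialized SEQINFO with made-up values and format it
 * into a static buffer. */
const char *demo_oneline_example(int idonly)
{
    static char line[64];
    SEQINFO info = {0};

    info.idlist = "demo:SEQ1";
    info.description = "example sequence";
    demo_oneline(&info, line, (int)sizeof line, idonly);
    return line;
}
```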

void seqfsetidpref(SEQFILE *sfp, char *idprefix)
void seqfsetdbname(SEQFILE *sfp, char *dbname)
void seqfsetalpha(SEQFILE *sfp, char *alphabet)

Set or unset the value for the identifier prefix, database name or alphabet.

Writing Sequences/Entries

int seqfwrite(SEQFILE *sfp, char *seq, int seqlen, SEQINFO *info)

Output a sequence and its information.

int seqfconvert(SEQFILE *input_sfp, SEQFILE *output_sfp)

Convert and output the current sequence of input_sfp.

int seqfputs(SEQFILE *sfp, char *s, int len)

Output a string on the output stream (without any transformation or checking).

int seqfannotate(SEQFILE *sfp, char *entry, int entrylen, char *newcomment, int flag)

Output the passed-in entry, adding the new comment. (The entry must be in the format specified when opening the output stream.)

int seqfgcgify(SEQFILE *sfp, char *entry, int entrylen)

Output the passed-in entry, converting the sequence lines into the GCG format. (The SEQFILE structure must be opened to one of the GCG-* formats, and the format of the entry must match the `*' of the GCG-*.)

int seqfungcgify(SEQFILE *sfp, char *entry, int entrylen)

Output the passed-in entry, converting the sequence lines back to the original format (from the GCG format). (The format of the entry must be one of the GCG-* formats, and the SEQFILE structure must be opened to the `*' format matching the GCG-*.)
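
The GCG-* matching rule used by seqfgcgify and seqfungcgify can be illustrated with plain string handling. The two helpers below are hypothetical and not part of the package; "GCG-EMBL" is used only as an example name, and the case-sensitive comparison is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of SEQIO): given a format name of the
 * form "GCG-<base>", return a pointer to "<base>"; otherwise return NULL.
 * The comparison is case-sensitive, which is an assumption. */
const char *gcg_base_format(const char *format)
{
    static const char prefix[] = "GCG-";
    size_t n = sizeof prefix - 1;

    if (format == NULL || strncmp(format, prefix, n) != 0 || format[n] == '\0')
        return NULL;
    return format + n;
}

/* Return 1 if `gcg_format` is "GCG-<base>" and `entry_format` equals
 * "<base>", i.e. the pair satisfies the matching rule; otherwise 0. */
int gcg_formats_match(const char *gcg_format, const char *entry_format)
{
    const char *base = gcg_base_format(gcg_format);

    return base != NULL && entry_format != NULL
        && strcmp(base, entry_format) == 0;
}
```

A caller could run such a check before gcgifying or ungcgifying an entry; seqfcangcgify, listed under Miscellaneous, is the package's own test for whether a format supports the conversion.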

BIOSEQ Database Functions

int bioseq_read(char *filelist)

Read one or more BIOSEQ files and store the BIOSEQ entries they contain.

int bioseq_check(char *dbspec)

Test if the dbspec refers to a known BIOSEQ entry.

char *bioseq_info(char *dbspec, char *fieldname)

Retrieve an information field for a BIOSEQ entry.

char *bioseq_matchinfo(char *fieldname, char *fieldvalue)

Find the first database (in the list of BIOSEQ entries) that has an information field named `fieldname' whose value matches `fieldvalue'.

char *bioseq_parse(char *dbspec)

Parse a BIOSEQ database specification and get the list of files that should be opened and read in that search.

Miscellaneous

int seqfisafile(char *filename)

Test whether the filename refers to an existing file (even when the string contains a single entry access specification).

int seqfisaformat(char *format)

Test whether the string is a valid file format.

int seqffmttype(char *format)

Return a value giving type information about the format.

int seqfcanwrite(char *format)

Test whether the format is writeable.

int seqfcanannotate(char *format)

Test whether entries in the format can be annotated.

int seqfcangcgify(char *format)

Test whether entries in the format can be gcgified or ungcgified.

void seqfbytepos(SEQFILE *sfp)

void seqfsetpretty(SEQFILE *sfp, int value)

Set whether whitespace is added to the output sequence (Plain, FASTA, NBRF and IG/Stanford formats only).

SEQINFO *seqfparseent(char *entry, int entrylen, char *format)

Retrieve the sequence information stored in the passed-in entry.

int asn_parse(char *begin, char *end, ...)

Search an ASN.1 text format record (the string from `begin' to `end') for specified sub-records.

Error Handling/Reporting

extern int seqferrno;
extern char seqferrstr[];

External variables giving an error value and an error message string.

void seqfperror(char *s)

Output an error message, similar to the Unix perror.

void seqfsetperror(void (*perr_fn)(char *))

Set the function used by the package to output all of its error messages. If the argument is NULL, the default function (outputting to stderr) will be used.
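
A minimal sketch of a handler with the required signature, one that keeps the most recent message instead of printing it. The stub section is a stand-in written only so the sketch compiles without the library; demo_report, capture_error, demo_capture and the message text are all hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* ---- Stand-in stub (NOT the SEQIO implementation) --------------------
 * It imitates the documented behavior: a NULL argument restores the
 * default function, which writes to stderr. */
static void default_perror(char *s) { fprintf(stderr, "%s\n", s); }
static void (*current_perror)(char *) = default_perror;

void seqfsetperror(void (*perr_fn)(char *))
{
    current_perror = (perr_fn != NULL) ? perr_fn : default_perror;
}

/* Hypothetical trigger standing in for the package reporting an error. */
void demo_report(char *msg) { current_perror(msg); }
/* ---- End of stub ------------------------------------------------------ */

static char last_error[256];

/* Handler with the signature seqfsetperror expects: copy the message
 * into last_error, truncating it if it is too long. */
void capture_error(char *s)
{
    strncpy(last_error, s, sizeof last_error - 1);
    last_error[sizeof last_error - 1] = '\0';
}

/* Install the handler, trigger one made-up error, restore the default
 * handler and return the captured message. */
const char *demo_capture(void)
{
    seqfsetperror(capture_error);
    demo_report("cannot open file");
    seqfsetperror(NULL);
    return last_error;
}
```

A handler of this kind could suit a program that shows errors in its own interface rather than on stderr.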

int seqferrpolicy(int pe)

Set the way the SEQIO package reports errors.


James R. Knight, knight@cs.ucdavis.edu
June 26, 1996