SEQIO -- A Package for Sequence File I/O

README - Readme File for the SEQIO Package


The SEQIO package is a set of C functions that read and write biological sequence files in a variety of file formats, and that can be used to search biological databases. All of the code is packaged together into a single file, making it easy to incorporate into your programs. Here are the files included in the SEQIO package distribution.


Installation Notes

To install the programs associated with the package, and to set up your system to use those programs, perform the following steps.

  1. Uncompress (using gunzip) and untar the release. This creates a sub-directory "seqio-1.2" in the directory where you untar it.
  2. Enter the sub-directory and run make to compile all of the programs. The makefiles included in the release are very simple; because the code itself should be portable across platforms, they do not need to be complex. The one thing you might have to customize is the compiler name and options. The makefile is configured to use the gcc compiler. If you do not have gcc, edit the CC and CFLAGS makefile variables for the C (or C++) compiler you do have. The only flag that really matters for the compilation is the optimization flag (it makes a difference in the programs' running time).
  3. To install the programs elsewhere, copy "fmtseq", "idxseq", "grepseq", "typeseq" and "wcseq" to the executable directory. These are the only programs that really have the potential to be considered useful application programs.
  4. If you have support for local documentation on the Web, then either create a link to the file "html/seqio_toc.html", or copy all of the files in the "html" directory and create the link to "seqio_toc.html" in the destination directory.
  5. Create a BIOSEQ file describing all of your databases (an example is given in "bioseq.txt"), and, if you want to allow single entry access to the entries of those databases, run the "idxseq" program on each of them. Tell any users of the program to include that filename as part of their BIOSEQ environment variable list of files.
  6. Enjoy.

Using the SEQIO Package Itself

To be able to use the package itself, you should be familiar with reading and writing files using the C stdio package and with doing dynamic allocation of memory using malloc and free. To use the SEQIO package in your program, simply copy the files "seqio.c" and "seqio.h" to your program directory, include the header file in any program files that use the SEQIO package, and compile the package along with your program.
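As a quick self-check of the prerequisites mentioned above, here is a minimal sketch in plain ANSI C that reads a file with stdio into a buffer managed with malloc, realloc and free. It does not call SEQIO at all: the function name "read_whole_file" is hypothetical and exists only for this illustration. If this code reads comfortably, you have the background the package assumes.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * read_whole_file is a hypothetical helper for illustration only;
 * it is NOT part of the SEQIO package.
 *
 * Reads the file at `path` into a malloc'ed, NUL-terminated buffer.
 * On success, returns the buffer and stores the byte count in *len_out;
 * the caller must free() the buffer.  On failure, returns NULL.
 */
char *read_whole_file(const char *path, size_t *len_out)
{
    FILE *fp;
    char *buf, *tmp;
    size_t cap = 1024, len = 0, n;

    fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    buf = (char *) malloc(cap);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* Read until EOF, doubling the buffer whenever it fills up.
       One byte is always kept free for the terminating NUL. */
    while ((n = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += n;
        if (len == cap - 1) {
            tmp = (char *) realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    /* fread returns 0 on both EOF and error, so tell them apart here. */
    if (ferror(fp)) {
        free(buf);
        fclose(fp);
        return NULL;
    }

    fclose(fp);
    buf[len] = '\0';
    *len_out = len;
    return buf;
}
```

A caller would pass a file name and the address of a size_t, check the result against NULL, and free the buffer when finished with it. The explicit casts on malloc and realloc are there only so that the sketch also compiles with a C++ compiler.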

The SEQIO package has so far been tested using gcc on SunOS, Solaris, Ultrix, IRIX and Windows NT, and using g++ on Ultrix. The code has been written to the ANSI C standard, so you will need an ANSI C/C++ compiler in order to compile the package. I suggest turning on optimization when compiling the SEQIO package, as it significantly improves the package's efficiency. Also, compiling the package may take several minutes, as the code is around 20,000 lines. (This will get shorter in a later version. Of course, I keep saying that every version.)

If you plan to use this package and wish to receive notices about updates and bug fixes, please send mail to knight@cs.ucdavis.edu. In that mail, specify whether you just want a notice about a new version of the package, or you want the patch file or complete release automatically sent to you.
(NOTE: If you see ANYTHING you think is either wrong, or should be changed, please let me know. If it is wrong, I'll fix it. If I think it isn't, I'll tell you why, and also tell you how you can get what you want.)

Any use of the SEQIO package should be accompanied by acknowledgements and copyright notices in the documentation of any software developed using the package or derived from the package. Something along the lines of:

  This software uses the SEQIO package for reading and writing sequences. Copyright (c) 1996 by James Knight at Univ. of California, Davis.

Any papers describing software using the SEQIO package, or whose results were significantly aided by the use of the SEQIO package (except when the use was internal to a larger program), should include an acknowledgement and citation. The citation should be something like:

  Knight, James "SEQIO: A C Package for Reading and Writing Sequences," distributed by the author.

(As soon as I get a paper out about the package, this will become a reference to the paper.)


Author and Acknowledgements

James Knight
Dept. of Computer Science
Univ. of California, Davis
Davis, CA 95616
E-mail: knight@cs.ucdavis.edu
WWW-Site: http://wwwcsif.cs.ucdavis.edu/~knight

Send any bug reports, new database/file-format information, comments, complaints or extension requests to knight@cs.ucdavis.edu.

This work was supported foremost by Dan Gusfield at UC Davis, by grant DE-FG03-90ER60999 from the Department of Energy and by the Aspen Center for Physics.

My thanks to Don Gilbert for collecting descriptions of the various formats and including them with his "readseq" program. I never used his code, but the "Formats" file was quite useful in writing the package, and I did look through his code when writing "fmtseq". Thanks also to Russell Malmberg, who stuck with all of my attempts to port the package to Windows NT/95 until it finally compiled and ran. Thanks to Kay Hofmann for describing the MSF format in enough detail for implementation.


COPYRIGHT NOTICE

In this version, the following copyright notice holds for the SEQIO package, its documentation and the fmtseq and idxseq programs. All of the example programs are public domain, and can be used and rewritten without any acknowledgements (although an acknowledgement would be the polite thing to do).

Please note, however, that in a future version some programs added to the release may have a more restrictive copyright (those programs will be restricted to non-commercial use because of the original sources used to derive them). The SEQIO package, fmtseq, idxseq and the example programs will always be freely available for commercial or non-commercial use, now and into the future.

The copyright for the SEQIO package, its documentation and the fmtseq and idxseq programs:

  Copyright (c) 1996 by James Knight at Univ. of California, Davis

  Permission to use, copy, modify, distribute and sell this software
  and its documentation is hereby granted, subject to the following
  restrictions and understandings:

    1) Any copy of this software or any copy of software derived
       from it must include this copyright notice in full.

    2) All materials or software developed as a consequence of the
       use of this software or software derived from it must duly
       acknowledge such use, in accordance with the usual standards
       of acknowledging credit in academic research.

    3) The software may be used freely by anyone for any purpose,
       commercial or non-commercial.  That includes, but is not
       limited to, its incorporation into software sold for a profit
       or the development of commercial software derived from it.
 
    4) This software is provided AS IS with no warranties of any
       kind.  The author shall have no liability with respect to the
       infringement of copyrights, trade secrets or any patents by
       this software or any part thereof.  In no event will the
       author be liable for any lost revenue or profits or other
       special, indirect and consequential damages. 


James R. Knight, knight@cs.ucdavis.edu
June 29, 1996