seqwords |
% seqwords
Generates DHF files (domain hits files) of database hits (sequences) for nodes in a DCF file (domain classification file) by keyword search of UniProt.
Name of keywords file (input): seqwords.terms
Name of sequence database (input): seqwords.seq
Name of DHF file (domain hits file) (output) [test.hits]: seqwords.dhf
Standard (Mandatory) qualifiers:
  [-keyfile]   infile    This option specifies the name of the keywords file (input). This contains a list of keywords specific to a number of SCOP or CATH families and superfamilies used by SEQWORDS to search a sequence database.
  [-spfile]    infile    This option specifies the name of the sequence database (input) to search.
  [-outfile]   outfile   This option specifies the name of the DHF file (domain hits file) (output). A 'domain hits file' contains database hits (sequences) with domain classification information, in the DHF format (FASTA-like). The hits are relatives of a SCOP or CATH family (or other node in the structural hierarchies) and are found from a search of a sequence database. Files containing hits retrieved by PSIBLAST are generated by using SEQSEARCH, hits retrieved by a sparse protein signature by using SIGSCAN, and hits retrieved by various types of HMM and profile by using LIBSCAN.

Additional (Optional) qualifiers: (none)

Advanced (Unprompted) qualifiers: (none)

Associated qualifiers:
  "-outfile" associated qualifiers
  -odirectory3   string    Output directory

General qualifiers:
  -auto      boolean   Turn off prompts
  -stdout    boolean   Write standard output
  -filter    boolean   Read standard input, write standard output
  -options   boolean   Prompt for standard and additional values
  -debug     boolean   Write debug output to program.dbg
  -verbose   boolean   Report some/full command line options
  -help      boolean   Report command line options. More information on associated and general qualifiers can be found with -help -verbose
  -warning   boolean   Report warnings
  -error     boolean   Report errors
  -fatal     boolean   Report fatal errors
  -die       boolean   Report deaths
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Standard (Mandatory) qualifiers | | | |
[-keyfile] (Parameter 1) | This option specifies the name of the keywords file (input). This contains a list of keywords specific to a number of SCOP or CATH families and superfamilies used by SEQWORDS to search a sequence database. | Input file | Required |
[-spfile] (Parameter 2) | This option specifies the name of the sequence database (input) to search. | Input file | Required |
[-outfile] (Parameter 3) | This option specifies the name of the DHF file (domain hits file) (output). A 'domain hits file' contains database hits (sequences) with domain classification information, in the DHF format (FASTA-like). The hits are relatives of a SCOP or CATH family (or other node in the structural hierarchies) and are found from a search of a sequence database. Files containing hits retrieved by PSIBLAST are generated by using SEQSEARCH, hits retrieved by a sparse protein signature by using SIGSCAN, and hits retrieved by various types of HMM and profile by using LIBSCAN. | Output file | test.hits |
Additional (Optional) qualifiers | | | |
(none) | | | |
Advanced (Unprompted) qualifiers | | | |
(none) | | | |
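The three mandatory qualifiers above are enough to run the program without prompts. The following is a minimal sketch of a wrapper, assuming `seqwords` is installed and on the `PATH`. The helper names `build_seqwords_command` and `run_seqwords` are hypothetical and not part of EMBOSS. The file names are the ones used in the example session.

```python
import shutil
import subprocess


def build_seqwords_command(keyfile, spfile, outfile, auto=True):
    """Return the argument list for a seqwords run (hypothetical helper)."""
    cmd = [
        "seqwords",
        "-keyfile", keyfile,   # keywords file (input)
        "-spfile", spfile,     # sequence database (input)
        "-outfile", outfile,   # DHF file (output)
    ]
    if auto:
        cmd.append("-auto")    # general qualifier: turn off prompts
    return cmd


def run_seqwords(keyfile, spfile, outfile):
    """Run seqwords, raising an error if it is missing or exits non-zero."""
    if shutil.which("seqwords") is None:
        raise FileNotFoundError("seqwords executable not found on PATH")
    subprocess.run(build_seqwords_command(keyfile, spfile, outfile), check=True)


cmd = build_seqwords_command("seqwords.terms", "seqwords.seq", "seqwords.dhf")
print(" ".join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.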
TY   SCOP
XX
CL   Alpha and beta proteins (a/b)
XX
FO   NAD(P)-binding Rossmann-fold domains
XX
SF   NAD(P)-binding Rossmann-fold domains
XX
FA   Lactate & malate dehydrogenases, N-terminal domain
XX
TE   NAD(P)-binding Rossmann-fold
TE   Lactate & malate dehydrogenases
TE   Lactate dehydrogenase
TE   Malate dehydrogenase
//
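The keywords file above is a line-oriented record: a two-letter code, a value, `XX` spacer lines, and `//` closing each node. As a rough sketch, it could be read as follows. The function name `parse_keywords` is hypothetical. Treating `TE` as a repeatable search term and the other codes as single-valued is inferred from the example, not taken from a format specification.

```python
def parse_keywords(text):
    """Parse keywords-file text into a list of dicts, one per node.

    Assumption: 'TE' lines may repeat and hold the search terms; every
    other code (TY, CL, FO, SF, FA) occurs at most once per node.
    A trailing record without a closing '//' is ignored.
    """
    nodes = []
    node = {"TE": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "XX":      # skip blanks and spacer lines
            continue
        if line == "//":                  # end of the current node
            nodes.append(node)
            node = {"TE": []}
            continue
        code, _, value = line.partition(" ")
        value = value.strip()
        if code == "TE":
            node["TE"].append(value)
        else:
            node[code] = value
    return nodes


sample = """\
TY   SCOP
XX
CL   Alpha and beta proteins (a/b)
XX
FO   NAD(P)-binding Rossmann-fold domains
XX
SF   NAD(P)-binding Rossmann-fold domains
XX
FA   Lactate & malate dehydrogenases, N-terminal domain
XX
TE   NAD(P)-binding Rossmann-fold
TE   Lactate & malate dehydrogenases
TE   Lactate dehydrogenase
TE   Malate dehydrogenase
//
"""

nodes = parse_keywords(sample)
for term in nodes[0]["TE"]:
    print(term)
```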
ID   ACEA_ECOLI STANDARD; PRT; 434 AA.
AC   P05313;
DT   01-NOV-1988 (Rel. 09, Created)
DT   01-NOV-1988 (Rel. 09, Last sequence update)
DT   15-DEC-1998 (Rel. 37, Last annotation update)
DE   ISOCITRATE LYASE (EC 4.1.3.1) (ISOCITRASE) (ISOCITRATASE) (ICL).
GN   ACEA OR ICL.
OS   Escherichia coli.
OC   Bacteria; Proteobacteria; gamma subdivision; Enterobacteriaceae;
OC   Escherichia.
RN   [1]
RP   SEQUENCE FROM N.A.
RC   STRAIN=K12;
RX   MEDLINE; 89083515.
RA   Byrne C.R., Stokes H.W., Ward K.A.;
RT   "Nucleotide sequence of the aceB gene encoding malate synthase A in
RT   Escherichia coli.";
RL   Nucleic Acids Res. 16:10924-10924(1988).
RN   [2]
RP   SEQUENCE FROM N.A.
RC   STRAIN=K12;
RX   MEDLINE; 88262573.
RA   Rieul C., Bleicher F., Duclos B., Cortay J.-C., Cozzone A.J.;
RT   "Nucleotide sequence of the aceA gene coding for isocitrate lyase in
RT   Escherichia coli.";
RL   Nucleic Acids Res. 16:5689-5689(1988).
RN   [3]
RP   SEQUENCE FROM N.A.
RX   MEDLINE; 89008064.
RA   Matsuoka M., McFadden B.A.;
RT   "Isolation, hyperexpression, and sequencing of the aceA gene encoding
RT   isocitrate lyase in Escherichia coli.";
RL   J. Bacteriol. 170:4528-4536(1988).
RN   [4]
RP   SEQUENCE FROM N.A.
RC   STRAIN=K12 / MG1655;
RX   MEDLINE; 94089392.
RA   Blattner F.R., Burland V.D., Plunkett G. III, Sofia H.J.,
RA   Daniels D.L.;
RT   "Analysis of the Escherichia coli genome. IV. DNA sequence of the
RT   region from 89.2 to 92.8 minutes.";
RL   Nucleic Acids Res. 21:5408-5417(1993).
RN   [5]
RP   SEQUENCE OF 293-434 FROM N.A.
RX   MEDLINE; 88227861.
RA   Klumpp D.J., Plank D.W., Bowdin L.J., Stueland C.S., Chung T.,
RA   Laporte D.C.;
RT   "Nucleotide sequence of aceK, the gene encoding isocitrate
RT   dehydrogenase kinase/phosphatase.";
RL   J. Bacteriol. 170:2763-2769(1988).

[Part of this file has been deleted for brevity]

FT   CONFLICT 70 70 A -> R (IN REF. 2).
FT   CONFLICT 80 80 A -> R (IN REF. 1 AND 2).
FT   CONFLICT 116 116 I -> N (IN REF. 2).
FT   CONFLICT 144 144 F -> L (IN REF. 1).
FT   CONFLICT 305 312 LGEEFVNK -> WAKSSLISN (IN REF. 2).
FT   CONFLICT 307 307 E -> Q (IN REF. 1).
FT   STRAND 2 6
FT   TURN 7 9
FT   HELIX 11 23
FT   TURN 26 27
FT   STRAND 28 33
FT   TURN 37 38
FT   HELIX 39 47
FT   TURN 48 48
FT   STRAND 53 58
FT   HELIX 64 67
FT   TURN 68 69
FT   STRAND 72 75
FT   TURN 83 84
FT   HELIX 87 108
FT   TURN 110 111
FT   STRAND 113 116
FT   HELIX 121 134
FT   TURN 135 136
FT   TURN 140 141
FT   STRAND 143 145
FT   HELIX 148 162
FT   TURN 163 163
FT   HELIX 166 168
FT   STRAND 173 175
FT   TURN 179 181
FT   STRAND 182 184
FT   HELIX 186 188
FT   TURN 190 191
FT   HELIX 196 217
FT   TURN 218 219
FT   HELIX 225 242
FT   TURN 243 244
FT   STRAND 248 255
FT   STRAND 263 271
FT   TURN 272 273
FT   STRAND 274 278
FT   HELIX 286 311
SQ   SEQUENCE 312 AA; 32337 MW; 17741A3B5AD068BA CRC64;
     MKVAVLGAAG GIGQALALLL KTQLPSGSEL SLYDIAPVTP GVAVDLSHIP TAVKIKGFSG
     EDATPALEGA DVVLISAGVA RKPGMDRSDL FNVNAGIVKN LVQQVAKTCP KACIGIITNP
     VNTTVAIAAE VLKKAGVYDK NKLFGVTTLD IIRSNTFVAE LKGKQPGEVE VPVIGGHSGV
     TILPLLSQVP GVSFTEQEVA DLTKRIQNAG TEVVEAKAGG GSATLSMGQA AARFGLSLVR
     ALQGEQGVVE CAYVEGDGQY ARFFSQPLLL GKNGVEERKS IGTLSAFEQN ALEGMLDTLK
     KDIALGEEFV NK
//
> Q60150^.^1^312^SCOP^.^0^Alpha and beta proteins (a/b)^.^.^NAD(P)-binding Rossmann-fold domains^NAD(P)-binding Rossmann-fold domains^Lactate & malate dehydrogenases, N-terminal domain^KEYWORD^0.00^0.000e+00^0.000e+00
MKVAVLGAAGGIGQALALLLKTQLPSGSELSLYDIAPVTPGVAVDLSHIPTAVKIKGFSGEDATPALEGADVVLISAGVARKPGMDRSDLFNVNAGIVKNLVQQVAKTCPKACIGIITNPVNTTVAIAAEVLKKAGVYDKNKLFGVTTLDIIRSNTFVAELKGKQPGEVEVPVIGGHSGVTILPLLSQVPGVSFTEQEVADLTKRIQNAGTEVVEAKAGGGSATLSMGQAAARFGLSLVRALQGEQGVVECAYVEGDGQYARFFSQPLLLGKNGVEERKSIGTLSAFEQNALEGMLDTLKKDIALGEEFVNK
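The DHF output above is FASTA-like: a `>` header whose fields are separated by `^`, followed by the sequence. Here is a small sketch of reading such a file. The function name `parse_dhf` is hypothetical. Which field is which (accession first, then start and end positions, with class, fold, superfamily and family further along) is inferred by comparing the example header with the keywords file, and is not taken from a DHF specification. The sample sequence is truncated to its first 30 residues for brevity.

```python
def parse_dhf(text):
    """Return a list of (header_fields, sequence) tuples from DHF text.

    header_fields is the header line split on '^'. Field meanings are
    not assigned here because they are only inferred from one example.
    """
    hits = []
    fields, seq = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if fields is not None:            # flush the previous hit
                hits.append((fields, "".join(seq)))
            fields = line[1:].strip().split("^")
            seq = []
        elif fields is not None:
            seq.append(line)                  # sequence may span several lines
    if fields is not None:
        hits.append((fields, "".join(seq)))
    return hits


sample = (
    "> Q60150^.^1^312^SCOP^.^0^Alpha and beta proteins (a/b)^.^.^"
    "NAD(P)-binding Rossmann-fold domains^NAD(P)-binding Rossmann-fold domains^"
    "Lactate & malate dehydrogenases, N-terminal domain^KEYWORD^0.00^0.000e+00^0.000e+00\n"
    "MKVAVLGAAGGIGQALALLLKTQLPSGSEL\n"      # truncated for illustration
)

hits = parse_dhf(sample)
fields, sequence = hits[0]
print(fields[0], fields[2], fields[3], fields[13])
```

In this example the fourteenth field holds `KEYWORD`, which appears to record how the hit was found. That reading is a guess based on this one record.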
Program name | Description |
---|---|
contactcount | Counts specific versus non-specific contacts in a directory of cleaned protein chain contact files |
contacts | Reads CCF files (clean coordinate files) and writes CON files (contact files) of intra-chain residue-residue contact data |
domainalign | Generates DAF files (domain alignment files) of structure-based sequence alignments for nodes in a DCF file (domain classification file) |
domainrep | Reorder DCF file (domain classification file) so that the representative structure of each user-specified node is given first |
domainreso | Removes low resolution domains from a DCF file (domain classification file) |
interface | Reads CCF files (clean coordinate files) and writes CON files (contact files) of inter-chain residue-residue contact data |
libgen | Generates various types of discriminating elements for each alignment in a directory |
psiphi | Calculates phi and psi torsion angles from cleaned EMBOSS-style protein co-ordinate file |
rocon | Reads a DHF file (domain hits file) of hits (sequences of unknown structural classification) and a DHF file of validation sequences (known classification) and writes a 'hits file' for the hits, which are classified and rank-ordered on the basis of score |
rocplot | Provides interpretation and graphical display of the performance of discriminating elements (e.g. profiles for protein families). rocplot reads file(s) of hits from discriminator-database search(es), performs ROC analysis on the hits, and writes graphs illustrating the diagnostic performance of the discriminating elements |
seqalign | Reads a DAF file (domain alignment file) and a DHF file (domain hits file) and writes a DAF file extended with the hits |
seqfraggle | Removes fragments from DHF files (domain hits files) or other files of sequences |
seqsearch | Generate database hits (sequences) for nodes in a DCF file (domain classification file) by using PSI-BLAST |
seqsort | Reads DHF files (domain hits files) of database hits (sequences) and removes hits of ambiguous classification |
siggen | Generates a sparse protein signature from an alignment and residue contact data |
sigscan | Generates a DHF file (domain hits file) of hits (sequences) from scanning a signature against a sequence database |