EMBOSS: siggen
Each position in the alignment is scored using a single scoring scheme or any combination of up to three scoring schemes. A signature with 10% sparsity, for example, includes data from the 10% of alignment positions with the highest scores.
The resulting protein signature file is used by the application sigscan to find examples of the signature in other proteins.
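The sparsity setting described above can be illustrated with a minimal Python sketch. It is not siggen's implementation: the function name `select_signature_positions`, the rounding rule (round up, keeping at least one position) and the tie-breaking rule (earlier positions win) are all assumptions made for illustration, and the per-position scores are taken as already computed.

```python
def select_signature_positions(scores, sparsity_percent=10):
    """Return the indices of the top-scoring positions, in alignment order.

    Hypothetical helper, not part of EMBOSS. `scores` holds one combined
    score per alignment position; how siggen combines its scoring schemes
    is not modelled here.
    """
    if not 0 < sparsity_percent <= 100:
        raise ValueError("sparsity_percent must be in the range 1..100")
    if not scores:
        return []

    # Assumption: round the number of kept positions up (integer ceiling).
    n_keep = (len(scores) * sparsity_percent + 99) // 100

    # Rank positions by score, highest first. Python's sort is stable,
    # so tied scores keep their original (earlier-first) order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Report the kept positions in alignment order.
    return sorted(ranked[:n_keep])


example_scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.6, 0.4, 0.5]
print(select_signature_positions(example_scores, 20))
```

With ten positions and a sparsity of 20, the sketch keeps the two highest-scoring positions.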
    % siggen
    Generates a sparse protein signature
    Location of alignment files for input [./]: ./jontest
    Extension of alignment files for input [.align]:
    Location of contact files for input [./]: ./jontest
    Extension of contact files [.con]:
    % sparsity of signature [10]:
    Generate a randomized signature [N]:
    Substitution matrix to be used [./EBLOSUM62]:
    Score alignment on basis of residue conservation [Y]:
    Score alignment on basis of number of contacts [Y]:
    Score alignment on basis of conservation of contacts [Y]: N
    Score alignment on a combined measure of number and conservation of contacts [N]:
    Ignore alignment postitions with post_similar value of 0 [Y]:
    Name of signature file for output [sig.sig]:
    Mandatory qualifiers (* if not always prompted):
      [-algpath]     string   Location of alignment files for input
      [-algextn]     string   Extension of alignment files for input
      [-sigpath]     string   Location of signature files for output
      [-sigextn]     string   Extension of signature files for output
      [-sparsity]    integer  % sparsity of signature
      [-randomise]   bool     Generate a randomised signature
    *  -seqoption    list     Select number
    *  -datafile     matrixf  Substitution matrix to be used
    *  -conoption    list     Select number
    *  -filtercon    bool     Ignore alignment positions making less than a threshold number of contacts
    *  -conthresh    integer  Threshold contact number
    *  -conpath      string   Location of contact files for input
    *  -conextn      string   Extension of contact files
    *  -cpdbpath     string   Location of coordinate files for input (embl-like format)
    *  -cpdbextn     string   Extension of coordinate files (embl-like format)
    *  -filterpsim   bool     Ignore alignment postitions with post_similar value of 0

    Optional qualifiers: (none)
    Advanced qualifiers: (none)
    General qualifiers:
       -help         bool     report command line options. More information on associated and general qualifiers can be found with -help -verbose
Mandatory qualifiers | Description | Allowed values | Default |
---|---|---|---|
[-algpath] (Parameter 1) | Location of alignment files for input | Any string is accepted | ./ |
[-algextn] (Parameter 2) | Extension of alignment files for input | Any string is accepted | .align |
[-sigpath] (Parameter 3) | Location of signature files for output | Any string is accepted | ./ |
[-sigextn] (Parameter 4) | Extension of signature files for output | Any string is accepted | .sig |
[-sparsity] (Parameter 5) | % sparsity of signature | Any integer value | 10 |
[-randomise] (Parameter 6) | Generate a randomised signature | Yes/No | No |
-seqoption | Select number | | 3 |
-datafile | Substitution matrix to be used | Comparison matrix file in EMBOSS data path | ./EBLOSUM62 |
-conoption | Select number | | 4 |
-filtercon | Ignore alignment positions making less than a threshold number of contacts | Yes/No | No |
-conthresh | Threshold contact number | Any integer value | 10 |
-conpath | Location of contact files for input | Any string is accepted | /data/contacts/ |
-conextn | Extension of contact files | Any string is accepted | .con |
-cpdbpath | Location of coordinate files for input (embl-like format) | Any string is accepted | /data/cpdbscop/ |
-cpdbextn | Extension of coordinate files (embl-like format) | Any string is accepted | .pxyz |
-filterpsim | Ignore alignment positions with post_similar value of 0 | Yes/No | No |

Optional qualifiers: (none)

Advanced qualifiers: (none)
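The six positional qualifiers listed above can also be given on the command line instead of at the prompts. The Python sketch below only assembles and prints such a command; it does not run siggen. The helper name `build_siggen_command` is hypothetical, the defaults are copied from the table, and the `-norandomise` form assumes the usual EMBOSS convention of negating a boolean qualifier with a `no` prefix.

```python
import shlex


def build_siggen_command(algpath="./", algextn=".align", sigpath="./",
                         sigextn=".sig", sparsity=10, randomise=False):
    """Build an argument list for siggen (hypothetical helper).

    Only the six positional qualifiers from the table are covered; the
    qualifiers marked with * are left to siggen's own prompts or defaults.
    """
    if not isinstance(sparsity, int):
        raise TypeError("sparsity must be an integer")

    argv = [
        "siggen",
        "-algpath", algpath,
        "-algextn", algextn,
        "-sigpath", sigpath,
        "-sigextn", sigextn,
        "-sparsity", str(sparsity),
    ]
    # Assumption: booleans are switched off with a "no" prefix.
    argv.append("-randomise" if randomise else "-norandomise")
    return argv


command = build_siggen_command(algpath="./jontest", sparsity=10)
print(shlex.join(command))
```

Because the remaining qualifiers are marked "* if not always prompted", siggen may still ask for some of them when a command built this way is run.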
Example excerpt from an output signature file:
    CL   All beta proteins
    XX
    FO   Lipocalins
    XX
    SF   Lipocalins
    XX
    FA   Fatty acid binding protein-like
    XX
    NP   2
    XX
    NN   [1]
    XX
    IN   NRES 3 ; NGAP 2 ; WSIZ 2
    XX
    AA   A ; 2
    AA   V ; 1
    AA   L ; 4
    XX
    GA   1 ; 5
    GA   2 ; 2
    XX
    NN   [2]
    XX
    IN   NRES 2 ; NGAP 2 ; WSIZ 5
    XX
    AA   F ; 1
    AA   Y ; 5
    XX
    GA   12 ; 3
    GA   10 ; 2
    XX
    //
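To show how the record layout in the excerpt above might be read, here is a minimal Python sketch of a parser. It is inferred from this single excerpt only, not from a format specification: the function name `parse_signature` and the dictionary keys are hypothetical, and the numbers on the AA and GA lines are stored as plain integers without interpreting what they mean.

```python
EXCERPT = """\
CL   All beta proteins
XX
FO   Lipocalins
XX
SF   Lipocalins
XX
FA   Fatty acid binding protein-like
XX
NP   2
XX
NN   [1]
XX
IN   NRES 3 ; NGAP 2 ; WSIZ 2
XX
AA   A ; 2
AA   V ; 1
AA   L ; 4
XX
GA   1 ; 5
GA   2 ; 2
XX
NN   [2]
XX
IN   NRES 2 ; NGAP 2 ; WSIZ 5
XX
AA   F ; 1
AA   Y ; 5
XX
GA   12 ; 3
GA   10 ; 2
XX
//
"""


def parse_signature(text):
    """Parse one signature entry into a dictionary (hypothetical helper)."""
    sig = {"positions": []}
    current = None  # the position block opened by the latest NN line
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "XX":   # XX lines are separators
            continue
        if line == "//":               # end of the entry
            break
        code, _, rest = line.partition(" ")
        rest = rest.strip()
        if code in ("CL", "FO", "SF", "FA"):
            sig[code] = rest
        elif code == "NP":
            sig["NP"] = int(rest)
        elif code == "NN":
            current = {"number": int(rest.strip("[]")),
                       "info": {}, "residues": [], "gaps": []}
            sig["positions"].append(current)
        elif code in ("IN", "AA", "GA"):
            if current is None:
                raise ValueError(f"{code} line before any NN line")
            parts = [part.strip() for part in rest.split(";")]
            if code == "IN":
                for part in parts:     # e.g. "NRES 3"
                    key, value = part.split()
                    current["info"][key] = int(value)
            elif code == "AA":         # e.g. "A ; 2"
                residue, value = parts
                current["residues"].append((residue, int(value)))
            else:                      # GA, e.g. "12 ; 3"
                gap, value = parts
                current["gaps"].append((int(gap), int(value)))
    return sig


signature = parse_signature(EXCERPT)
print(signature["FA"], len(signature["positions"]))
```

Unrecognised line codes are silently skipped in this sketch; a stricter reader would probably want to report them.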
Important
EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.
To see the available EMBOSS data files, run:
% embossdata -showall
To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:
% embossdata -fetch -file Exxx.dat
Users can provide their own data files in their own directories. Project-specific files can be put in the current directory or, for tidier directory listings, in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory or, again, in a subdirectory called ".embossdata".
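The user-level locations mentioned above can be checked with a short Python sketch. The helper name `find_data_file` is hypothetical, and the order in which the four locations are tried is an assumption made for illustration. The fallback to the standard EMBOSS data directory is deliberately left out.

```python
from pathlib import Path


def find_data_file(name, cwd=None, home=None):
    """Return the first user-level copy of a data file, or None.

    Hypothetical helper, not part of EMBOSS. `cwd` and `home` default to
    the current directory and the user's home directory.
    """
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    home = Path(home) if home is not None else Path.home()

    # Assumed order: project locations first, then home locations.
    candidates = [
        cwd / name,
        cwd / ".embossdata" / name,
        home / name,
        home / ".embossdata" / name,
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None


# 'Exxx.dat' is the placeholder file name used in the text.
print(find_data_file("Exxx.dat"))
```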
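The directories are searched in the following order:

1. . (your current directory)
2. .embossdata (under your current directory)
3. ~/ (your home directory)
4. ~/.embossdata

See also

Program name | Description |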
---|---|
contacts | Reads coordinate files and writes contact files |
dichet | Parse dictionary of heterogen groups |
psiblasts | Runs PSI-BLAST given scopalign alignments |
scopalign | Generate alignments for SCOP families |
sigscan | Scans a sparse protein signature against swissprot |