EMBOSS: cons
Each residue (or nucleotide) i in an alignment column is compared with every other residue j in the same column. The score for i is the sum, over all residues j with j ≠ i, of score(ij)*weight(j), where score(ij) is taken from a nucleotide or protein scoring matrix (see the -datafile qualifier) and weight(j) is the weight given to sequence j in the alignment file.
The highest-scoring residue type in the column is then identified. If the number of positive matches for this residue is greater than the "plurality" value, this residue is the consensus. The positive matches for a residue i are the sum of the weights of all residues j that increase the score of residue i, i.e. those for which score(ij) is positive.
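The scoring and plurality test described above can be sketched in Python. This is a minimal illustration, not the EMBOSS implementation, and it rests on several assumptions: the column is a gap-free nucleotide column; a hypothetical toy matrix (+5 for identical bases, -4 otherwise) stands in for a real -datafile matrix; a residue's own identical copies count toward its positive matches; and ties are broken by taking the first highest-scoring sequence. All function names are illustrative.

```python
from typing import List, Tuple


def toy_score(a: str, b: str) -> int:
    # Hypothetical stand-in for a scoring matrix (not a real EMBOSS matrix):
    # +5 for identical bases, -4 for any mismatch.
    return 5 if a == b else -4


def column_scores(column: str, weights: List[float]) -> List[float]:
    """Score of residue i = sum over j != i of score(i, j) * weight(j)."""
    scores = []
    for i, ri in enumerate(column):
        total = 0.0
        for j, rj in enumerate(column):
            if j != i:  # a residue is never compared with itself
                total += toy_score(ri, rj) * weights[j]
        scores.append(total)
    return scores


def positive_matches(residue: str, column: str, weights: List[float]) -> float:
    """Sum of the weights of sequences whose residue scores positively
    against `residue` (identical residues included, by assumption)."""
    return sum(w for r, w in zip(column, weights) if toy_score(residue, r) > 0)


def candidate(column: str, weights: List[float],
              plurality: float) -> Tuple[str, bool]:
    """Return the highest-scoring residue and whether its positive
    matches are strictly greater than the plurality value."""
    scores = column_scores(column, weights)
    best = max(range(len(column)), key=lambda i: scores[i])
    residue = column[best]
    return residue, positive_matches(residue, column, weights) > plurality
```

With four sequences of weight 1, the default plurality (half the total weight) is 2.0. The column `AAAG` then yields `A` with three positive matches, which beats the plurality; in `ACGT` no residue has more than one positive match, so there is no consensus.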
Where no consensus is found at a position, an 'n' (DNA) or an 'x' (protein) character is output.
The "plurality" qualifier allows the user to set a cut-off for the number of positive matches below which there is no consensus.
The "identity" qualifier sets the number of identities required at a site for it to give a consensus at that position. Therefore, if this is set to the number of sequences in the alignment, only columns of identities contribute to the consensus.
The "setcase" qualifier sets the threshold for the positive matches above which the consensus is upper-case and below which it is lower-case.
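How the qualifiers combine at a single column can be sketched as a small decision function. This is a hypothetical helper, not part of EMBOSS: it assumes the candidate residue, its positive-match weight and its identity count have already been computed from the column as described above. It also assumes that plurality is a strict "greater than" test, that identity is a "at least" test (so setting it to the number of sequences admits only fully identical columns), and that a value exactly equal to setcase gives lower-case, a case the text leaves open. setcase has no default here because the sketch does not assume one.

```python
from typing import Optional


def consensus_char(residue: str, positives: float, identities: int,
                   total_weight: float, is_protein: bool, setcase: float,
                   plurality: Optional[float] = None,
                   identity: int = 0) -> str:
    """Return the consensus character for one column (illustrative only)."""
    # Default plurality: half the total weight of all the sequences.
    if plurality is None:
        plurality = total_weight / 2
    # 'x' for protein, 'n' for DNA when there is no consensus.
    no_consensus = "x" if is_protein else "n"
    # Too few positive matches, or too few identical residues.
    if positives <= plurality or identities < identity:
        return no_consensus
    # Above the setcase threshold: upper-case; otherwise lower-case.
    return residue.upper() if positives > setcase else residue.lower()
```

For four DNA sequences of weight 1, a residue with 3.0 positive matches passes the default plurality of 2.0; it is printed upper-case with `setcase=2.0` and lower-case with `setcase=3.5`.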
```
% cons
Creates a consensus from multiple alignments
Input sequence set: aligned.fasta
Output file [outfile.cons]: aligned.cons
```
```
   Mandatory qualifiers:
  [-msf]               seqset     File containing a sequence alignment.
  [-outseq]            seqout     Output sequence USA

   Optional qualifiers:
   -datafile           matrix     Scoring matrix
   -plurality          float      Set a cut-off for the number of positive
                                  matches below which there is no consensus.
                                  The default plurality is taken as half the
                                  total weight of all the sequences in the
                                  alignment.
   -setcase            float      Sets the threshold for the positive matches
                                  above which the consensus is upper-case and
                                  below which the consensus is in lower-case.
   -identity           integer    Provides the facility of setting the
                                  required number of identities at a site for
                                  it to give a consensus at that position.
                                  Therefore, if this is set to the number of
                                  sequences in the alignment, only columns of
                                  identities contribute to the consensus.
   -name               string     Name of the consensus sequence

   Advanced qualifiers: (none)

   General qualifiers:
   -help               bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
```
| Qualifier | Description | Allowed values | Default |
|---|---|---|---|
| Mandatory qualifiers | | | |
| [-msf] (Parameter 1) | File containing a sequence alignment. | Readable sequences | Required |
| [-outseq] (Parameter 2) | Output sequence USA | Writeable sequence | <sequence>.format |
| Optional qualifiers | | | |
| -datafile | Scoring matrix | Comparison matrix file in EMBOSS data path | EBLOSUM62 for protein, EDNAFULL for DNA |
| -plurality | Set a cut-off for the number of positive matches below which there is no consensus. The default plurality is taken as half the total weight of all the sequences in the alignment. | Any integer value | Half the total sequence weighting |
| -setcase | Sets the threshold for the positive matches above which the consensus is upper-case and below which the consensus is in lower-case. | Any integer value | 0 |
| -identity | Provides the facility of setting the required number of identities at a site for it to give a consensus at that position. Therefore, if this is set to the number of sequences in the alignment, only columns of identities contribute to the consensus. | Integer 0 or more | 0 |
| -name | Name of the consensus sequence | Any string is accepted | An empty string is accepted |
| Advanced qualifiers | | | |
| (none) | | | |
```
>EMBOSS_001
tagctgacctgacgggactgatgcgt
```
EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.
To see the available EMBOSS data files, run:
% embossdata -showall
To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:
% embossdata -fetch -file Exxx.dat
Users can provide their own data files in their own directories. Project-specific files can be put in the current directory or, for tidier directory listings, in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory or, again, in a subdirectory called ".embossdata".
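The directories are searched in the following order: . (your current directory), .embossdata (under your current directory), ~/ (your home directory), and ~/.embossdata.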
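The lookup can be sketched as a first-match search over candidate directories. This is a hypothetical Python helper, not an EMBOSS function: it assumes the order current directory, ./.embossdata, home directory, ~/.embossdata (the locations listed in the preceding paragraph), and it further assumes, as a labelled guess, that the installed data directory named by EMBOSS_DATA is consulted last.

```python
import os
from pathlib import Path
from typing import Optional


def find_data_file(name: str) -> Optional[Path]:
    """Return the first existing copy of `name`, or None (illustrative)."""
    cwd = Path.cwd()
    home = Path.home()
    candidates = [
        cwd / name,                    # project-specific: current directory
        cwd / ".embossdata" / name,    # tidier project-specific location
        home / name,                   # all runs: home directory
        home / ".embossdata" / name,   # tidier per-user location
    ]
    # Assumption: fall back to the standard data directory in EMBOSS_DATA.
    emboss_data = os.environ.get("EMBOSS_DATA")
    if emboss_data:
        candidates.append(Path(emboss_data) / name)
    for path in candidates:
        if path.is_file():
            return path
    return None
```

Because the search stops at the first hit, a modified copy fetched into the current directory with embossdata -fetch shadows the distributed file.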
Program name | Description |
---|---|
megamerger | Merge two large overlapping nucleic acid sequences |
merger | Merge two overlapping nucleic acid sequences |