A description of the input and output formats with examples can be found here.
Configuration parameters
The following parameters adjust how alternative start sites are considered for a putative gene. The parameter sigma controls the smoothing of the positional probabilities of the Markov models.
Search
Range to be searched around putative gene starts for alternative start sites, i.e. the search range defines the maximum
distance from the TIS predicted in the input file. Within this range, all potential start sites are
considered as candidate TIS. A potential start site is a start codon that lies in the same reading frame
as the respective gene, with no in-frame stop codon between it and the annotated stop codon.
Initially, the predicted TIS is labeled as the strong TIS and the alternative start sites are labeled as
weak TIS. During the iterative classification, the label strong is assigned to the candidate start with the highest
PWM score.
- up: Specifies the maximum distance from a given start position for upstream (5') alternative starts.
  Default: 250 nucleotides
  Minimum: 50 nucleotides
  Maximum: 250 nucleotides
- down: Specifies the maximum distance from a given start position for downstream (3') alternative starts.
  Default: 250 nucleotides
  Minimum: 50 nucleotides
  Maximum: 250 nucleotides
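The candidate definition above can be sketched as follows. This is a minimal illustration, not TiCo's code: it assumes the forward strand, 0-based coordinates, `start` pointing at the first base of the predicted start codon and `stop` at the first base of the annotated in-frame stop codon. The function names are hypothetical, and validation of the 50 to 250 bounds is omitted.

```python
# Codon sets taken from the Starts/Stops defaults.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_inframe_stop(seq, begin, end):
    """True if a stop codon occurs at begin, begin+3, ... strictly before end."""
    return any(seq[i:i + 3] in STOP_CODONS for i in range(begin, end, 3))


def candidate_starts(seq, start, stop, up=250, down=250):
    """Hypothetical helper: in-frame candidate TIS positions (forward strand, 0-based).

    start: first base of the predicted start codon.
    stop:  first base of the annotated stop codon, in the same frame as start.
    """
    seq = seq.upper()
    # Furthest in-frame position upstream that is within `up` and not before 0.
    first = start - 3 * (min(up, start) // 3)
    # Furthest in-frame position downstream, kept upstream of the stop codon.
    last = min(start + down, stop - 3)
    return [
        p
        for p in range(first, last + 1, 3)
        if seq[p:p + 3] in START_CODONS and not has_inframe_stop(seq, p, stop)
    ]


def initial_labels(candidates, start):
    """Before iterating, the predicted TIS is 'strong' and every alternative 'weak'."""
    return {p: ("strong" if p == start else "weak") for p in candidates}
```

For the toy gene `"ATGAAAATGCCCGTGAAATAA"` with `start=6` and `stop=18`, `candidate_starts` yields the positions 0, 6 and 12, and `initial_labels` marks only position 6 as strong. The iterative PWM-based relabeling is not shown.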
Extract
Range to be extracted around each candidate start site. The resulting sequence window is used for the unsupervised learning. It is assumed to contain the characteristics of the respective start site, e.g. the ribosome binding site.
- up: Specifies the number of nucleotides to be extracted upstream (5') of a given start position.
  Default: 30 nucleotides
  Minimum: 10 nucleotides
  Maximum: 100 nucleotides
- down: Specifies the number of nucleotides to be extracted downstream (3') of a given start position.
  Default: 30 nucleotides
  Minimum: 10 nucleotides
  Maximum: 100 nucleotides
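The extract range can be illustrated with a short sketch. It assumes, since the text does not say so, that the window consists of `up` bases before the first base of the start codon plus `down` bases from that base onward, so the start codon falls into the downstream part. Padding with `N` at sequence ends, which keeps every window the same length for positional models, is a hypothetical choice, as is the function name.

```python
def extract_window(seq, pos, up=30, down=30, pad="N"):
    """Hypothetical helper: window of up + down bases around a candidate start.

    pos is the 0-based first base of the start codon. Bases missing at either
    end of the sequence are replaced by `pad` so all windows share one length.
    """
    left = seq[max(0, pos - up):pos]
    right = seq[pos:pos + down]
    return pad * (up - len(left)) + left + right + pad * (down - len(right))
```

For a start codon at position 40 of a long sequence, the default settings return the 30 bases before it followed by the start codon and the 27 bases after it.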
Sigma
The standard deviation sigma of the Gaussian density controls the smoothing of the
positional probabilities of the second-order Markov models. A high value of sigma means the
positional probabilities are strongly smoothed.
The parameter does not imply any assumptions about trinucleotide positions in the sequence; instead, it adapts
the estimation to the number of genes under consideration. The default value 0.5 works
well with approximately 4000 genes. For a smaller set of genes it may be useful to choose a
higher value of sigma to prevent vanishing probabilities.
Range: 0.1 - 2.0
Default: 0.5
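The exact estimator is not described here, so the following is only one plausible reading of Gaussian smoothing for a positional second-order Markov model. It assumes that trinucleotide counts observed at window position j contribute to position i with weight exp(-(i - j)^2 / (2 sigma^2)). The function name and the uniform fallback for unseen contexts are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"
CONTEXTS = [a + b for a in ALPHABET for b in ALPHABET]


def smoothed_positional_model(windows, sigma=0.5):
    """Hypothetical estimator of P(x_i | x_{i-2} x_{i-1}) for every position i >= 2.

    windows: equal-length, upper-case ACGT strings (one per candidate start).
    Counts at position j contribute to position i with the Gaussian weight
    exp(-(i - j)^2 / (2 sigma^2)); a larger sigma borrows more from neighbours.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    length = len(windows[0])
    # counts[j][(context, base)]: raw trinucleotide counts at position j.
    counts = [defaultdict(float) for _ in range(length)]
    for w in windows:
        for j in range(2, length):
            counts[j][(w[j - 2:j], w[j])] += 1.0

    model = {}
    for i in range(2, length):
        # Gaussian-weighted sum of the counts of all positions.
        smoothed = defaultdict(float)
        for j in range(2, length):
            weight = math.exp(-((i - j) ** 2) / (2 * sigma ** 2))
            for key, c in counts[j].items():
                smoothed[key] += weight * c
        probs = {}
        for ctx in CONTEXTS:
            total = sum(smoothed[(ctx, x)] for x in ALPHABET)
            for x in ALPHABET:
                # Contexts never observed anywhere fall back to a uniform distribution.
                probs[(ctx, x)] = smoothed[(ctx, x)] / total if total > 0 else 0.25
        model[i] = probs
    return model
```

With the single window `"AAAAC"`, sigma = 0.1 leaves P(C | AA) at position 3 practically zero, while sigma = 2.0 raises it to roughly 0.32 by borrowing the count observed at position 4. This mirrors the statement that a higher sigma counteracts vanishing probabilities when few genes are available.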
Automated Sigma Optimization
Since TiCo release 2.0, the smoothing parameter sigma can be optimized by an automated routine using the ROC (Receiver Operating Characteristic) score. A detailed description will be given in the forthcoming publication.
Value: checked or unchecked
Default: checked
Minimum gene length
The minimum length (in bp) of a gene after reannotation of the TIS. If the distance between a candidate TIS and the stop codon falls below this minimum length, the candidate is omitted from the list of candidates.
Default: 60 nucleotides
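A minimal sketch of this filter, assuming 0-based coordinates and that the gene length is counted from the first base of the candidate start codon through the last base of the stop codon (whether TiCo includes the stop codon is an assumption). The function name is hypothetical.

```python
def filter_by_min_length(candidates, stop, min_length=60):
    """Hypothetical filter: keep candidates whose re-annotated gene is long enough.

    candidates: 0-based first bases of candidate start codons.
    stop: 0-based first base of the stop codon; length includes the stop codon.
    """
    return [p for p in candidates if stop + 3 - p >= min_length]
```

With a stop codon at position 18, the candidates 0, 6 and 12 give lengths of 21, 15 and 9 bp, so a minimum of 15 keeps only the first two.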
Starts
The start codons to be considered as alternative start sites within the search window.
Default: ATG GTG TTG
(Cannot be altered in the current version of the web interface)
Stops
The codons to be assumed as stop codon.
Default: TAA TAG TGA
(Cannot be altered in the current version of the web interface)