
Introduction
MCAST searches a sequence database for statistically significant clusters of non-overlapping "hits" to the motifs in a query.
A "hit" is a sequence position that is sufficiently similar to a
motif in the query. To be a hit, the p-value of the motif alignment score must be less than the significance threshold, pthresh (see optional input p-value threshold, below). The motif is aligned to the sequence position without gaps. To compute the p-value of a motif alignment score, MCAST assumes that the sequences in the database were generated by a 0-order Markov process.
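As an illustration of what such a p-value means, the sketch below enumerates every possible word of the motif's width and sums the 0-order probabilities of the words that score at least as high as the observed alignment. The function names, the toy scoring matrix and the brute-force enumeration are assumptions made for this example. The approach is only practical for very short motifs and is not necessarily how MCAST computes p-values.

```python
from itertools import product
from math import prod

def score(word, pssm):
    """Ungapped alignment score: one matrix column per word position."""
    return sum(column[letter] for letter, column in zip(word, pssm))

def word_probability(word, background):
    """Probability of a word under a 0-order Markov model:
    positions are independent, so letter frequencies simply multiply."""
    return prod(background[letter] for letter in word)

def score_pvalue(observed, pssm, background):
    """Probability that a random word scores at least `observed`."""
    alphabet = sorted(background)
    return sum(
        word_probability(word, background)
        for word in product(alphabet, repeat=len(pssm))
        if score(word, pssm) >= observed
    )

# Toy width-2 motif that prefers "AC" (hypothetical scores).
pssm = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
]
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

print(score_pvalue(score("AC", pssm), pssm, background))  # 0.0625
```

Only one of the 16 possible words ("AC") reaches the top score, so its p-value is 1/16.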
MCAST searches for hits on both the sequences given in the database and their reverse complements.
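For readers unfamiliar with the term, the reverse complement of a DNA sequence can be sketched as follows. The helper name is hypothetical and the example assumes an unambiguous DNA alphabet (A, C, G, T).

```python
# Translation table mapping each base to its complement (DNA only).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]

print(reverse_complement("ACCGT"))  # ACGGT
```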
A cluster of non-overlapping hits is called a "match". The user can specify the maximum allowed distance between the hits in a match (see optional input maximum motif gap below). Two hits separated by more than the maximum allowed gap will be reported in separate matches.
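The effect of the maximum gap can be illustrated with a short sketch. It is a simplified greedy grouping written for this example, not MCAST's actual search procedure. The hit representation (start, end) and the choice to measure the gap from the end of one hit to the start of the next are assumptions.

```python
from typing import List, Tuple

# Hypothetical hit representation: (start, end), 0-based, end exclusive.
Hit = Tuple[int, int]

def group_hits(hits: List[Hit], max_gap: int = 50) -> List[List[Hit]]:
    """Split non-overlapping hits into clusters wherever the distance
    between adjacent hits exceeds max_gap."""
    clusters: List[List[Hit]] = []
    for hit in sorted(hits):
        previous_end = clusters[-1][-1][1] if clusters else None
        if previous_end is not None and hit[0] - previous_end <= max_gap:
            clusters[-1].append(hit)   # close enough: same match
        else:
            clusters.append([hit])     # too far away: start a new match
    return clusters

hits = [(10, 20), (40, 50), (200, 210), (230, 240)]
print(group_hits(hits, max_gap=50))
# [[(10, 20), (40, 50)], [(200, 210), (230, 240)]]
```

The second and third hits are 150 positions apart, so they end up in separate matches.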
MCAST searches for all of the matches between the query and the sequences in the database. Each match is assigned an E-value, and matches whose E-value is below the E-value threshold are printed in order of increasing E-value (see optional input E-value threshold below).
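The reporting rule can be sketched as below. The sketch also includes the fallback, described later in this introduction, that applies when fewer than 100 matches are found. The Match type and the report function are hypothetical, E-values are assumed to have been computed already, and sorting the fallback output by decreasing score is an assumption.

```python
import math
from typing import List, NamedTuple

class Match(NamedTuple):
    score: float   # total match score
    evalue: float  # E-value assigned to the match

MIN_MATCHES_FOR_EVALUES = 100  # minimum stated in this documentation

def report(matches: List[Match], ethresh: float = 10.0) -> List[Match]:
    """Return the matches in the order they would be printed."""
    if len(matches) < MIN_MATCHES_FOR_EVALUES:
        # Too few matches for E-values: report everything, E-value "NaN",
        # sorted by match score (best first; the direction is an assumption).
        unscored = [Match(m.score, math.nan) for m in matches]
        return sorted(unscored, key=lambda m: m.score, reverse=True)
    # Otherwise keep matches below the threshold, smallest E-value first.
    kept = [m for m in matches if m.evalue < ethresh]
    return sorted(kept, key=lambda m: m.evalue)

few = [Match(3.0, 0.1), Match(9.0, 0.2), Match(5.0, 0.3)]
print([m.score for m in report(few)])  # [9.0, 5.0, 3.0]
```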
The p-value of a hit is converted to a "p-score" in order to compute the total score of the match it participates in. The p-score for a hit with p-value p is
S = -log2(p/pthresh),
where the significance threshold pthresh may be specified by the user (see optional input motif p-threshold below).
The total score of a match is the sum of the p-scores of the hits making up the match. MCAST finds the matches with the maximum match scores.
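The scoring formula can be written out as a short sketch. The function names are chosen for this example. The default threshold of 5e-4 is the one listed under the optional inputs.

```python
import math
from typing import Iterable

def p_score(p: float, pthresh: float = 5e-4) -> float:
    """p-score of a single hit: S = -log2(p / pthresh)."""
    if not 0.0 < p < pthresh:
        raise ValueError("a hit must have 0 < p-value < pthresh")
    return -math.log2(p / pthresh)

def match_score(p_values: Iterable[float], pthresh: float = 5e-4) -> float:
    """Total score of a match: the sum of the p-scores of its hits."""
    return sum(p_score(p, pthresh) for p in p_values)

print(p_score(2.5e-4))                 # 1.0
print(match_score([2.5e-4, 1.25e-4]))  # 3.0
```

Every halving of a hit's p-value relative to pthresh adds one unit to its p-score, so a hit just under the threshold contributes almost nothing to the match.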
In order for MCAST to compute E-values, at least 100 matches must be found. If there are too few sequences in the database, or if certain other options are made too stringent (see Optional MCAST inputs, below), too few matches may exist for E-values to be computed. In this case, the results are sorted by match score, the E-value column is set to "NaN" and all matches are printed.
A full description of the algorithm is found in:
Required MCAST Inputs
Three inputs must be provided on the MCAST web page:
- An e-mail address where the notification of job completion can be sent. You specify the e-mail address in the two text boxes labeled "e-mail address". The e-mail address must be entered twice to reduce the amount of undeliverable mail caused by typographic errors.
- A MEME output file, containing the descriptions of one or more motifs. You can select a file to be uploaded from your computer by clicking on the "Browse ..." button under the "Your motif file" label.
- A sequence database to be searched. You can choose a sequence file to be uploaded from your computer by clicking on the "Browse ..." button under the "Your FASTA sequence file" label. Alternatively, you can select one of the supported databases maintained on the MEME Suite web site: first select the category of the sequence database from the "Category" drop-down list, then choose one of the supported databases listed in the "Database" drop-down list.
Optional MCAST Inputs
The MCAST web page accepts four optional inputs:
- A threshold p-value. Motif occurrences with p-values above the threshold will not be considered in scoring matches (defaults to 5e-4).
- A motif gap maximum. The maximum allowed distance between adjacent motif hits in a match (defaults to 50).
- The E-value threshold. Matches whose E-value is greater than the threshold will not be reported (defaults to 10).
- The pseudocount weight. A pseudocount is added to each count in the motif matrix. The pseudocount is determined by multiplying the background frequency by this weight (defaults to 4).
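The pseudocount rule can be illustrated for a single motif column as follows. The function name and the final normalization to frequencies are assumptions made for this sketch. Only the rule "pseudocount = background frequency times weight" comes from the description above.

```python
from typing import Dict

def add_pseudocounts(
    counts: Dict[str, float],
    background: Dict[str, float],
    weight: float = 4.0,
) -> Dict[str, float]:
    """Add weight * background frequency to each count in one motif
    column, then normalize the column so it sums to 1."""
    smoothed = {
        letter: counts[letter] + weight * background[letter]
        for letter in counts
    }
    total = sum(smoothed.values())
    return {letter: value / total for letter, value in smoothed.items()}

# A column in which only "A" was observed (10 times), uniform background.
column = {"A": 10, "C": 0, "G": 0, "T": 0}
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
frequencies = add_pseudocounts(column, background, weight=4.0)
```

With the default weight of 4 and a uniform background, each letter receives one extra count, so "A" ends up with a frequency of 11/14 and no letter has a frequency of zero.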