RISO motif discovery tool
Supplementary material
If you use RISO in your research, please cite:
- Alexandra M. Carvalho, Ana T. Freitas, Arlindo L. Oliveira and Marie-France Sagot,
A highly scalable algorithm for the extraction of cis-regulatory regions,
In Yi-Ping Phoebe Chen and Limsoon Wong, editors,
Proceedings of the 3rd Asia Pacific Bioinformatics Conference,
volume 1 of Advances in Bioinformatics and Computational Biology, pages 273-282. Imperial College Press, 2005.
If you use RISOTTO in your research, please cite:
- Nadia Pisanti, Alexandra M. Carvalho, Laurent Marsan and Marie-France Sagot,
RISOTTO: Fast extraction of motifs with mismatches,
In J. R. Correa, A. Hevia and M. Kiwi, editors,
Proceedings of the 7th Latin American Theoretical Informatics Symposium,
volume 3887 of Lecture Notes in Computer Science, pages 757-768. Springer-Verlag, 2006.
Three implementations of RISO were developed:
- RISO-Static: only for dyads (Linux executable [bin] and sources [src]).
- RISO-Dynamic: generic implementation compatible with SMILE statistical significance assessment (Linux executable [bin] and sources [src]).
- RISOTTO: RISO-Dynamic with maximum extensibility improvement (Linux executable [bin] and sources [src]).
To compile RISO:
- Download RISO sources from Download.
- Unzip the downloaded file.
- Go to the RISO bin directory.
- Use the Makefile (by typing make in your terminal).
To run RISO:
RISO discovers motifs composed of many binding sites separated by spacers. Each binding site is called a box.
Generic parameters:
- Alphabet file: the name of the file containing the alphabet of the motifs (for example, the DNA or protein alphabet).
- FASTA file: the file name of the input sequences (sequences must be in FASTA format).
- Output file: the name of the output file.
- Quorum: the minimum percentage of input sequences in which a motif must appear to be extracted.
- Boxes: number of boxes of the motif.
Each box of the motif also requires:
- Min length: minimum length of the corresponding box.
- Max length: maximum length of the corresponding box.
- Substitutions: maximum number of substitutions allowed in the corresponding box.
- Min spacer length: minimum distance that separates the corresponding box from the next one (if there is one).
- Max spacer length: maximum distance that separates the corresponding box from the next one (if there is one).
An example of the input parameters follows:
Alphabet file        ../params/alphabet
FASTA file           ../params/dnc_subtilis_330-30.seq
Output file          ../params/b-subtilis-output
Quorum               12
Boxes                2
BOX 1
Min length           6
Max length           6
Substitutions        1
Min spacer length    16
Max spacer length    18
BOX 2
Min length           6
Max length           6
Substitutions        1
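
The listing above can be read into a small data structure for checking before a run. The following is a minimal Python sketch, not part of RISO: the function names `parse_listing` and `check_listing` are hypothetical, and it assumes only the label/value layout shown in the example (whether RISO itself reads the parameters from a file in exactly this layout is not stated here). Values containing spaces, such as paths with spaces, are not handled.

```python
EXAMPLE = """\
Alphabet file        ../params/alphabet
FASTA file           ../params/dnc_subtilis_330-30.seq
Output file          ../params/b-subtilis-output
Quorum               12
Boxes                2
BOX 1
Min length           6
Max length           6
Substitutions        1
Min spacer length    16
Max spacer length    18
BOX 2
Min length           6
Max length           6
Substitutions        1
"""


def parse_listing(text):
    """Return (general, boxes) parsed from a listing like the example above."""
    general, boxes = {}, []
    target = general  # generic parameters come first, then one block per box
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # The value is the last whitespace-separated token; labels may contain spaces.
        label, value = line.rsplit(None, 1)
        if label == "BOX":
            target = {}
            boxes.append(target)
            continue
        target[label] = int(value) if value.isdigit() else value
    return general, boxes


def check_listing(general, boxes):
    """Raise ValueError on inconsistencies implied by the parameter descriptions."""
    if len(boxes) != general["Boxes"]:
        raise ValueError("number of BOX blocks differs from 'Boxes'")
    for number, box in enumerate(boxes, start=1):
        if box["Min length"] > box["Max length"]:
            raise ValueError(f"box {number}: min length exceeds max length")
        if "Min spacer length" in box:
            if box["Min spacer length"] > box["Max spacer length"]:
                raise ValueError(f"box {number}: min spacer exceeds max spacer")


general, boxes = parse_listing(EXAMPLE)
check_listing(general, boxes)
print(general["Quorum"], len(boxes), boxes[0]["Max spacer length"])  # prints: 12 2 18
```

Note that the last box carries no spacer entries, which matches the "if there is a next box" condition in the parameter descriptions.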
RISO-Dynamic prints the following information about the motifs it discovers in the input dataset:
- A header with a summary of the input parameters (number of boxes, absolute quorum/number of sequences, total number of symbols in the input sequences, total min length, total max length, total substitutions, description given for each box in the input parameters, and the alphabet).
- One line per motif, with the following information:
  - The motif.
  - A numeric representation of the motif.
  - The number of different sequences where the motif appears.
  - The number of occurrences of the motif (allowing repeats and overlaps in the same sequence).
- The total number of motifs found.
- The time (in seconds) spent on the motif extraction.
RISO-Static outputs only the motifs and the number of different sequences where each motif appears.
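
For post-processing, a RISO-Dynamic motif line with the four fields described above can be split as in the Python sketch below. The names `MotifLine`, `parse_motif_line` and `encode` are hypothetical. The numeric encoding (each symbol replaced by its position in the alphabet, with the `_` spacer shown as `-`) is inferred from the sample lines in the example output on this page and is only an assumption; how alphabets with more than ten symbols are encoded is not covered.

```python
from typing import NamedTuple


class MotifLine(NamedTuple):
    motif: str          # e.g. "AAAAAA_AAAAAC"
    numeric: str        # e.g. "000000-000001"
    n_sequences: int    # different sequences where the motif appears
    n_occurrences: int  # occurrences, counting repeats and overlaps


def parse_motif_line(line):
    """Split one whitespace-separated motif line into typed fields."""
    motif, numeric, n_sequences, n_occurrences = line.split()
    return MotifLine(motif, numeric, int(n_sequences), int(n_occurrences))


def encode(motif, alphabet="ACGT"):
    """Assumed encoding: symbol -> its index in the alphabet, '_' -> '-'."""
    index = {symbol: str(i) for i, symbol in enumerate(alphabet)}
    return "".join("-" if ch == "_" else index[ch] for ch in motif)


# Sample lines copied from the example output on this page.
samples = [
    "AAAAAA_AAAAAC 000000-000001 139 357",
    "TTTTTT_TTTTTG 333333-333332 148 384",
]
for sample in samples:
    record = parse_motif_line(sample)
    # The inferred encoding reproduces the numeric column for these samples.
    assert encode(record.motif) == record.numeric
    print(record.motif, record.n_sequences, record.n_occurrences)
```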
An example of the output follows:
%%% 2 128/1062 196736 12 12 2 6 6 1 16 18 6 6 1 alphabet ACGT$
==============================================================
AAAAAA_AAAAAA 000000-000000 188 703
AAAAAA_AAAAAC 000000-000001 139 357
AAAAAA_AAAAAG 000000-000002 166 410
AAAAAA_AAAAAT 000000-000003 193 461
...................................
TTTTTT_TTTTTA 333333-333330 201 507
TTTTTT_TTTTTC 333333-333331 171 441
TTTTTT_TTTTTG 333333-333332 148 384
TTTTTT_TTTTTT 333333-333333 188 700
Nb models: 6419
User time: 45.33 sec.
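
The header line of the example can be decoded field by field, following the order given in the description of the header. The Python sketch below is illustrative only: `parse_header` is a hypothetical helper, and it assumes that every box except the last contributes five numbers (min length, max length, substitutions, min spacer, max spacer) and the last box three. The final check assumes the absolute quorum is the percentage quorum rounded up; this agrees with the example (12% of 1062 sequences is 127.44, reported as 128) but is not confirmed beyond it.

```python
def parse_header(line):
    """Decode a header such as '%%% 2 128/1062 196736 12 12 2 ... alphabet ACGT$'."""
    tokens = line.split()
    if tokens[0] != "%%%":
        raise ValueError("not a header line")
    n_boxes = int(tokens[1])
    abs_quorum, n_sequences = (int(part) for part in tokens[2].split("/"))
    total_symbols, total_min, total_max, total_subst = (int(t) for t in tokens[3:7])
    alphabet_pos = tokens.index("alphabet")
    numbers = [int(t) for t in tokens[7:alphabet_pos]]
    boxes, pos = [], 0
    for i in range(n_boxes):
        width = 5 if i < n_boxes - 1 else 3  # assumed: the last box has no spacer
        boxes.append(numbers[pos:pos + width])
        pos += width
    return {
        "n_boxes": n_boxes,
        "abs_quorum": abs_quorum,
        "n_sequences": n_sequences,
        "total_symbols": total_symbols,
        "total_min": total_min,
        "total_max": total_max,
        "total_subst": total_subst,
        "boxes": boxes,
        "alphabet": tokens[alphabet_pos + 1],
    }


header = parse_header(
    "%%% 2 128/1062 196736 12 12 2 6 6 1 16 18 6 6 1 alphabet ACGT$"
)

# Totals are the sums over the boxes: 6 + 6 = 12 (lengths), 1 + 1 = 2 (substitutions).
assert header["total_min"] == sum(box[0] for box in header["boxes"])
assert header["total_subst"] == sum(box[2] for box in header["boxes"])

# Assumed rounding: ceiling of quorum% * number of sequences, in integer arithmetic.
quorum_percent = 12
assert -(-quorum_percent * header["n_sequences"] // 100) == header["abs_quorum"]
print(header["boxes"])  # prints: [[6, 6, 1, 16, 18], [6, 6, 1]]
```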
RISO-Dynamic is compatible with the SMILE statistical significance applications. To assess statistical significance within RISO-Dynamic, download SMILE and put its statistical significance applications (e-smile_shuffling and e-smile_against) in the RISO-Dynamic bin directory. An example of statistical assessment with RISO-Dynamic is given below.
If you assess statistical significance of the motifs extracted by RISO-Dynamic in your research, please also cite:
- L. Marsan and M.-F. Sagot. Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification. Journal of Computational Biology, 7(3-4):345-362, 2000.
Go to the RISO params directory.
Download the input files to that directory:
Type in your terminal:
Two output files should be created:
For any help please contact Alexandra M. Carvalho at
.