RISO motif discovery tool

Supplementary material

Alexandra M. Carvalho, Ana T. Freitas, Arlindo L. Oliveira and Marie-France Sagot

RISO online version
Download
Compile and run
Set up the input
Interpret the output
Assess statistical significance
An example

If you use RISO in your research, please cite:

Alexandra M. Carvalho, Ana T. Freitas, Arlindo L. Oliveira and Marie-France Sagot, An efficient algorithm for the identification of structured motifs in DNA promoter sequences, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 3(2):126-140, Apr-Jun, 2006.

Alexandra M. Carvalho, Ana T. Freitas, Arlindo L. Oliveira and Marie-France Sagot, A highly scalable algorithm for the extraction of cis-regulatory regions, In Yi-Ping Phoebe Chen and Limsoon Wong, editors, Proceedings of the 3rd Asia Pacific Bioinformatics Conference, volume 1 of Advances in Bioinformatics and Computational Biology, pages 273-282. Imperial College Press, 2005.

If you use RISOTTO in your research, please cite:

Nadia Pisanti, Alexandra M. Carvalho, Laurent Marsan and Marie-France Sagot, RISOTTO: Fast extraction of motifs with mismatches, In J. R. Correa, A. Hevia and M. Kiwi, editors, Proceedings of the 7th Latin American Theoretical Informatics Symposium, volume 3887 of Lecture Notes in Computer Science, pages 757-768. Springer-Verlag, 2006.

Download

Three implementations of RISO were developed:

RISO-Static: only for dyads (Linux executable [bin] and sources [src]).
RISO-Dynamic: generic implementation compatible with SMILE statistical significance assessment (Linux executable [bin] and sources [src]).
RISOTTO: RISO-Dynamic with maximum extensibility improvement (Linux executable [bin] and sources [src]).

Compile and run

To compile RISO:

Download the RISO sources from the Download section.
Unzip the downloaded file.
Go to the RISO bin directory.
Build with the Makefile (by typing make in your terminal).

To run RISO:

Follow the instructions given in the section An example.

Set up the input

RISO discovers motifs composed of many binding sites separated by spacers. Each binding site is called a box.

Generic parameters:

Alphabet file: the name of the file containing the alphabet of the motifs (for example, the DNA or protein alphabet).
FASTA file: the name of the file containing the input sequences (sequences must be in FASTA format).
Output file: the name of the output file.
Quorum: the minimum percentage of input sequences in which a motif must appear in order to be extracted.
Boxes: the number of boxes of the motif.

Each box of the motif also requires the following parameters:

Min length: minimum length of the corresponding box.
Max length: maximum length of the corresponding box.
Substitutions: maximum number of substitutions allowed in the corresponding box.
Min spacer length: minimum distance that separates the corresponding box from the next one (if there is a next box).
Max spacer length: maximum distance that separates the corresponding box from the next one (if there is a next box).

An example of the input parameters follows:

Alphabet file        ../params/alphabet
FASTA file           ../params/dnc_subtilis_330-30.seq
Output file          ../params/b-subtilis-output
Quorum               12
Boxes                2

BOX 1
Min length           6
Max length           6
Substitutions        1
Min spacer length    16
Max spacer length    18

BOX 2
Min length           6
Max length           6
Substitutions        1
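
The parameter listing above can be read and sanity-checked with a short script before launching a long run. The following Python sketch is not part of RISO. It assumes that each parameter sits on its own line as a label followed by whitespace and a single value, and that a line of the form "BOX n" opens the parameters of a box; both assumptions are inferred from the example above and not from a format specification. The names parse_params and check_params are hypothetical.

```python
# Sketch only: the "label ... value" layout is assumed from the example
# parameter listing, not taken from a RISO file-format specification.

EXAMPLE = """\
Alphabet file        ../params/alphabet
FASTA file           ../params/dnc_subtilis_330-30.seq
Output file          ../params/b-subtilis-output
Quorum               12
Boxes                2

BOX 1
Min length           6
Max length           6
Substitutions        1
Min spacer length    16
Max spacer length    18

BOX 2
Min length           6
Max length           6
Substitutions        1
"""


def parse_params(text):
    """Return (generic, boxes): generic parameters and one dict per box."""
    generic, boxes = {}, []
    current = generic
    for raw in text.splitlines():
        # Split off the last whitespace-separated token as the value.
        parts = raw.strip().rsplit(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        label, value = parts
        if label.upper() == "BOX":
            # "BOX n" starts a new per-box section (assumed convention).
            current = {}
            boxes.append(current)
        else:
            current[label] = int(value) if value.isdigit() else value
    return generic, boxes


def check_params(generic, boxes):
    """Return a list of problems found; the list is empty if none are found."""
    problems = []
    if generic.get("Boxes") != len(boxes):
        problems.append("Boxes does not match the number of BOX sections")
    quorum = generic.get("Quorum")
    if not isinstance(quorum, int) or not 0 < quorum <= 100:
        problems.append("Quorum should be a percentage between 1 and 100")
    for i, box in enumerate(boxes, start=1):
        if box.get("Min length", 0) > box.get("Max length", 0):
            problems.append(f"BOX {i}: Min length exceeds Max length")
        if box.get("Min spacer length", 0) > box.get("Max spacer length", 0):
            problems.append(f"BOX {i}: Min spacer length exceeds Max spacer length")
    return problems


generic, boxes = parse_params(EXAMPLE)
print(generic)
print(boxes)
print(check_params(generic, boxes))
```

The parser skips blank lines, so it accepts the listing whether or not its lines are separated by empty lines.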

Interpret the output

RISO-Dynamic prints the following information:

A header with a summary of the input parameters (number of boxes, absolute quorum/number of sequences, total number of symbols in the input sequences, total min length, total max length, total substitutions, the description given for each box in the input parameters, and the alphabet).
One line for each motif discovered in the input dataset, containing:

The motif.
A numeric representation of the motif.
The number of different sequences where the motif appears.
The number of occurrences of the motif (allowing repeats and overlaps in the same sequence).

The total number of motifs found.
The time (in seconds) spent on motif extraction.

RISO-Static outputs only the motifs and the number of different sequences where each motif appears.

An example of the output follows:

%%% 2 128/1062 196736 12 12 2 6 6 1 16 18 6 6 1 alphabet ACGT$
==============================================================
AAAAAA_AAAAAA 000000-000000 188	703
AAAAAA_AAAAAC 000000-000001 139	357
AAAAAA_AAAAAG 000000-000002 166	410
AAAAAA_AAAAAT 000000-000003 193	461

...................................

TTTTTT_TTTTTA 333333-333330 201	507
TTTTTT_TTTTTC 333333-333331 171	441
TTTTTT_TTTTTG 333333-333332 148	384
TTTTTT_TTTTTT 333333-333333 188	700

Nb models: 6419
User time: 45.33 sec.
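
To make the layout of the example output concrete, the following Python sketch decodes the header line and one motif line. It is not part of RISO, and the function names are hypothetical. It rests on three assumptions inferred from this single example only: every box except the last contributes five header fields (min length, max length, substitutions, min spacer length, max spacer length) and the last box contributes three; the numeric representation replaces each symbol by its index in the alphabet (A=0, C=1, G=2, T=3); and the absolute quorum is the percentage quorum rounded up, since 12% of 1062 sequences is 127.44 and the header reports 128.

```python
import math

HEADER = "%%% 2 128/1062 196736 12 12 2 6 6 1 16 18 6 6 1 alphabet ACGT$"
MOTIF_LINE = "AAAAAA_AAAAAC 000000-000001 139\t357"


def parse_header(line):
    """Decode the '%%%' header, assuming the field order described in the text."""
    tokens = line.split()
    if tokens[0] != "%%%":
        raise ValueError("not a header line")
    n_boxes = int(tokens[1])
    quorum, n_seqs = (int(x) for x in tokens[2].split("/"))
    n_symbols, total_min, total_max, total_subs = (int(t) for t in tokens[3:7])
    pos, boxes = 7, []
    for i in range(n_boxes):
        # Assumption: only boxes followed by another box carry spacer bounds.
        width = 5 if i < n_boxes - 1 else 3
        boxes.append([int(t) for t in tokens[pos:pos + width]])
        pos += width
    return {
        "boxes": boxes,
        "quorum": quorum,
        "sequences": n_seqs,
        "symbols": n_symbols,
        "total_min": total_min,
        "total_max": total_max,
        "total_subs": total_subs,
        "alphabet": tokens[pos + 1],  # token after the literal word "alphabet"
    }


def parse_motif_line(line):
    """Split a motif line into (motif, numeric form, sequences, occurrences)."""
    motif, code, n_seqs, n_occ = line.split()
    return motif, code, int(n_seqs), int(n_occ)


def encode(motif, alphabet="ACGT"):
    """Assumed numeric form: each symbol becomes its alphabet index, '_' becomes '-'."""
    return "".join("-" if ch == "_" else str(alphabet.index(ch)) for ch in motif)


header = parse_header(HEADER)
motif, code, n_seqs, n_occ = parse_motif_line(MOTIF_LINE)

print(header)
print(motif, code, n_seqs, n_occ)
print(encode(motif) == code)
# Assumed rounding rule: percentage quorum rounded up to whole sequences.
print(math.ceil(12 * header["sequences"] / 100) == header["quorum"])
```

Splitting on any whitespace handles the tab that separates the two counts on each motif line.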

Assess statistical significance

RISO-Dynamic is compatible with the SMILE statistical significance applications. To assess statistical significance with RISO-Dynamic, download SMILE and place its statistical significance applications (e-smile_shuffling and e-smile_against) in the RISO-Dynamic bin directory. An example of statistical assessment with RISO-Dynamic can be found in the section An example.

If you assess statistical significance of the motifs extracted by RISO-Dynamic in your research, please also cite:

L. Marsan and M.-F. Sagot, Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification, Journal of Computational Biology, 7(3-4):345-362, 2000.

An example

Go to the RISO params directory.

Download the input files to that directory:

The input parameters for RISO: b-subtilis-input.
The input sequences: dnc_subtilis_330-30.seq.
The alphabet file: alphabet.

Type in your terminal:

riso b-subtilis-input

Two output files should be created:

The output of RISO: b-subtilis-output.
The output of statistical significance evaluation: b-subtilis-output.shuffle.

For any help please contact Alexandra M. Carvalho at .