GRISOTTO motif discovery tool
Supplementary material
Get a preprint: pdf
If you use GRISOTTO in your research, please cite:
If you use RISOTTO in your research, please cite:
- Nadia Pisanti, Alexandra M. Carvalho, Laurent Marsan and Marie-France Sagot,
RISOTTO: Fast extraction of motifs with mismatches,
In J. R. Correa, A. Hevia and M. Kiwi, editors,
Proceedings of the 7th Latin American Theoretical Informatics Symposium,
volume 3887 of Lecture Notes in Computer Science, pages 757-768. Springer-Verlag, 2006.
The GRISOTTO java package can be found in two flavors:
- With the output from RISOTTO included for the 156 experiments detailed in the paper: GRISOTTO.jar
- With this package the user does not need to install RISOTTO, which is platform dependent (RISOTTO is implemented in C).
- This package only runs the 156 experiments detailed in the paper.
- With RISOTTO binaries for Mac OS X: GRISOTTO-RISOTTO-mac.zip
- This package provides the full implementation of GRISOTTO.
- A similar package can be built for Linux machines by replacing the RISOTTO executables inside the risotto folder in the .zip file with the corresponding Linux executables.
- A GRISOTTO package for Windows can be made available upon request.
Please note that:
- GRISOTTO starts by running the RISOTTO algorithm:
- A detailed explanation of how RISOTTO is used within GRISOTTO can be found in Additional file 1 (Section 1).
- The RISOTTO source code and executable can be found on the RISOTTO webpage.
- After running RISOTTO, motifs are ordered according to a shuffling-based statistical significance test from the SMILE algorithm. Information about the source code and executable of the algorithm that performs this test can also be found on the RISOTTO webpage.
The fasta and PSP files used in the 156 experiments detailed in the paper can be found on the PRIORITY webpage.
To run the GRISOTTO java package:
- Use the shell (or command prompt) to run your tests. No graphical user interface is made available with the GRISOTTO implementation.
- The .fasta extension is used for fasta files and the .prior extension is used for PSP files. The name of the fasta file and the name of the prior file must match (case sensitive).
- To run GRISOTTO for the experiments presented in the paper for GRISOTTO-CDP just type:
java -jar GRISOTTO.jar -f fasta_folder -p CDP DC_prior_folder DE_prior_folder DN_prior_folder
- The fasta_folder is the folder containing the fasta files.
- The DC_prior_folder, DE_prior_folder and DN_prior_folder indicate the folders containing the DC, DE and DN priors, respectively.
- To run GRISOTTO for the experiments presented in the paper for GRISOTTO-DC just type:
java -jar GRISOTTO.jar -f fasta_folder -p DC DC_prior_folder
- The fasta_folder is the folder containing the fasta files.
- The DC_prior_folder indicates the folder containing the DC prior.
- To run GRISOTTO for the experiments presented in the paper for GRISOTTO-DE just type:
java -jar GRISOTTO.jar -f fasta_folder -p DE DE_prior_folder
- The fasta_folder is the folder containing the fasta files.
- The DE_prior_folder indicates the folder containing the DE prior.
- To run GRISOTTO for the experiments presented in the paper for GRISOTTO-DN just type:
java -jar GRISOTTO.jar -f fasta_folder -p DN DN_prior_folder
- The fasta_folder is the folder containing the fasta files.
- The DN_prior_folder indicates the folder containing the DN prior.
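The four invocations above differ only in the value passed to -p and in the prior folders that follow it, so they can be assembled by a small wrapper. The following Python sketch is not part of the GRISOTTO package; the helper names (unmatched_names, build_command, run_grisotto) are hypothetical. It assumes that java is on the PATH, that GRISOTTO.jar is in the current directory, and that every fasta file is expected to have a prior file of the same name in each prior folder. It reports case-sensitive name mismatches before launching the command.

```python
import subprocess
from pathlib import Path

# Prior folders expected after "-p <variant>", in the order used by the
# commands listed above.
PRIOR_ORDER = {
    "CDP": ("DC", "DE", "DN"),
    "DC": ("DC",),
    "DE": ("DE",),
    "DN": ("DN",),
}


def unmatched_names(fasta_folder, prior_folder):
    """Return (fasta names lacking a prior, prior names lacking a fasta).

    Names are compared without their extension and case-sensitively.
    """
    fasta = {p.stem for p in Path(fasta_folder).iterdir() if p.suffix == ".fasta"}
    prior = {p.stem for p in Path(prior_folder).iterdir() if p.suffix == ".prior"}
    return sorted(fasta - prior), sorted(prior - fasta)


def build_command(variant, fasta_folder, prior_folders, jar="GRISOTTO.jar"):
    """Build the argument list for one GRISOTTO run.

    prior_folders maps "DC", "DE" and/or "DN" to the corresponding folder.
    """
    if variant not in PRIOR_ORDER:
        raise ValueError(f"unknown variant: {variant!r}")
    missing = [k for k in PRIOR_ORDER[variant] if k not in prior_folders]
    if missing:
        raise ValueError(f"missing prior folder(s): {', '.join(missing)}")
    priors = [str(prior_folders[k]) for k in PRIOR_ORDER[variant]]
    return ["java", "-jar", str(jar), "-f", str(fasta_folder), "-p", variant, *priors]


def run_grisotto(variant, fasta_folder, prior_folders):
    """Check file names (an assumption of this sketch), then launch GRISOTTO."""
    command = build_command(variant, fasta_folder, prior_folders)
    for kind in PRIOR_ORDER[variant]:
        no_prior, no_fasta = unmatched_names(fasta_folder, prior_folders[kind])
        if no_prior or no_fasta:
            raise SystemExit(
                f"{kind}: fasta without prior {no_prior}, prior without fasta {no_fasta}"
            )
    subprocess.run(command, check=True)
```

For example, run_grisotto("DC", "fasta_folder", {"DC": "DC_prior_folder"}) would launch the GRISOTTO-DC command shown above once the name check passes.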
Understanding the output of GRISOTTO:
- GRISOTTO takes a few seconds to load the fasta files and priors; please be patient until the first motif is reported.
- The output of GRISOTTO is of the following form:
==> Results: ADR1_YPD (28 sequences)
Max Motif: TAACATTG
Score: -2321.839901596452
Prior: DC prior
--> Number of motifs reported until now: 4
- The string Results identifies the fasta sequence-set just considered, followed by the number of sequences in the sequence-set.
- Max Motif is the reported motif.
- Score is the BIS score of the reported motif.
- Prior is the prior being used.
- Detailed results of GRISOTTO with various positional priors, sequence-set by sequence-set, can be found in Additional file 2.
- Running times of GRISOTTO can be found in Additional file 1 (Section 3.2).
- Discussion of detailed results can be found in Additional file 1 (Section 3.3).
- IUPAC conversion to PSSM:
- Proposed conversion from IUPAC to PSSM can be found in Additional file 1, as well as the metric used to compare motif discoverers (Section 2).
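Since GRISOTTO reports its results as plain text on the shell, it can be convenient to collect the reported motifs programmatically. The Python sketch below parses output blocks of the form shown above (Results, Max Motif, Score, Prior) into records. It is not part of the GRISOTTO package: the names MotifReport and parse_reports are hypothetical, and the parser assumes that the line layout is exactly as in the example, with each Results line followed by its three fields.

```python
import re
from dataclasses import dataclass


@dataclass
class MotifReport:
    sequence_set: str   # name following "Results:"
    n_sequences: int    # number of sequences in the sequence-set
    motif: str          # value of "Max Motif"
    score: float        # BIS score of the reported motif
    prior: str          # prior being used


# Matches e.g. "==> Results: ADR1_YPD (28 sequences)"
HEADER = re.compile(r"^==> Results:\s*(.+?)\s*\((\d+) sequences?\)\s*$")
FIELDS = {"Max Motif": "motif", "Score": "score", "Prior": "prior"}


def parse_reports(text):
    """Return one MotifReport per complete result block found in text."""
    reports = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            # Start a new block; an incomplete previous block is discarded.
            current = {
                "sequence_set": header.group(1),
                "n_sequences": int(header.group(2)),
            }
            continue
        key, sep, value = line.partition(":")
        if current is None or not sep or key not in FIELDS:
            continue  # e.g. the "--> Number of motifs reported" line
        current[FIELDS[key]] = value.strip()
        if len(current) == 5:  # header fields plus motif, score and prior
            current["score"] = float(current["score"])
            reports.append(MotifReport(**current))
            current = None
    return reports
```

The output of a run can be captured to a file with shell redirection and passed to parse_reports as a single string; lines that do not belong to a result block are ignored.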
For any help please contact Alexandra M. Carvalho at
.