The MIRO pipeline - Analysis of microRNAs using short-read deep sequencing data.
The MIRO (the miRNA omics) pipeline is a flexible and powerful tool for analyzing miRNA (or, more generally, short RNA) expression from short-read deep-sequencing data. In its present implementation, MIRO is especially adapted to reads generated on the Illumina sequencing platform. MIRO can preprocess Solexa reads, map them flexibly to several reference genomes using one of four different mappers, create differential gene (miRNA) expression profiles, and cluster reads using one of several algorithms. MIRO output is also compatible with software such as genome browsers and miRDeep.
Download MIRO release candidate 2.10: Miro_rc2.10.zip
The MIRO documentation
Miro is available free for academic use. For a commercial licence please contact: heinz.himmelbauer at crg.es
In all cases, please cite:
Robert Kofler, Manuela Hummel, Juliane Dohm, Matthew Ingham, Lauro Sumoy and Heinz Himmelbauer (2009), MIRO: Analysis of microRNAs using short-read deep-sequencing data; submitted
- Download MIRO: see above
- Unpack the file in the folder of your choosing
- Edit the file log/pipelog_config.txt and set the path to the MIRO log files. This step is optional.
- Install the necessary prerequisites. For the full functionality of MIRO you need: Perl 5.8 or higher, R 2.7.0 or higher, the R libraries geneplotter and gplots, WebLogo 3.0, RNAfold, and one of the following mappers: Eland, SeqMap, SOAP or Gem
- Have a look at the walkthrough (below)
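Before starting the walkthrough, it can help to confirm that the required programs are installed. The Python sketch below only checks whether commands are found on the PATH. The command names (perl, R, weblogo, RNAfold, seqmap, soap, gem-mapper) are our assumptions, not names taken from the MIRO documentation. Eland is not checked because it ships with the Illumina pipeline, and neither version numbers nor the R libraries geneplotter and gplots are verified.

```python
"""Check whether MIRO prerequisites can be found on the PATH (sketch).

ASSUMPTION: the executable names below are common defaults, not names
from the MIRO documentation; adjust them to your installation.
"""
import shutil

# Prerequisite -> command we assume is on the PATH.
REQUIRED = {
    "Perl": "perl",
    "R": "R",
    "WebLogo 3.0": "weblogo",
    "RNAfold": "RNAfold",
}

# MIRO needs only ONE of these mappers (command names are assumptions).
MAPPERS = {
    "SeqMap": "seqmap",
    "SOAP": "soap",
    "Gem": "gem-mapper",
}


def check_prerequisites(which=shutil.which):
    """Return (missing required programs, mappers that were found)."""
    missing = [name for name, cmd in REQUIRED.items() if which(cmd) is None]
    mappers = [name for name, cmd in MAPPERS.items() if which(cmd) is not None]
    return missing, mappers


if __name__ == "__main__":
    missing, mappers = check_prerequisites()
    print("missing:", ", ".join(missing) or "none")
    print("mappers found:", ", ".join(mappers) or "none (Eland not checked)")
```

The `which` parameter defaults to `shutil.which` but can be replaced, which keeps the check testable without touching the real system.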
Install Miro as described above
Download the test data
To test the MIRO pipeline, download the publicly available data set published by Glazov et al. (2008): The test data set
Glazov EA, Cottee PA, Barris WC, Moore RJ et al. A microRNA catalog of the developing chicken embryo identified by a deep sequencing approach. Genome Res 2008 Jun;18(6):957-64. PMID: 18469162
Convert test data into 'seq' format:
MIRO takes as input a '*_seq.txt' file, the standard output format of the Illumina pipeline. An example:
1 1 26 688 AAAACACCAACAAAACAACCAAAAATAATACAACAA
1 1 117 645 GCACCAACAACAAGCAAAAAACGACTAAACACACAA
1 1 391 248 GTAAGCACTCCCCTATCCTGTCAGTTGCCTAGTATA
1 1 24 746 ACTCAACACGAAACAAACCAAAACGACAAAAACACA
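To make the example lines concrete, here is a small Python parser for them. It assumes the five whitespace-separated columns are lane, tile, x coordinate, y coordinate and read sequence, as in the Illumina GA pipeline output; MIRO's own reader may handle the format differently.

```python
"""Parse lines of an Illumina '*_seq.txt' file (sketch).

ASSUMPTION: columns are lane, tile, x, y, sequence.
"""
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class SeqRead:
    lane: int
    tile: int
    x: int
    y: int
    sequence: str


def parse_seq_lines(lines: Iterable[str]) -> Iterator[SeqRead]:
    """Yield one SeqRead per non-empty line; reject malformed lines."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 5:
            raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
        lane, tile, x, y = (int(value) for value in fields[:4])
        yield SeqRead(lane, tile, x, y, fields[4])


if __name__ == "__main__":
    example = "1 1 26 688 AAAACACCAACAAAACAACCAAAAATAATACAACAA"
    for read in parse_seq_lines([example]):
        print(read.x, read.y, read.sequence)
```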
Use the following script to convert the three files in the test data set into a '*_seq.txt' file: glazov2seq.pl.txt
(Note: please remove the .txt extension from the script. Due to security concerns Foswiki does not allow the extension '.pl').
./glazov2seq.pl GSM270187.txt > ce5.seq
./glazov2seq.pl GSM270188.txt > ce7.seq
./glazov2seq.pl GSM270189.txt > ce9.seq
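If you cannot run glazov2seq.pl, the following hypothetical Python sketch shows one way to turn collapsed read data into seq-format lines. It is NOT the logic of glazov2seq.pl, and it ASSUMES that each input line holds a unique sequence followed by its read count; we have not verified that the GEO files GSM270187 to GSM270189 use this layout. The lane, tile and coordinate values it writes are placeholders.

```python
"""Hypothetical: expand (sequence, count) records into seq-format lines.

ASSUMPTION: each input line is '<sequence> <count>'. This is not the
code of glazov2seq.pl; lane/tile/x/y values are placeholders.
"""
import sys
from typing import Iterable, Iterator


def expand_counts(lines: Iterable[str]) -> Iterator[str]:
    read_id = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 2 or not fields[1].isdigit():
            continue  # skip headers and lines not matching the assumed layout
        sequence, count = fields[0], int(fields[1])
        for _ in range(count):
            read_id += 1
            # placeholder lane=1, tile=1, x=read_id, y=0 keeps lines distinct
            yield f"1 1 {read_id} 0 {sequence}"


if __name__ == "__main__":
    # usage: python expand_counts.py INPUT.txt > OUTPUT.seq
    with open(sys.argv[1]) as handle:
        for out_line in expand_counts(handle):
            print(out_line)
```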
Run the script run_Preprocessing.pl on each of the three files in the test data set:
perl ~/programs/Miro/run_Preprocessing.pl --input ce5.seq --output ce5.pp --logfile ce5.log --max_N 1 --max_length 32 --trim 1
perl ~/programs/Miro/run_Preprocessing.pl --input ce7.seq --output ce7.pp --logfile ce7.log --max_N 1 --max_length 32 --trim 1
perl ~/programs/Miro/run_Preprocessing.pl --input ce9.seq --output ce9.pp --logfile ce9.log --max_N 1 --max_length 32 --trim 1
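Because the three commands differ only in the sample name, they can be generated in a loop. This Python sketch uses exactly the options shown above; the install path ~/programs/Miro is taken from the example and will likely differ on your system.

```python
"""Run run_Preprocessing.pl for the three test libraries (sketch).

Options are copied from the example commands; the MIRO path is an
assumption taken from that example.
"""
from __future__ import annotations

import os
import subprocess

MIRO_DIR = os.path.expanduser("~/programs/Miro")
SAMPLES = ["ce5", "ce7", "ce9"]


def preprocessing_command(sample: str, miro_dir: str = MIRO_DIR) -> list[str]:
    """Build the argument list for one sample."""
    script = os.path.join(miro_dir, "run_Preprocessing.pl")
    return [
        "perl", script,
        "--input", f"{sample}.seq",
        "--output", f"{sample}.pp",
        "--logfile", f"{sample}.log",
        "--max_N", "1",
        "--max_length", "32",
        "--trim", "1",
    ]


if __name__ == "__main__":
    for sample in SAMPLES:
        # check=True raises CalledProcessError and stops on the first failure
        subprocess.run(preprocessing_command(sample), check=True)
```

Passing an argument list (rather than a shell string) avoids quoting problems with paths that contain spaces.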
Download mature miRNAs and hairpin sequences from miRBase and prepare them for mapping
Download the mature miRNAs (mature.fa.gz) and the hairpin sequences (hairpin.fa.gz) from miRBase: http://www.mirbase.org/ftp.shtml
Prepare them for mapping with the script PrepareMirbase.pl, which converts the nucleotide 'U' to 'T' and extracts only the entries for one particular species. This is necessary because the files mature.fa and hairpin.fa contain entries for all species in miRBase. Decompress the downloaded files first (e.g. with gunzip), since the commands below read the uncompressed mature.fa and hairpin.fa.
miRBase uses the prefix 'gga' for Gallus gallus (chicken):
perl PrepareMirbase.pl --input mature.fa --output gga-mature.fa --species gga
perl PrepareMirbase.pl --input hairpin.fa --output gga-hairpin.fa --species gga
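The two operations attributed to PrepareMirbase.pl (species filtering and U-to-T conversion) can be sketched in Python as follows. This is an illustration, not the script's actual code. It assumes that the first word of each miRBase FASTA header is the miRNA ID with the species prefix, such as 'gga-let-7a', and it reads the compressed download directly, so the gunzip step is not needed for this sketch.

```python
"""Keep one species' entries from a miRBase FASTA file and convert U to T.

Illustration only; ASSUMPTION: the first header word carries the species
prefix (e.g. 'gga-let-7a').
"""
import gzip
from typing import Iterable, Iterator


def prepare_mirbase(lines: Iterable[str], species: str) -> Iterator[str]:
    prefix = species + "-"
    keep = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0].startswith(prefix)
            if keep:
                yield line
        elif keep:
            # normalize case, then RNA -> DNA alphabet for DNA-based mappers
            yield line.upper().replace("U", "T")


if __name__ == "__main__":
    with gzip.open("mature.fa.gz", "rt") as src, open("gga-mature.fa", "w") as dst:
        for out_line in prepare_mirbase(src, "gga"):
            dst.write(out_line + "\n")
```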
Map the preprocessed reads to the mature miRNAs and hairpins
Map the preprocessed reads from the three libraries (CE5, CE7 and CE9) to the miRBase reference sequences. In the following example, the reads are first mapped to the mature miRNAs, and the reads that do not match any mature miRNA are then mapped to the hairpin sequences.
Mapping is controlled by a configuration file ("Settings/map-gga.txt"). The file used for this analysis is provided as an example: map-gga.txt
perl run_Mapping.pl Settings/map-gga.txt
-> Repeat this step for all three samples (CE5, CE7 and CE9), and use the same output directory for all of them: run_DGE.pl later reads all samples from this single mapping output folder.
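The mature-then-hairpin cascade described above can be illustrated with a toy Python sketch. MIRO delegates the actual alignment to an external mapper (Eland, SeqMap, SOAP or Gem) that tolerates mismatches; this sketch swaps in exact substring matching purely to show how the no-matches of the first stage feed the second stage. The sequences in the test are made up.

```python
"""Toy two-stage mapping cascade: mature miRNAs first, then hairpins.

Simplification: exact substring matching instead of a real mapper.
"""
from typing import Dict, Iterable, List, Tuple


def exact_hits(read: str, references: Dict[str, str]) -> List[str]:
    """Names of all references that contain the read exactly."""
    return [name for name, seq in references.items() if read in seq]


def cascade_map(
    reads: Iterable[str], mature: Dict[str, str], hairpin: Dict[str, str]
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]], List[str]]:
    mature_hits: Dict[str, List[str]] = {}
    hairpin_hits: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for read in reads:
        hits = exact_hits(read, mature)
        if hits:
            mature_hits[read] = hits
            continue
        # second stage: only reads without a mature hit reach the hairpins
        hits = exact_hits(read, hairpin)
        if hits:
            hairpin_hits[read] = hits
        else:
            unmapped.append(read)
    return mature_hits, hairpin_hits, unmapped
```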
Calculate differential expression of miRNAs
Calculating a differential miRNA expression profile with MIRO takes a single command: run the script run_DGE.pl and specify the mapping output folder.
perl run_DGE.pl --bayes_table --bayes_pairwise --idorder "ce5 ce7 ce9" /home/robert/analysis/miro-test/map-gga
The script creates one subfolder for each combination of normalization method and reference sequence set used for mapping. In our example, two reference sets were used (mature and hairpin), and by default three normalizations are applied (off = no normalization, quantile and scale-linear), so MIRO creates six subfolders. Significantly differentially expressed miRNAs are reported in the folder with the non-normalized results, because the Bayesian method used by MIRO does not depend on the normalization.
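To show what the 'quantile' option does in principle, here is a generic textbook quantile normalization in Python for a count table with miRNAs as rows and samples as columns. It is not MIRO's own implementation; in particular, ties are broken by input order here instead of being averaged.

```python
"""Generic quantile normalization (rows = miRNAs, columns = samples).

Textbook version for illustration; not MIRO's code. Ties are broken by
input order.
"""
from typing import List, Sequence


def quantile_normalize(table: Sequence[Sequence[float]]) -> List[List[float]]:
    n_rows, n_cols = len(table), len(table[0])
    columns = [[row[j] for row in table] for j in range(n_cols)]
    # mean of the k-th smallest value across all samples
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[k] for col in sorted_cols) / n_cols for k in range(n_rows)
    ]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        # replace each value by the mean belonging to its rank in the sample
        order = sorted(range(n_rows), key=col.__getitem__)
        for rank, i in enumerate(order):
            result[i][j] = rank_means[rank]
    return result
```

After normalization every sample has the same distribution of values, which removes differences in overall library size and shape while keeping each miRNA's rank within its sample.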
Complementary analyses
The mapping output can also be used for additional analyses, such as creating a miRNA profile or a miRNA logo. Details are given in the MIRO documentation: The MIRO documentation
Please report bugs to the following address:
heinz.himmelbauer at crg.es
-- Main.RobertKofler - 27 Sep 2009