NAME

The MIRO pipeline - Analysis of microRNAs using short-read deep sequencing data.

Introduction

The MIRO (miRNA omics) pipeline is a flexible tool for analysing the expression of miRNAs (or, more generally, short RNAs) from short-read deep sequencing data. In its present implementation MIRO is adapted to reads generated with the Illumina sequencing platform. MIRO preprocesses the Solexa reads, maps them to several reference genomes using one of four different mappers, creates differential gene (miRNA) expression profiles and clusters reads using one of several algorithms. MIRO output is also compatible with other software such as genome browsers and miRDeep.

Download

Download Miro release candidate 2.10: Miro_rc2.10.zip

Documentation

The MIRO documentation

License

Miro is available free for academic use. For a commercial licence please contact: heinz.himmelbauer at crg.es

In any case please cite:

Robert Kofler, Manuela Hummel, Juliane Dohm, Matthew Ingham, Lauro Sumoy and Heinz Himmelbauer (2009), MIRO: Analysis of microRNAs using short-read deep-sequencing data; submitted

Installation

  1. Download MIRO: see above
  2. Unpack the file in the folder of your choosing
  3. Edit the file log/pipelog_config.txt and set the path to the MIRO log files. This step is optional.
  4. Install the necessary prerequisites. For the full functionality of MIRO you need: Perl 5.8 or higher; R 2.7.0 or higher; the R libraries geneplotter and gplots; WebLogo 3.0; RNAfold; and one of the following mappers: Eland, SeqMap, SOAP or GEM
  5. Have a look at the walk-through (below)

Walk-through

Install Miro

Install Miro as described above

Download the test data

To test the MIRO pipeline, download the publicly available data published by Glazov et al. (2008): The test data set

Glazov EA, Cottee PA, Barris WC, Moore RJ et al. A microRNA catalog of the developing chicken embryo identified by a deep sequencing approach. Genome Res 2008 Jun;18(6):957-64. PMID: 18469162

Convert test data into 'seq' format:

MIRO takes a '*_seq.txt' file as input, which is the standard output of the Illumina pipeline. An example:
1       1       26      688     AAAACACCAACAAAACAACCAAAAATAATACAACAA
1       1       117     645     GCACCAACAACAAGCAAAAAACGACTAAACACACAA
1       1       391     248     GTAAGCACTCCCCTATCCTGTCAGTTGCCTAGTATA
1       1       24      746     ACTCAACACGAAACAAACCAAAACGACAAAAACACA
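
As background, in the usual Illumina '*_seq.txt' layout the five columns are lane, tile, x coordinate, y coordinate and read sequence; please check this against your own files. Under that assumption, a quick sanity check of an input file could look like the sketch below. The function name seq_summary is hypothetical and not part of MIRO.

```shell
# Print read count and read-length range of a '*_seq.txt'-style file.
# Assumption: five whitespace-separated columns, read sequence in column 5.
# Lines with a different number of columns are ignored.
seq_summary() {
    awk 'NF == 5 {
             n++
             len = length($5)
             if (min == 0 || len < min) min = len
             if (len > max) max = len
         }
         END { printf "reads=%d min_len=%d max_len=%d\n", n, min, max }' "$1"
}
```

Calling `seq_summary ce5.seq` after the conversion step of this walk-through prints one summary line for that library.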

Use the following script to convert each of the three files in the test data set into this format: glazov2seq.pl.txt (Note: please remove the .txt extension from the script; for security reasons Foswiki does not allow the extension '.pl'.)

./glazov2seq.pl GSM270187.txt  > ce5.seq
./glazov2seq.pl GSM270188.txt  > ce7.seq  
./glazov2seq.pl GSM270189.txt  > ce9.seq

Run Preprocessing

Run the script 'run_Preprocessing.pl' for all three files in the test data set. For example:

perl ~/programs/Miro/run_Preprocessing.pl --input ce5.seq --output ce5.pp --logfile ce5.log --max_N 1 --max_length 32 --trim 1
perl ~/programs/Miro/run_Preprocessing.pl --input ce7.seq --output ce7.pp --logfile ce7.log --max_N 1 --max_length 32 --trim 1
perl ~/programs/Miro/run_Preprocessing.pl --input ce9.seq --output ce9.pp --logfile ce9.log --max_N 1 --max_length 32 --trim 1
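
Because the three calls differ only in the library name, they can also be written as a loop, which keeps the options identical across libraries. This is a minimal sketch: the helper functions run and preprocess_all and the variables MIRO_DIR and DRY_RUN are conveniences of this example, not part of MIRO. The script path and options are copied from the commands above.

```shell
# Location of the MIRO scripts (assumed default taken from the example above).
MIRO_DIR=${MIRO_DIR:-$HOME/programs/Miro}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Preprocess every library given as an argument with the same options.
preprocess_all() {
    for lib in "$@"; do
        run perl "$MIRO_DIR/run_Preprocessing.pl" \
            --input "$lib.seq" --output "$lib.pp" --logfile "$lib.log" \
            --max_N 1 --max_length 32 --trim 1 || return 1   # stop at the first failure
    done
}
```

With `DRY_RUN=1` set, `preprocess_all ce5 ce7 ce9` only prints the three commands, so they can be inspected before the real run.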

Download mature miRNAs and hairpin sequences from miRBase and prepare them for mapping

Download the mature miRNAs (mature.fa.gz) and the hairpin sequences (hairpin.fa.gz) from miRBase: http://www.mirbase.org/ftp.shtml and unpack them (for example with gunzip) to obtain mature.fa and hairpin.fa.

Prepare them for mapping with the script PrepareMirbase.pl, which converts the nucleotide 'U' to 'T' and extracts only the entries of one particular species. This is necessary because the files mature.fa and hairpin.fa contain entries for all species present in miRBase.

In miRBase the abbreviation 'gga' is used for Gallus gallus.

perl PrepareMirbase.pl --input mature.fa --output gga-mature.fa --species gga
perl PrepareMirbase.pl --input hairpin.fa --output gga-hairpin.fa --species gga
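
To illustrate the two operations described above (species selection and U-to-T conversion), here is a minimal awk sketch. It is not the code of PrepareMirbase.pl, and its output may differ in detail from that script. It assumes that miRBase FASTA identifiers start with the species abbreviation followed by a hyphen (for example '>gga-'); the function name filter_species_fasta is hypothetical.

```shell
# $1 = miRBase species abbreviation (e.g. gga), $2 = FASTA file
filter_species_fasta() {
    awk -v sp="$1" '
        # Header line: keep the record only if the ID starts with "<sp>-".
        /^>/ { keep = (index($0, ">" sp "-") == 1) }
        keep {
            # Convert U to T in sequence lines only, never in headers.
            if ($0 !~ /^>/) gsub(/U/, "T")
            print
        }' "$2"
}
```

For example, `filter_species_fasta gga mature.fa | grep -c '^>'` counts the chicken entries, which can be compared with the number of entries in gga-mature.fa.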

Map the preprocessed reads to the mature miRNAs and hairpins

Map the preprocessed reads from the three libraries (CE5, CE7 and CE9) to the miRBase reference sequences. In the following example the reads are first mapped to the mature miRNAs, and the no-matches (sequences not mapping to the mature miRNAs) are subsequently mapped to the hairpin sequences.

Note that mapping requires a configuration file ("Settings/map-gga.txt"). As an example I used the following file for this analysis: map-gga.txt

perl run_Mapping.pl Settings/map-gga.txt

Repeat this step for all three samples (CE5, CE7 and CE9), and use the same output directory for all three samples.

Calculate differential expression of miRNAs

With MIRO, calculating a differential miRNA expression profile takes a single step: run the script 'run_DGE.pl' and specify the mapping output folder.

perl run_DGE.pl --bayes_table --bayes_pairwise --idorder "ce5 ce7 ce9" /home/robert/analysis/miro-test/map-gga

This script creates one subfolder for each combination of normalisation and reference sequence used for mapping. In our example two reference sequences were used (mature and hairpin) and by default three normalisations are applied (off = no normalisation, quantile and scale-linear), so Miro creates six subfolders. Significantly differentially expressed miRNAs can be found in the folder containing the non-normalised results, because the Bayesian method used by MIRO does not depend on the normalisation.
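
The count of six follows from two reference sequences times three normalisations. The sketch below simply enumerates these combinations. The labels are taken from the text above; the actual subfolder names written by run_DGE.pl may differ, and the function name list_dge_combinations is hypothetical.

```shell
# Enumerate every reference x normalisation combination of the example.
list_dge_combinations() {
    for ref in mature hairpin; do
        for norm in off quantile scale-linear; do
            echo "$ref / $norm"
        done
    done
}
```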

Run complementary analyses

Using the mapping output you can also perform additional analyses, such as creating a miRNA profile or a miRNA logo. Details can be found in the Miro documentation: The MIRO documentation

Bugs

Please report bugs to the email address shown below!

Friend Projects

The GEM-Mapper

Contact

heinz.himmelbauer at crg.es

-- Main.RobertKofler - 27 Sep 2009

Topic attachments
Attachment         Size     Date                 Who
Miro_rc2.10.zip    301.7 K  01 Apr 2010 - 18:47  Main.RobertKofler
glazov2seq.pl.txt  0.3 K    04 Nov 2009 - 18:38  Main.RobertKofler
map-gga.txt        5.3 K    04 Nov 2009 - 18:38  Main.RobertKofler

 
 Ultrasequencing Unit
Centre for Genomic Regulation (CRG)
Barcelona, Spain