NAME

PrepareMirbase.pl - Prepares mirbase for mapping with the Miro mappers ('U'->'T')

SYNOPSIS

 # Minimal argument call, specifying all required parameters.
 PrepareMirbase.pl --input mirbase.fa --output ready_to_map.fa
 
 # Maximal argument call, specifying all possible parameters.
 PrepareMirbase.pl --input mirbase.fa --output ready_to_map.fa \
                    --append_n 5 --species "hsa"

OPTIONS

--input

The input file. Must be a multi-FASTA file. Mandatory parameter.

--output

The output file. Mandatory parameter.

--append_n

The number of 'N' characters to add at the 5'-end and at the 3'-end of each entry. This can be useful for mapping because most mappers cannot match reads which start before the reference sequence entry. For example:

 Mapping to cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7
 TTGAGGTAGTAGGTTGTATAGTTA ..read
  TGAGGTAGTAGGTTGTATAGTT  ..mirbase entry

This read cannot be mapped to the corresponding mirbase entry with most mappers. With GEM it is possible to map this read with two mismatches, but only after several 'N's have been added at the 5'-end and at the 3'-end of the entry. For example:

 Mapping to cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7
  TTGAGGTAGTAGGTTGTATAGTTA   ..read
 NNTGAGGTAGTAGGTTGTATAGTTNN  ..mirbase entry

Now the read can be mapped to the reference sequence (let-7). default=0
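
The effect of the padding can be checked with a small sketch. The Python snippet below is only an illustration and not part of PrepareMirbase.pl. The helper name 'best_ungapped_hit' is hypothetical, and the sketch assumes that an 'N' in the reference counts as a mismatch, which is consistent with the two mismatches mentioned above but may differ between mappers.

```python
def best_ungapped_hit(read, reference):
    """Return (offset, mismatches) for the best ungapped placement of the
    read fully inside the reference, or None if the read does not fit."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        # Assumption: 'N' never equals A/C/G/T, so it counts as a mismatch.
        mismatches = sum(1 for r, w in zip(read, window) if r != w)
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


read = "TTGAGGTAGTAGGTTGTATAGTTA"
entry = "TGAGGTAGTAGGTTGTATAGTT"  # cel-let-7 after 'U' -> 'T'

print(best_ungapped_hit(read, entry))                # None
print(best_ungapped_hit(read, "NN" + entry + "NN"))  # (1, 2)
```

Without padding the read is longer than the entry and no placement exists. With two 'N's on each side the read fits at offset 1, and only its first and last base fall on an 'N'.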

--species

Extract only the entries of a given species. A valid mirbase species shortcut has to be specified (e.g. hsa, cel, dme, ath). default=undef

--help

Display the help pages

DESCRIPTION

General

The script prepares mirbase entries for mapping with the scripts run_Mapping.pl or run_Multimapper.pl. In detail, every 'U' is converted to 'T'. Optionally, several 'N's are added at the 5'-end and at the 3'-end of each entry (--append_n), and only the entries of a given species are extracted (--species).
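
The steps described above can be sketched as follows. This is a Python illustration under stated assumptions, not the Perl source of PrepareMirbase.pl. The function name 'prepare_entries' is hypothetical, and the species is assumed to be recognized by the identifier prefix before the first '-' (e.g. 'mmu' in 'mmu-let-7g'); the actual script may select entries differently.

```python
def prepare_entries(lines, append_n=0, species=None):
    """Yield (header, sequence) pairs with 'U' -> 'T' and optional 'N' padding."""
    pad = "N" * append_n
    records = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            records.append([line, ""])
        elif line and records:
            # Sequences may span several lines (e.g. hairpin files).
            records[-1][1] += line
    for header, seq in records:
        # Assumption: the species shortcut is the ID prefix before the first '-'.
        prefix = header[1:].split("-", 1)[0]
        if species is None or prefix == species:
            yield header, pad + seq.replace("U", "T") + pad


fasta = """\
>dme-miR-14 MIMAT0000120 Drosophila melanogaster miR-14
UCAGUCUUUUUCUCUCUCCUA
>mmu-let-7g MIMAT0000121 Mus musculus let-7g
UGAGGUAGUAGUUUGUACAGUU
"""

for header, seq in prepare_entries(fasta.splitlines(), append_n=2, species="mmu"):
    print(header)
    print(seq)
```

Called with append_n=2 and species="mmu", the sketch keeps only the mmu-let-7g entry and pads its converted sequence with two 'N's on each side.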

Input

A multi-FASTA mirbase file (mature, hairpin or maturestar). For example:

 >dme-miR-13b MIMAT0000119 Drosophila melanogaster miR-13b
 UAUCACAGCCAUUUUGACGAGU
 >dme-miR-14 MIMAT0000120 Drosophila melanogaster miR-14
 UCAGUCUUUUUCUCUCUCCUA
 >mmu-let-7g MIMAT0000121 Mus musculus let-7g
 UGAGGUAGUAGUUUGUACAGUU
 >mmu-let-7i MIMAT0000122 Mus musculus let-7i
 UGAGGUAGUAGUUUGUGCUGUU

Output

A multi-FASTA file which can be used for mapping with run_Mapping.pl or run_Multimapper.pl. For example, processing the input above with --append_n 2 --species mmu yields:

 >mmu-let-7g MIMAT0000121 Mus musculus let-7g
 NNTGAGGTAGTAGTTTGTACAGTTNN
 >mmu-let-7i MIMAT0000122 Mus musculus let-7i
 NNTGAGGTAGTAGTTTGTGCTGTTNN

Every 'U' nucleotide is converted to 'T'. Several 'N's may be added at the 5'-end and at the 3'-end of the sequences (--append_n).

This can be useful for mapping because most mappers cannot match reads which start before the reference sequence entry. For example:
```
 Mapping to cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7
 TTGAGGTAGTAGGTTGTATAGTTA ..read
  TGAGGTAGTAGGTTGTATAGTT  ..mirbase entry
```
This read cannot be mapped to the corresponding mirbase entry with most mappers. With GEM it is possible to map this read with two mismatches, but only after several 'N's have been added at the 5'-end and at the 3'-end of the entry. For example:
```
 Mapping to cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7
  TTGAGGTAGTAGGTTGTATAGTTA   ..read
 NNTGAGGTAGTAGGTTGTATAGTTNN  ..mirbase entry
```
Optionally, only the sequences of a given species are extracted (--species). The proper mirbase shortcut has to be specified, for example hsa = Homo sapiens or mmu = Mus musculus.

REQUIREMENTS

Perl 5.8 or higher

AUTHORS

Robert Kofler

Heinz Himmelbauer

CONTACT

robert.kofler at crg.es