PrepareMirbase.pl - Prepares mirbase for mapping with the Miro mappers ('U'->'T')
# Minimal argument call, specifying all required parameters.
PrepareMirbase.pl --input mirbase.fa --output ready_to_map.fa

# Maximal argument call, specifying all possible parameters.
PrepareMirbase.pl --input mirbase.fa --output ready_to_map.fa --append_n 5 --species "hsa"
--input: The input file. Has to be a multiple fasta file. Mandatory parameter.
--output: The output file. Mandatory parameter.
--append_n: The number of 'N' characters to append at the 5'-end and at the 3'-end of each entry. This may be useful for mapping, as most mappers cannot match reads which start before the reference sequence entry. For example:
Mapping to cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7
TTGAGGTAGTAGGTTGTATAGTTA ..read
 TGAGGTAGTAGGTTGTATAGTT  ..mirbase entry
This read cannot be mapped to the unmodified mirbase entry with any mapper, because it starts before and ends after the entry. With GEM it is possible to map this read with two mismatches, but only after several 'N's have been appended at the 5'-end and the 3'-end of the entry. For example:
Mapping to cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7
 TTGAGGTAGTAGGTTGTATAGTTA  ..read
NNTGAGGTAGTAGGTTGTATAGTTNN ..mirbase entry
Now the read can be mapped to the reference sequence (let-7).
default=0
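The effect of the padding can be checked with a small sketch. It is written in Python purely for illustration and is not taken from PrepareMirbase.pl; the helper names `pad_with_n` and `mismatches_at` are hypothetical, and it assumes that an 'N' in the reference counts as a mismatch against any base of the read.

```python
def pad_with_n(sequence, n):
    """Append n 'N' characters to both the 5'-end and the 3'-end."""
    return "N" * n + sequence + "N" * n


def mismatches_at(read, reference, offset):
    """Count mismatches when the read is placed at `offset` in the reference.

    Returns None if the read does not fit entirely inside the reference.
    Assumption: an 'N' in the reference counts as a mismatch.
    """
    if offset < 0 or offset + len(read) > len(reference):
        return None
    window = reference[offset:offset + len(read)]
    return sum(1 for r, w in zip(read, window) if r != w)


read = "TTGAGGTAGTAGGTTGTATAGTTA"   # 24 nt read from the example
entry = "TGAGGTAGTAGGTTGTATAGTT"    # 22 nt cel-let-7 entry

# Unpadded: the read is longer than the entry, so no placement fits.
unpadded = [mismatches_at(read, entry, o) for o in range(len(entry))]

# Padded with two 'N's per side: the read fits at offset 1.
padded = pad_with_n(entry, 2)
best = mismatches_at(read, padded, 1)

print(padded)  # NNTGAGGTAGTAGGTTGTATAGTTNN
print(best)    # 2
```

The two mismatches are the first and last base of the read, each aligned against an appended 'N'.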
--species: Extract only the entries of a certain species. You have to specify a valid mirbase shortcut (e.g. hsa, cel, dme, ath).
default=undef
Display the help pages
The script prepares mirbase entries for mapping with the script run_Mapping.pl or run_Multimapper.pl.
In more detail: 'U' will be converted to 'T', multiple 'N's may be appended at the 5'-end and at the 3'-end of the entries (--append_n), and only the entries of a certain species may be extracted.
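The three steps can be sketched as follows. This is an illustrative Python sketch, not the Perl code of PrepareMirbase.pl; the function names `read_fasta` and `prepare_entries` are hypothetical, and it assumes that the species shortcut is the part of the FASTA header before the first '-'.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def prepare_entries(entries, append_n=0, species=None):
    """Filter by species, convert 'U' to 'T' and pad both ends with 'N'."""
    pad = "N" * append_n
    for header, sequence in entries:
        # Assumption: the species shortcut precedes the first '-' in the header.
        if species is not None and header.split("-", 1)[0] != species:
            continue
        yield header, pad + sequence.replace("U", "T") + pad


example = """\
>dme-miR-14 MIMAT0000120 Drosophila melanogaster miR-14
UCAGUCUUUUUCUCUCUCCUA
>mmu-let-7g MIMAT0000121 Mus musculus let-7g
UGAGGUAGUAGUUUGUACAGUU
"""

prepared = list(
    prepare_entries(read_fasta(example.splitlines()), append_n=2, species="mmu")
)
for header, sequence in prepared:
    print(">" + header)
    print(sequence)
# >mmu-let-7g MIMAT0000121 Mus musculus let-7g
# NNTGAGGTAGTAGTTTGTACAGTTNN
```

Filtering before conversion keeps the work proportional to the number of entries that are actually written out.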
A multiple fasta mirbase mature, hairpin or maturestar file. For example:
>dme-miR-13b MIMAT0000119 Drosophila melanogaster miR-13b
UAUCACAGCCAUUUUGACGAGU
>dme-miR-14 MIMAT0000120 Drosophila melanogaster miR-14
UCAGUCUUUUUCUCUCUCCUA
>mmu-let-7g MIMAT0000121 Mus musculus let-7g
UGAGGUAGUAGUUUGUACAGUU
>mmu-let-7i MIMAT0000122 Mus musculus let-7i
UGAGGUAGUAGUUUGUGCUGUU
A multiple fasta file which may be used for mapping using run_Mapping.pl or run_Multimapper.pl. For example:
>mmu-let-7g MIMAT0000121 Mus musculus let-7g
NNTGAGGTAGTAGTTTGTACAGTTNN
>mmu-let-7i MIMAT0000122 Mus musculus let-7i
NNTGAGGTAGTAGTTTGTGCTGTTNN
The nucleotides 'U' will be converted to 'T'.
Several 'N's may be appended to the 5'-end and to the 3'-end of the sequences (--append_n).
This may be useful for mapping, as most mappers cannot match reads which start before the reference sequence entry. For example:
Mapping to cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7
TTGAGGTAGTAGGTTGTATAGTTA ..read
 TGAGGTAGTAGGTTGTATAGTT  ..mirbase entry
This read cannot be mapped to the unmodified mirbase entry with any mapper, because it starts before and ends after the entry. With GEM it is possible to map this read with two mismatches, but only after several 'N's have been appended at the 5'-end and the 3'-end of the entry. For example:
Mapping to cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7
 TTGAGGTAGTAGGTTGTATAGTTA  ..read
NNTGAGGTAGTAGGTTGTATAGTTNN ..mirbase entry
Only the sequences of a certain species may be extracted. The proper mirbase shortcut has to be specified, for example hsa = Homo sapiens, mmu = Mus musculus, etc.
Perl 5.8 or higher
Robert Kofler
Heinz Himmelbauer
robert.kofler at crg.es