create_miRNA_profile.pl - Creates a miRNA profile
# Minimal argument call specifying all required parameters.
create_miRNA_profile.pl --input Mapping_day0_1_i_Eland_against_mature_unambiguous.txt --id "hsa-mir-219-1 MI0000296 Homo sapiens miR-219-2 stem-loop" --output profile.txt
# Maximum argument call specifying all possible parameters.
# Several different input files may be specified.
# Note that the file containing the ambiguous hits may also be specified.
create_miRNA_profile.pl --output profile.txt --min_length 15 --max_length 32 --max_mm 2 --strand RF --id "hsa-mir-219-1 MI0000296 Homo sapiens miR-219-2 stem-loop" --reference_file "/home/usr/Genomes/hsa_hairpin.fa" --min_count 2 --max_ambiguity 2 --input Mapping_day0_1_i_Eland_against_mature_unambiguous.txt --input Mapping_day0_1_i_Eland_against_mature_ambiguous.txt
The input files. Several files may be specified, e.g.: --input file1 --input file2. The input files have to be output files of the script run_Mapping or run_Multimapper. Note that both unambiguously and ambiguously mapped reads may be provided to this script. Mandatory parameter.
The output file. Mandatory parameter.
The id of the miRNA (reference sequence) for which the profile should be created. Mandatory parameter.
Only reads mapping to the specified strand will be used. Possible values: R (reverse strand), F (forward strand), RF (both strands); default=RF
The minimum length of reads. Shorter reads will not be used. default=15
The maximum length of reads. Longer reads will not be used. default=100
The maximum number of mismatches. Reads having more mismatches will not be used. default=2
The minimum count a sequence must have in order to be displayed in the profile. Sequences with fewer reads will not be used. default=1
The path to the reference sequence to which the given reads (--input) have been mapped. This is an optional parameter. If provided, the folding of the RNA will be predicted and displayed for the forward and the reverse strand. default=undef
The maximum ambiguity of the hits. Hits having a higher ambiguity will be ignored. The ambiguity is an integer value that states how often a read could be mapped to the reference sequence with an equally good score (number of mismatches). Examples:
A read which could be mapped to the H. sapiens genome only once, having two mismatches, will have an ambiguity of "1".
A read which could be mapped to the H. sapiens genome three times, always having one mismatch, will have an ambiguity of "3".
A read which could be mapped to the H. sapiens genome three times having one mismatch and one time having zero mismatches will have an ambiguity of "1".
A read which could be mapped to the H. sapiens genome three times having one mismatch and two times having zero mismatches will have an ambiguity of "2".
default=5
Display the help pages.
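The filtering options described above can be pictured as one predicate applied to every hit. The following Python sketch is only an illustration of that reading and not the script's actual Perl code; the Hit class and the function passes_filters are hypothetical names, the defaults are the ones documented above, and applying --min_count to the count of a single unique sequence is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One mapped unique sequence (hypothetical container for this sketch)."""
    sequence: str
    count: int        # number of reads sharing this sequence
    mismatches: int
    strand: str       # "F" or "R"
    ambiguity: int


def passes_filters(hit, strand="RF", min_length=15, max_length=100,
                   max_mm=2, min_count=1, max_ambiguity=5):
    """Return True if the hit survives all documented filters."""
    return (
        hit.strand in strand                              # --strand R, F or RF
        and min_length <= len(hit.sequence) <= max_length  # --min_length / --max_length
        and hit.mismatches <= max_mm                       # --max_mm
        and hit.count >= min_count                         # --min_count (assumed per sequence)
        and hit.ambiguity <= max_ambiguity                 # --max_ambiguity
    )
```

With the defaults, a 22 nt forward-strand hit with zero mismatches and an ambiguity of 1 is kept, while the same hit is dropped when strand="R" is requested.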
The script creates a coverage profile for a given reference sequence. The reference sequence will usually be a miRNA, but profiles may also be created for snoRNAs or whole mRNAs. The miRNA profiler creates a pseudo-graphical overview of the positions of the reads, with the forward and reverse strands kept separate. Additionally, the predicted RNA folding may be displayed, which makes it possible, for example, to estimate whether reads map to the stem or to the loop of a hairpin. Unambiguously mapped reads are depicted in uppercase letters and ambiguously mapped reads in lowercase letters.
Mapping results of the script run_Mapping.pl or run_Multimapper.pl. Note that unambiguous and ambiguous mapping results may be provided.
For example:
24688||Count=3 TACCCTGTAGATCCGAATTTGT hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a 1 0 F 1
128318||Count=2 TACCCTGTAGATCCGAATTTGTG hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a 1 0 F 1
150952||Count=1 TACCCTGTAGATCCTAATTTGTGT hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a 1 2 R 1
212857||Count=1 TACCCTGTAGATCCAAATTTGT hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a 1 1 F 1
317801||Count=1 TACCTTGTAGATCCGAATTTGTG hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a 1 1 F 1
389805||Count=1 TACCCTGTATATCCGAATTTGTGG hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a 1 2 F 1
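A record like the ones above could be read as follows. This Python sketch rests on assumptions that this documentation does not confirm: the columns are taken to be tab-separated (the reference id itself contains spaces) and their meaning is inferred from the example as read id with count, sequence, reference id, position, mismatches, strand and ambiguity. MappingRecord and parse_mapping_line are hypothetical names.

```python
from typing import NamedTuple


class MappingRecord(NamedTuple):
    read_id: str
    count: int          # taken from the "||Count=N" suffix of the first column
    sequence: str
    reference_id: str
    position: int       # assumed: mapping position on the reference
    mismatches: int     # assumed: number of mismatches
    strand: str         # "F" or "R"
    ambiguity: int      # assumed: last column


def parse_mapping_line(line):
    """Parse one mapping result line (assumed to have seven tab-separated columns)."""
    read, sequence, reference_id, position, mismatches, strand, ambiguity = (
        line.rstrip("\n").split("\t")
    )
    read_id, _, count_field = read.partition("||Count=")
    return MappingRecord(read_id, int(count_field), sequence, reference_id,
                         int(position), int(mismatches), strand, int(ambiguity))
```

If the real files use a different separator or column order, only the split and the unpacking line need to change.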
Ambiguity is an important concept in the MIRO-pipeline, so it is crucial that it is properly understood. In a nutshell, ambiguity is the number of equally good mapping positions for a single Solexa read. "Equally good" in this context refers to the number of mismatches. In the MIRO-pipeline all unambiguously mapped reads have an ambiguity of "1" and are provided in a separate output file. All ambiguously mapped reads, on the other hand, have an ambiguity of ">=2".
Examples:
A read which could be mapped to the H. sapiens genome only once, having two mismatches, will have an ambiguity of "1".
A read which could be mapped to the H. sapiens genome three times, always having only one mismatch, will have an ambiguity of "3".
A read which could be mapped to the H. sapiens genome three times having one mismatch and one time having zero mismatches will have an ambiguity of "1".
A read which could be mapped to the H. sapiens genome three times having one mismatch and two times having zero mismatches will have an ambiguity of "2".
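The four examples above all follow one rule: count the mapping positions that share the lowest number of mismatches. A minimal Python sketch of that rule, assuming the mismatch count of every mapping position of a read is available as a list (the function name ambiguity is chosen for illustration and is not part of the pipeline):

```python
def ambiguity(mismatches_per_hit):
    """Number of mapping positions that share the best (lowest) mismatch count."""
    best = min(mismatches_per_hit)  # raises ValueError for a read without hits
    return sum(1 for mm in mismatches_per_hit if mm == best)
```

For instance, ambiguity([1, 1, 1, 0, 0]) corresponds to the last example: three positions with one mismatch are ignored because two positions with zero mismatches are better, so the ambiguity is 2.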
A pseudo-graphical representation of the positions of the reads for a given reference sequence.
If the parameter --reference_file is not provided, the secondary RNA structure will not be calculated and displayed.
For example:
FORWARD STRAND
..................................................................TGGTTCGAGACTTGCCAACT........................ 1 (amb: 1; mm: 1)
..................................................................TGGCTTTAGACTTGC............................. 1 (amb: 1; mm: 2)
.................................................................GTGGTTCTAGACTTGCCAACTA....................... 1 (amb: 1; mm: 0)
................................................................GGTGGTTCTAGAATTGACAA.......................... 1 (amb: 1; mm: 2)
......................................................AACAGGATCCGGTGGTTCTAGACTTGCCAACT........................ 1 (amb: 1; mm: 0)
.......................TTGGCAATGGTAGAACTCACACTGGTGAGGT........................................................ 1 (amb: 1; mm: 0)
.......................TTGGCAATGGTAGAACT...................................................................... 1 (amb: 1; mm: 0)
......................TTTGGCAATGGGAGAACTCACACTGGTGAGGC........................................................ 1 (amb: 1; mm: 2)
......................TTTGGAAATGGTAGAACTCACACTGGTGAGGC........................................................ 1 (amb: 1; mm: 2)
......................TTTGGCAGTGGTAGAACTCACACTGGTGAGGT........................................................ 1 (amb: 1; mm: 1)
......................TTTGGCAATGGTAGAACTCACACTGGGGAGGT........................................................ 1 (amb: 1; mm: 1)
......................TTTGGCAATGGTAGAACTCACACTGGTGAGGT........................................................ 16 (amb: 1; mm: 0)
......................TTTGGCAATGGTAGAACTCACACTGGTGAGGT........................................................ 1 (amb: 1; mm: 0)
......................TTTGGCAATGGTAGAACTCACACTGGTGATGT........................................................ 1 (amb: 1; mm: 1)
......................TTTGTCAATTGTAGAACTCACACTGGTGAGGT........................................................ 1 (amb: 1; mm: 2)
......................TTTGGCAATGGTAGAACTCCCACTGCTGAGG......................................................... 1 (amb: 1; mm: 2)
......................TTTGGCAATGGTAGAACTCACACTGGTGAGG......................................................... 2 (amb: 1; mm: 0)
......................TTTGGCAATGGTAGAACTCACACTGGTGAGT......................................................... 1 (amb: 1; mm: 1)
......................TTTGGCAATGGTAGAACTCACACTGGTGAG.......................................................... 3 (amb: 1; mm: 0)
......................TTTGGCAATGGTAGAACTCACACTGG.............................................................. 1 (amb: 1; mm: 0)
......................TTTGGCAATGGTAGAACGCACACTG............................................................... 1 (amb: 1; mm: 1)
......................TTTGGCAATGGTAGAACTCACACCG............................................................... 1 (amb: 1; mm: 1)
......................TTTGGCAATGGTAGAACTCACACTG............................................................... 3 (amb: 1; mm: 0)
......................TTTGGCAATGGTAGAACTGACACT................................................................ 1 (amb: 1; mm: 1)
......................TTTGGCAATGGTAGAACTCACACT................................................................ 19 (amb: 1; mm: 0)
......................TTTGGCAATGGTATAACTCACACT................................................................ 1 (amb: 1; mm: 1)
......................TTTGGCAATGGGAGAACTCACAC................................................................. 1 (amb: 1; mm: 1)
......................TTTGTCAATGGGAGAACTCACAC................................................................. 1 (amb: 1; mm: 2)
......................TTTGGCAATCGTAGAACTCACAC................................................................. 1 (amb: 1; mm: 1)
......................TTTGGCAATGGTAGAACTCACAC................................................................. 11 (amb: 1; mm: 0)
......................TTTGGCAATGGTATAACTCACA.................................................................. 1 (amb: 1; mm: 1)
......................TTTGGCAATGGTAGAACTCACA.................................................................. 4 (amb: 1; mm: 0)
......................TTTGGCAATGGTAGAACTCA.................................................................... 9 (amb: 1; mm: 0)
......................TTTCCCAATGGTAGAACTCA.................................................................... 1 (amb: 1; mm: 2)
......................TTTGGCAATGGTAGAATTCA.................................................................... 1 (amb: 1; mm: 1)
......................TTTGGCAATGGTAGAACTC..................................................................... 10 (amb: 1; mm: 0)
....................AATTTGGCAATGGTAGAACTCACACTGGTGAG.......................................................... 1 (amb: 1; mm: 2)
...................ACTTTTGGCAATGGTAGAACTCACACT................................................................ 1 (amb: 1; mm: 2)
REVERSE STRAND
......CCTGCCTCTCCCCGT......................................................................................... 1 (amb: 1; mm: 2)
.......TTGCTTCCCCCCGGT........................................................................................ 1 (amb: 1; mm: 2)
........AGCCTCCCCCCGTTT....................................................................................... 1 (amb: 1; mm: 1)
This example illustrates that profiles indicate the positions of unique sequences. Reads mapping to the forward and to the reverse strand are kept separate. For reads mapping to the reverse strand, the reverse complement of the actually sequenced read is displayed. Each unique sequence is displayed in its own row. The number after each row is the number of reads having the displayed sequence. The numbers in brackets are the ambiguity (amb:) and the number of mismatches (mm:). Unambiguously mapped reads are displayed in uppercase letters and ambiguously mapped reads in lowercase letters.
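The row layout described above can be reproduced with a few lines of code. The Python sketch below only illustrates the layout and is not taken from the script; render_row is a hypothetical name, and the start position is assumed to be 1-based and to refer to the displayed strand.

```python
def render_row(reference_length, start, sequence, count, ambiguity, mismatches):
    """Render one profile row: dots for uncovered positions, then count and details."""
    # Unambiguous hits (ambiguity 1) are shown in uppercase, all others in lowercase.
    shown = sequence.upper() if ambiguity == 1 else sequence.lower()
    left = "." * (start - 1)
    # A negative repeat count yields an empty string, so overhanging reads get no padding.
    right = "." * (reference_length - (start - 1) - len(sequence))
    return f"{left}{shown}{right} {count} (amb: {ambiguity}; mm: {mismatches})"
```

Calling render_row(20, 3, "ACGT", 2, 1, 0) pads the four bases with two leading and fourteen trailing dots, so every row has the width of the reference sequence.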
If the parameter --reference_file is provided, the secondary RNA structure will be calculated and displayed for the forward and the reverse strand of the reference sequence. The Vienna Package (RNAfold) is used to predict the secondary structure of the RNA.
For example:
FORWARD STRAND
..................................................................TGGTTCGAGACTTGCCAACT........................ 1 (amb: 1; mm: 1)
..................................................................TGGCTTTAGACTTGC............................. 1 (amb: 1; mm: 2)
.................................................................GTGGTTCTAGACTTGCCAACTA....................... 1 (amb: 1; mm: 0)
................................................................GGTGGTTCTAGAATTGACAA.......................... 1 (amb: 1; mm: 2)
......................................................AACAGGATCCGGTGGTTCTAGACTTGCCAACT........................ 1 (amb: 1; mm: 0)
.......................TTGGCAATGGTAGAACTCACACTGGTGAGGT........................................................ 1 (amb: 1; mm: 0)
.......................TTGGCAATGGTAGAACT...................................................................... 1 (amb: 1; mm: 0)
......................TTTGGCAATGGGAGAACTCACACTGGTGAGGC........................................................ 1 (amb: 1; mm: 2)
......................TTTGGAAATGGTAGAACTCACACTGGTGAGGC........................................................ 1 (amb: 1; mm: 2)
......................TTTGGCAGTGGTAGAACTCACACTGGTGAGGT........................................................ 1 (amb: 1; mm: 1)
......................TTTGGCAATGGTAGAACTCACACTGGGGAGGT........................................................ 1 (amb: 1; mm: 1)
......................TTTGGCAATGGTAGAACTCACACTGGTGAGGT........................................................ 16 (amb: 1; mm: 0)
......................TTTGGCAATGGTAGAACTCACACTGGTGAGGT........................................................ 1 (amb: 1; mm: 0)
......................TTTGGCAATGGTAGAACTCACACTGGTGATGT........................................................ 1 (amb: 1; mm: 1)
......................TTTGTCAATTGTAGAACTCACACTGGTGAGGT........................................................ 1 (amb: 1; mm: 2)
......................TTTGGCAATGGTAGAACTCCCACTGCTGAGG......................................................... 1 (amb: 1; mm: 2)
......................TTTGGCAATGGTAGAACTCACACTGGTGAGG......................................................... 2 (amb: 1; mm: 0)
......................TTTGGCAATGGTAGAACTCACACTGGTGAGT......................................................... 1 (amb: 1; mm: 1)
......................TTTGGCAATGGTAGAACTCACACTGGTGAG.......................................................... 3 (amb: 1; mm: 0)
......................TTTGGCAATGGTAGAACTCACACTGG.............................................................. 1 (amb: 1; mm: 0)
......................TTTGGCAATGGTAGAACGCACACTG............................................................... 1 (amb: 1; mm: 1)
......................TTTGGCAATGGTAGAACTCACACCG............................................................... 1 (amb: 1; mm: 1)
......................TTTGGCAATGGTAGAACTCACACTG............................................................... 3 (amb: 1; mm: 0)
......................TTTGGCAATGGTAGAACTGACACT................................................................ 1 (amb: 1; mm: 1)
......................TTTGGCAATGGTAGAACTCACACT................................................................ 19 (amb: 1; mm: 0)
......................TTTGGCAATGGTATAACTCACACT................................................................ 1 (amb: 1; mm: 1)
......................TTTGGCAATGGGAGAACTCACAC................................................................. 1 (amb: 1; mm: 1)
......................TTTGTCAATGGGAGAACTCACAC................................................................. 1 (amb: 1; mm: 2)
......................TTTGGCAATCGTAGAACTCACAC................................................................. 1 (amb: 1; mm: 1)
......................TTTGGCAATGGTAGAACTCACAC................................................................. 11 (amb: 1; mm: 0)
......................TTTGGCAATGGTATAACTCACA.................................................................. 1 (amb: 1; mm: 1)
......................TTTGGCAATGGTAGAACTCACA.................................................................. 4 (amb: 1; mm: 0)
......................TTTGGCAATGGTAGAACTCA.................................................................... 9 (amb: 1; mm: 0)
......................TTTCCCAATGGTAGAACTCA.................................................................... 1 (amb: 1; mm: 2)
......................TTTGGCAATGGTAGAATTCA.................................................................... 1 (amb: 1; mm: 1)
......................TTTGGCAATGGTAGAACTC..................................................................... 10 (amb: 1; mm: 0)
....................AATTTGGCAATGGTAGAACTCACACTGGTGAG.......................................................... 1 (amb: 1; mm: 2)
...................ACTTTTGGCAATGGTAGAACTCACACT................................................................ 1 (amb: 1; mm: 2)
..((((((..((((.((((((..(((((((.(.((((((...((((((..............))))))))))))..))))))))..)))))).))))....)).)))).. -47.1
GAGCTGCTTGCCTCCCCCCGTTTTTGGCAATGGTAGAACTCACACTGGTGAGGTAACAGGATCCGGTGGTTCTAGACTTGCCAACTATGGGGCGAGGACTCAGCCGGCAC
.)))))..).)))).))))....)))))))...)))))).))).).))..............((((((((((((...(((((((....((((.(((((..(((((..... -44.9
REVERSE STRAND
......CCTGCCTCTCCCCGT......................................................................................... 1 (amb: 1; mm: 2)
.......TTGCTTCCCCCCGGT........................................................................................ 1 (amb: 1; mm: 2)
........AGCCTCCCCCCGTTT....................................................................................... 1 (amb: 1; mm: 1)
The upper structure ..(((((..etc is the predicted folding for the forward strand and the lower structure .))))..).etc is the predicted folding for the reverse strand.
The negative numbers to the right of the folding indicate the free energy as predicted by the Vienna Package. The lower this value, the more stable the secondary structure.
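To judge whether a read lies in a stem or in a loop, the structure line can be inspected at the positions the read covers. The Python sketch below is an illustrative helper and not part of the script; it assumes the usual dot-bracket reading in which brackets mark paired bases (stem) and dots mark unpaired bases (loops and bulges). The name paired_fraction and the 1-based start position are assumptions.

```python
def paired_fraction(structure, start, length):
    """Fraction of the positions covered by a read that are base-paired."""
    window = structure[start - 1:start - 1 + length]
    if not window:
        raise ValueError("read does not overlap the structure")
    # Brackets denote paired bases, dots denote unpaired bases.
    paired = sum(1 for symbol in window if symbol in "()")
    return paired / len(window)
```

For the toy hairpin "((((....))))", a read covering positions 1 to 4 yields 1.0 (pure stem), and a read covering positions 5 to 8 yields 0.0 (pure loop).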
Perl 5.8 or higher
Vienna Package (RNAfold, RNA.pm)
Robert Kofler
Heinz Himmelbauer
robert.kofler at crg.es