create_easy_screenable.pl - Creates two files which allow a fast manual screening of the hit positions and their clustering
# Minimal argument call specifying all required parameters.
create_easy_screenable.pl --input Mapping_day0_1_i_Eland_against_mature_unambiguous.txt --output screenable_day0
# Maximum argument call specifying all possible parameters.
# Several different input files may be specified.
# Note that the file containing the ambiguous hits may also be specified.
create_easy_screenable.pl --output screenable_day0 --min_length 15 --max_length 32 --max_mm 2 --strand RF --max_ambiguity 2 --tempdir "/tmp" --input Mapping_day0_1_i_Eland_against_mature_unambiguous.txt --input Mapping_day0_1_i_Eland_against_mature_ambiguous.txt
The input files. Several files may be specified, e.g.: --input file1 --input file2. The input files have to be output files of the script run_Mapping or run_Multimapper. Note that both unambiguously and ambiguously mapped reads may be provided to this script. Mandatory parameter.
The output prefix. This script creates two output files: one will have the extension .pos, the other .seq. Mandatory parameter.
Only reads mapping to the specified strand will be used. Possible values: R (reverse strand), F (forward strand), RF (both strands); default=RF
The minimum length of reads. Shorter reads will be ignored. default=15
The maximum length of reads. Longer reads will be ignored. default=100
The maximum number of mismatches. Reads having more mismatches will not be used. default=2
The minimum number of reads mapping to a position. default=1
The maximum ambiguity of the hits. Hits having a higher ambiguity will be ignored. The ambiguity is an integer value which states how often a read could be mapped to the reference sequence with an equally good score (number of mismatches). Examples:
A read which could be mapped to the H. sapiens genome only once, having two mismatches, will have an ambiguity of "1".
A read which could be mapped to the H. sapiens genome three times, always having one mismatch, will have an ambiguity of "3".
A read which could be mapped to the H. sapiens genome three times having one mismatch and one time having zero mismatches will have an ambiguity of "1".
A read which could be mapped to the H. sapiens genome three times having one mismatch and two times having zero mismatches will have an ambiguity of "2".
default=5
Display the help pages.
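The read-level filters described above can be pictured with a minimal Python sketch. It is not the script's actual implementation (the script itself is written in Perl); the `Hit` class, its field names and the function `passes_filters` are hypothetical and chosen only for illustration. The default values are the ones listed above. The minimum number of reads per position is a position-level threshold and is not covered here.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One mapped read; the field names are hypothetical, not the script's."""
    sequence: str
    strand: str       # "F" or "R"
    mismatches: int
    ambiguity: int


def passes_filters(hit, min_length=15, max_length=100, max_mm=2,
                   strand="RF", max_ambiguity=5):
    """Return True if the hit survives the read-level filters described above."""
    if not (min_length <= len(hit.sequence) <= max_length):
        return False               # too short or too long
    if hit.mismatches > max_mm:
        return False               # too many mismatches
    if hit.strand not in strand:
        return False               # "RF" accepts both strands
    if hit.ambiguity > max_ambiguity:
        return False               # mapped equally well to too many places
    return True
```

For instance, `passes_filters(Hit("TACCCTGTAGATCCGAATTTGT", "R", 0, 1), strand="F")` is rejected because the hit lies on the reverse strand.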
This script creates the .pos and .seq files, which are easy to screen manually.
In the .pos files, reads having the same start position are aggregated and sorted. This allows for an easy and fast manual identification of interesting features.
The .seq files contain more detailed information, such as the actual sequences of the reads.
We therefore recommend primarily screening the .pos file.
When an interesting feature has been found, it may be further investigated using the .seq file.
Mapping results of the script run_Mapping.pl or run_Multimapper.pl.
Note that unambiguous and ambiguous mapping results may be provided.
For example:
24688||Count=3 TACCCTGTAGATCCGAATTTGT hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a 1 0 F 1
128318||Count=2 TACCCTGTAGATCCGAATTTGTG hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a 1 0 F 1
150952||Count=1 TACCCTGTAGATCCTAATTTGTGT hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a 1 2 R 1
212857||Count=1 TACCCTGTAGATCCAAATTTGT hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a 1 1 F 1
317801||Count=1 TACCTTGTAGATCCGAATTTGTG hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a 1 1 F 1
389805||Count=1 TACCCTGTATATCCGAATTTGTGG hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a 1 2 F 1
Ambiguity is an important concept in the MIRO-pipeline; it is therefore crucial that this concept is properly understood. In a nutshell, ambiguity is the number of equally good mapping positions for a single Solexa read. "Equally good" in this context refers to the number of mismatches. In the MIRO-pipeline all unambiguously mapped reads have an ambiguity of "1" and are provided in a separate output file. All ambiguously mapped reads, on the other hand, have an ambiguity of ">=2".
Examples:
A read which could be mapped to the H. sapiens genome only once, having two mismatches, will have an ambiguity of "1".
A read which could be mapped to the H. sapiens genome three times, always having only one mismatch, will have an ambiguity of "3".
A read which could be mapped to the H. sapiens genome three times having one mismatch and one time having zero mismatches will have an ambiguity of "1".
A read which could be mapped to the H. sapiens genome three times having one mismatch and two times having zero mismatches will have an ambiguity of "2".
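The four examples above follow one rule: only the hits with the lowest number of mismatches count. A minimal Python sketch of that rule, assuming the mismatch count of every hit of a read is available as a list (the function name `ambiguity` is hypothetical and not part of the MIRO-pipeline):

```python
def ambiguity(mismatches_per_hit):
    """Count the hits that share the best (lowest) number of mismatches."""
    if not mismatches_per_hit:
        raise ValueError("a mapped read needs at least one hit")
    best = min(mismatches_per_hit)
    return sum(1 for mm in mismatches_per_hit if mm == best)


# The four examples from the text, one mismatch count per hit:
print(ambiguity([2]))              # 1
print(ambiguity([1, 1, 1]))        # 3
print(ambiguity([1, 1, 1, 0]))     # 1
print(ambiguity([1, 1, 1, 0, 0]))  # 2
```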
The script will generate two easily screenable output files: a position (.pos) file and a sequence (.seq) file.
The position files allow a fast and convenient manual screening of the hit positions.
Reads having the same start position are aggregated and sorted according to the start position.
Different reference sequences are separated by three successive empty rows, whereas the forward and the reverse strand of the same reference sequence are separated by only a single empty row.
This makes it possible to quickly estimate the position and shape of read clusters and, additionally, the amount of antisense transcription.
If more details are required, use the .seq files.
The following is an example of reads mapping to H. sapiens hairpins:
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 51 21
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 42 1 1
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 46 2 2
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 154 21
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 48 3 2

hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop R 3 1 1
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop R 13 1 1
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop R 40 2 2
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop R 41 1 1
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop R 62 1 1
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop R 64 1 1



hsa-mir-555 MI0003561 Homo sapiens miR-555 stem-loop F 49 1 1
hsa-mir-555 MI0003561 Homo sapiens miR-555 stem-loop F 55 1 1

hsa-mir-555 MI0003561 Homo sapiens miR-555 stem-loop R 5 1 1
hsa-mir-555 MI0003561 Homo sapiens miR-555 stem-loop R 82 1 1
Column 1: The ID of the reference sequence to which the read could be mapped (unambiguously or ambiguously).
Column 2: The strand, either F or R.
Column 3: The start position within the reference sequence. Each unique sequence will have its own row in the .seq files.
Column 4: The number of reads having this start position. When screening the .pos file this column deserves the most attention.
Column 5: The number of unique sequences having this start position.
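Because the read count per start position is the column to watch, a .pos file can also be pre-screened programmatically. The following Python sketch is only an illustration under stated assumptions: the column separator is assumed to be whitespace, the reference ID may itself contain spaces (as in the example above), and the function names `parse_pos_row` and `busy_positions` are hypothetical. The row is split from the right so that the last four fields are always strand, start position, read count and unique-sequence count.

```python
def parse_pos_row(line):
    """Split one .pos row into (reference, strand, start, reads, unique)."""
    # Split from the right: the reference ID may contain spaces itself.
    reference, strand, start, reads, unique = line.rstrip("\n").rsplit(None, 4)
    return reference, strand, int(start), int(reads), int(unique)


def busy_positions(lines, min_reads):
    """Return parsed rows whose read count is at least min_reads.

    Empty rows (the separators between strands and references) are skipped.
    """
    rows = [parse_pos_row(line) for line in lines if line.strip()]
    return [row for row in rows if row[3] >= min_reads]
```

Applied to the forward-strand rows of hsa-mir-1277 shown above with `min_reads=50`, only the rows for start positions 11 and 47 remain.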
The .seq files are very similar to the .pos files. The only difference is that each unique sequence occupies its own row and the actual sequence of the read is displayed.
The following is an example of a .seq file for the same reference sequences as above:
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 1 TATATATATATATGTACGTATTA
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 1 TATATATATATATGTACGTATT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 1 TATATATATATATGTACGT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 1 TATATATATATATGTACTTAT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 1 TATATAGATATATGTACGT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 1 TATATATATATATGTACTTATGA
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 1 TATATATATAGATGTACGTATG
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 5 TATATATATATATGTACGTAT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 1 TCTATATATATATGTACGTATGT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 14 TATATATATATATGT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 1 TATATATATATATGTACTAAT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 1 TATATATATATATATACGTATT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 1 TATATATATATATGTAC
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 1 TATATATATATATGTACTTATG
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 3 TATATATATATATGTACGTATG
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 1 TATATATATATATGTACGGGTG
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 1 TATAGATATATATGTACGTATGT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 2 TATATATATATATGTACGTATGT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 1 TATATATATATATGTACGTATTTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 11 TATATATATATATGTACGTATGA
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 11 1 TATATATACATATTTAC
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 42 1 ATGTTTAGGTAGATAT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 46 1 ATACGTAGACATGTA
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 46 1 ATACGTAGATATATATGTATTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 2 TACTTAGATATATATTTATTTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 2 TACGGAGATATATATGTATTTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 1 TACGGGGATATATATGTATTTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 1 TACGCAGATATATATTTATTTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 123 TACGTAGATATATATGTATTTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 4 TACGTAGATATATATTTATTTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 1 TACGTAGCTATATATTTATTTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 1 TACGTAGATATATATGTATTTTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 1 TACGTAGATATGTATGGATTTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 3 TACGTATATATATATGTATTTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 1 AACGGAGATATATATGTATTTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 1 TACGTAGATATATATTCATTTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 1 TACGCAGATATATATGGATTTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 1 TACGATGATATATATGTATTTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 1 GACGTAGATATATATGGATTTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 1 TACGTAGATATATATGTATTAT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 1 TACGTAGATATATATGCATTTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 1 TACGTAAATATATATGTATTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 1 TAAGAAGATATATATGTATTTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 1 TACGTATATATATATGTATTTTA
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 47 5 TACGTAGATATATATGTATTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 48 1 AAGTAGATATGTATG
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop F 48 2 ACGTAGATATATATGTATTTTA
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop R 3 1 TATATATATGTGGGAC
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop R 13 1 TACGTACACATATATTTA
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop R 40 1 ATATATATGCACGTATACATTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop R 40 1 TATCTACGCATATATTT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop R 41 1 ATCTACGCATATATT
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop R 62 1 ACCCACCGAAAACAC
hsa-mir-1277 MI0006419 Homo sapiens miR-1277 stem-loop R 64 1 AAACCCTCCAAAAAA
hsa-mir-555 MI0003561 Homo sapiens miR-555 stem-loop F 49 1 AGCTCTGTGGACAGG
hsa-mir-555 MI0003561 Homo sapiens miR-555 stem-loop F 55 1 GTGGAAAGGGTAGGCT
hsa-mir-555 MI0003561 Homo sapiens miR-555 stem-loop R 5 1 ACCCATCTGAGTTCA
hsa-mir-555 MI0003561 Homo sapiens miR-555 stem-loop R 82 1 ATAGATCAGAGTTCG
Column 1: The ID of the reference sequence to which the read could be mapped (unambiguously or ambiguously).
Column 2: The strand, either F or R.
Column 3: The start position within the reference sequence. Each unique sequence will have its own row in the .seq files.
Column 4: The number of reads having this start position (column 3) and this specific unique sequence (column 5).
Column 5: The sequence of the Solexa reads. The count for this specific sequence is given in column 4.
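The two output files are related in a simple way: summing the counts of all .seq rows that share reference, strand and start position gives the read count of the matching .pos row, and the number of such rows gives its unique-sequence count. A minimal Python sketch of this aggregation, assuming the .seq rows have already been parsed into tuples (the function name `seq_rows_to_pos_rows` is hypothetical, and this is not the script's own code):

```python
from collections import defaultdict


def seq_rows_to_pos_rows(seq_rows):
    """Aggregate (reference, strand, start, count, sequence) tuples into
    sorted (reference, strand, start, reads, unique_sequences) tuples."""
    totals = defaultdict(lambda: [0, set()])
    for reference, strand, start, count, sequence in seq_rows:
        entry = totals[(reference, strand, start)]
        entry[0] += count          # reads at this start position
        entry[1].add(sequence)     # distinct sequences at this start position
    return [
        (reference, strand, start, reads, len(sequences))
        for (reference, strand, start), (reads, sequences)
        in sorted(totals.items())
    ]
```

For the two rows at forward position 48 in the example (counts 1 and 2), this yields a read count of 3 and 2 unique sequences, which agrees with the .pos example.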
Perl 5.8 or higher
Robert Kofler
Heinz Himmelbauer
robert.kofler at crg.es