Hit2Bed.pl - Converts mapping results to a .bed file
    # Minimal argument call, specifying all required parameters.
    Hit2Bed.pl --input Mapping_day0_1_i_Eland_against_mature_unambiguous.txt --output genome_browsable.bed

    # Maximal argument call, specifying all possible parameters.
    # Note that the file containing the ambiguous hits may also be provided.
    Hit2Bed.pl --range chr22:1000:50000 --trackname "day0" --min_length 15 --max_length 32 --max_ambiguity 2 --output genome_browsable.bed --max_mm 2 --strand "R" --input Mapping_day0_1_i_Eland_against_mature_unambiguous.txt --input Mapping_day0_1_i_Eland_against_mature_ambiguous.txt
--input: The input files. Several files may be specified, e.g. --input file1 --input file2.
The input files must be output files of the scripts run_Mapping.pl or run_Multimapper.pl.
Note that both unambiguously and ambiguously mapped reads may be provided to this script. Mandatory parameter.
--output: The output file. Mandatory parameter.
--min_length: The minimum read length. Shorter reads are ignored. default=15
--max_length: The maximum read length. Longer reads are ignored. default=100
--max_mm: The maximum number of mismatches. Reads with more mismatches are not used. default=2
--strand: Only reads mapping to the specified strand are used. Possible values: R (reverse strand), F (forward strand), RF (both strands). default=RF
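The length, mismatch and strand options act as simple per-hit filters. The following is a minimal Python sketch, not the Perl code of Hit2Bed.pl; the function name `passes_filters` is hypothetical, and it assumes that the bounds are inclusive (a read of exactly --min_length or --max_length, or with exactly --max_mm mismatches, is kept).

```python
# Hypothetical helper, not part of Hit2Bed.pl: illustrates the documented
# --min_length, --max_length, --max_mm and --strand filters.
# Assumption: all bounds are inclusive.


def passes_filters(read_length: int, mismatches: int, strand: str,
                   min_length: int = 15, max_length: int = 100,
                   max_mm: int = 2, allowed_strands: str = "RF") -> bool:
    """Return True if a hit survives the length, mismatch and strand filters."""
    if read_length < min_length:      # shorter reads are ignored
        return False
    if read_length > max_length:      # longer reads are ignored
        return False
    if mismatches > max_mm:           # reads with more mismatches are not used
        return False
    # allowed_strands is "F", "R" or "RF"; a hit's strand is a single letter
    return strand in set(allowed_strands)


print(passes_filters(15, 0, "F"))                       # True
print(passes_filters(22, 1, "F", allowed_strands="R"))  # False: wrong strand
```

The defaults mirror the documented ones, so calling the function without keyword arguments corresponds to running the script without the optional filter options.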
--max_ambiguity: The maximum ambiguity of the hits. Hits with a higher ambiguity are ignored. The ambiguity is an integer that states how often a read could be mapped to the reference sequence with an equally good score (number of mismatches). Examples:
A read which could be mapped to the H. sapiens genome only once, with two mismatches, has an ambiguity of "1".
A read which could be mapped to the H. sapiens genome three times, each time with one mismatch, has an ambiguity of "3".
A read which could be mapped to the H. sapiens genome three times with one mismatch and once with zero mismatches has an ambiguity of "1".
A read which could be mapped to the H. sapiens genome three times with one mismatch and twice with zero mismatches has an ambiguity of "2".
default=5
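The four examples boil down to one rule: count only the hits that have the lowest number of mismatches. A minimal Python sketch of that rule, assuming the mismatch counts of all hits of a single read are available as a sequence; the names `ambiguity` and `passes_ambiguity` are hypothetical and not taken from Hit2Bed.pl.

```python
from typing import Sequence


def ambiguity(mismatch_counts: Sequence[int]) -> int:
    """Number of hits of one read that share the best (lowest) mismatch count."""
    if not mismatch_counts:
        raise ValueError("a read needs at least one hit")
    best = min(mismatch_counts)
    return sum(1 for mm in mismatch_counts if mm == best)


def passes_ambiguity(mismatch_counts: Sequence[int], max_ambiguity: int = 5) -> bool:
    """--max_ambiguity filter: hits with a higher ambiguity are ignored."""
    return ambiguity(mismatch_counts) <= max_ambiguity


# The four examples from the option description:
print(ambiguity([2]))              # 1: mapped once with two mismatches
print(ambiguity([1, 1, 1]))        # 3: three hits, one mismatch each
print(ambiguity([1, 1, 1, 0]))     # 1: the single perfect hit wins
print(ambiguity([1, 1, 1, 0, 0]))  # 2: two perfect hits
```

Worse-scoring hits never raise the ambiguity, which is why the third example has an ambiguity of 1 despite four hits in total.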
--range: If you are only interested in reads mapping to a certain chromosomal region, you may use this option.
The range has to be provided in the format chromosome:start:end.
For example, --range chr22:10000:50000 means that only reads from chromosome 22 mapping to positions between 10,000 and 50,000 bp will be used.
default=undef (no range restriction)
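A minimal Python sketch of parsing a chromosome:start:end value and testing a hit against it. The names `Region`, `parse_range` and `in_region` are hypothetical. The documentation does not say whether the read start, the read end or both must fall inside the range; this sketch assumes that only the mapping position is tested, with inclusive bounds.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def parse_range(spec: str) -> Region:
    """Parse a range of the form chromosome:start:end, e.g. chr22:10000:50000."""
    chrom, start, end = spec.rsplit(":", 2)  # raises ValueError if a part is missing
    region = Region(chrom, int(start), int(end))
    if region.start > region.end:
        raise ValueError(f"start lies after end in range {spec!r}")
    return region


def in_region(chrom: str, position: int, region: Optional[Region]) -> bool:
    """True if a hit lies inside the region; no region (undef) keeps every hit.

    Assumption: only the mapping position is checked, inclusively.
    """
    if region is None:
        return True
    return chrom == region.chrom and region.start <= position <= region.end


region = parse_range("chr22:10000:50000")
print(region)                             # Region(chrom='chr22', start=10000, end=50000)
print(in_region("chr22", 23000, region))  # True
print(in_region("chr21", 23000, region))  # False
```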
--trackname: A name for the track. This name will, for example, be displayed in the UCSC Genome Browser. A name may also easily be added afterwards by editing the ".bed" file. default="unknown"
Display the help pages
Converts a hit file into a .bed file. BED files are accepted by most genome browsers, such as the UCSC Genome Browser. The ".bed" format allows for a score. This script uses this option by assigning each read a number between 100 and 1000 that reflects its ambiguity. Unambiguously mapped reads receive a score of 1000, whereas ambiguously mapped reads receive lower scores.
Mapping results of the scripts run_Mapping.pl or run_Multimapper.pl.
Note that both unambiguous and ambiguous mapping results may be provided.
For example:
    5031||Count=1 TCCCCGCCGGCGGAA chr4 1 0 F 170168086
    5217||Count=1 GACCGTCCAACGCAC chr20 1 0 R 59264663
    5560||Count=1 ATCGGGTGGTAGCAA chr3 1 0 F 16192245
    6184||Count=12 TCCGGGCTACTGCTG chr1 1 0 F 29388851
    6209||Count=1 GCAGCCATCGTTTTT chr10 1 0 F 61351707
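Each record holds the unique-sequence id with its read count, the sequence, the chromosome, two integer columns, the strand and the mapping position. A minimal Python sketch of a parser for such records; `Hit` and `parse_hit` are hypothetical names. The documentation does not name the two integer columns: this sketch assumes that the first is the ambiguity and the second the number of mismatches, which is consistent with the example coming from an unambiguous mapping file but is not confirmed by the source.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    seq_id: str      # id of the unique sequence, e.g. "5031"
    count: int       # number of observed reads with this sequence
    sequence: str
    chrom: str
    ambiguity: int   # ASSUMPTION: 4th column holds the ambiguity
    mismatches: int  # ASSUMPTION: 5th column holds the mismatch count
    strand: str      # "F" or "R"
    position: int


# Matches the first column, e.g. "6184||Count=12"
_ID_FIELD = re.compile(r"^([^|\s]+)\|\|Count=(\d+)$")


def parse_hit(line: str) -> Hit:
    """Parse one whitespace-separated hit record."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    match = _ID_FIELD.match(fields[0])
    if match is None:
        raise ValueError(f"unexpected id field: {fields[0]!r}")
    return Hit(
        seq_id=match.group(1),
        count=int(match.group(2)),
        sequence=fields[1],
        chrom=fields[2],
        ambiguity=int(fields[3]),
        mismatches=int(fields[4]),
        strand=fields[5],
        position=int(fields[6]),
    )


hit = parse_hit("6184||Count=12 TCCGGGCTACTGCTG chr1 1 0 F 29388851")
print(hit.seq_id, hit.count, hit.chrom, hit.strand, hit.position)
```

Splitting on arbitrary whitespace accepts both tab- and space-separated files; the read length needed for the BED end coordinate is simply the length of the sequence column.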
A ".bed" file which is accepted by most genome browsers.
For example:
    track name=region 1 description="unknown" useScore=1
    chr22 23901283 23901298 id:715 600 -
    chr22 41490201 41490216 id:9720 600 -
    chr22 48966829 48966844 id:9720 600 +
    chr22 36860963 36860978 id:24947 600 +
    chr22 16801453 16801468 id:43281 700 +
    chr22 16801453 16801468 id:43281 700 +
Column 1 (chromosome): The id of the reference sequence. To be accepted by a genome browser, this has to be something like chr1, chr12, etc.
Column 2 (start): Start position of the read.
Column 3 (end): End position of the read.
Column 4 (name): The name of the unique sequence. As several reads may have the same unique sequence, this name need not be unique for a specific read.
As shown in the input example above, each unique sequence mapped with MIRO has an associated count property (for example: 24688||Count=3).
Therefore the script writes each unique sequence Count times to the ".bed" file, so that the reads displayed in the genome browser reflect the actual number of observed reads.
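A minimal Python sketch of how one unique sequence can be expanded into Count identical BED records. The function name `bed_records` is hypothetical. It assumes that F maps to "+" and R to "-", that the hit position is used unchanged as the BED start, and that end = start + read length; the last two assumptions reproduce the output example above (e.g. 16801453 + 15 = 16801468), but the documentation does not state whether the script shifts coordinates. Fields are tab-separated here, which is the usual BED convention.

```python
from typing import List


def bed_records(seq_id: str, count: int, chrom: str, position: int,
                read_length: int, strand: str, score: int) -> List[str]:
    """Return `count` identical BED lines for one unique sequence."""
    if strand not in ("F", "R"):
        raise ValueError(f"unknown strand {strand!r}")
    bed_strand = "+" if strand == "F" else "-"
    start = position                 # ASSUMPTION: no coordinate shift
    end = position + read_length     # end - start equals the read length
    record = "\t".join(
        [chrom, str(start), str(end), f"id:{seq_id}", str(score), bed_strand]
    )
    return [record] * count          # one copy per observed read


for line in bed_records("43281", 2, "chr22", 16801453, 15, "F", 700):
    print(line)
```

Writing duplicate lines keeps the BED file simple at the cost of size; a genome browser then stacks the copies, so the pile height mirrors the read count.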
Column 5 (score): Some genome browsers assign different colors to reads with different scores. For example, high-scoring reads may have a very dark color, whereas low-scoring reads will have a very light color. To use this feature, this script assigns a score which reflects the mapping ambiguity of the read:
    Score = 1000 - (ambiguity - 1) * 100
    Score = 100 if Score < 100
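The score formula transcribed into Python as a minimal sketch; `bed_score` is a hypothetical name. By this formula, the scores 600 and 700 in the output example correspond to ambiguities of 5 and 4, and every ambiguity of 10 or more is clamped to 100.

```python
def bed_score(ambiguity: int) -> int:
    """Score = 1000 - (ambiguity - 1) * 100, but never below 100."""
    score = 1000 - (ambiguity - 1) * 100
    return max(score, 100)


for a in (1, 4, 5, 10, 12):
    print(a, bed_score(a))
```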
Column 6 (strand): + for the forward strand; - for the reverse strand.
Perl 5.8 or higher
Robert Kofler
Heinz Himmelbauer
robert.kofler at crg.es