NAME

Hit2Wiggle.pl - Clusters reads using a sliding-window approach and writes the result in the wiggle format


SYNOPSIS

 # Minimal call specifying only the required parameters.
 Hit2Wiggle.pl --input Mapping_day0_1_i_Eland_against_mature_unambiguous.txt \
               --output day0.wig
 # Extended call specifying most optional parameters; several input files may be specified.
 # Note that the file containing the ambiguous hits may be specified as well.
 Hit2Wiggle.pl --output day0.wig --min_length 15 --max_length 32 --max_mm 2 --strand RF \
               --min_count 10 --max_ambiguity 2 --tempdir "/tmp" \
               --window_size 1000 \
               --input Mapping_day0_1_i_Eland_against_mature_unambiguous.txt \
               --input Mapping_day0_1_i_Eland_against_mature_ambiguous.txt


OPTIONS

--input

The input files. Several files may be specified, e.g.: --input file1 --input file2. The input files have to be output files of the script run_Mapping.pl or run_Multimapper.pl. Note that both unambiguously and ambiguously mapped reads may be provided to this script. Mandatory parameter.

--output

The output file. Mandatory parameter.

--strand

Only reads mapping to the specified strand will be used. You may also specify both strands, in which case reads from different strands will be clustered in the same window. Possible values: R (reverse strand), F (forward strand), RF (both strands); default=RF

--min_length

The minimum length of reads. Shorter reads will be ignored. default=15

--max_length

The maximum length of reads. Longer reads will be ignored. default=100

--max_mm

The maximum number of mismatches. Reads having more mismatches will not be used. default=2

--max_ambiguity

The maximum ambiguity of the hits. Hits having a higher ambiguity will be ignored. The ambiguity is an integer value which states how often a read could be mapped to the reference sequence with an equally good score (number of mismatches). Examples:

A read which could be mapped to the H. sapiens genome only once, having two mismatches, will have an ambiguity of "1".

A read which could be mapped to the H. sapiens genome three times, always having one mismatch, will have an ambiguity of "3".

A read which could be mapped to the H. sapiens genome three times having one mismatch and one time having zero mismatches will have an ambiguity of "1".

A read which could be mapped to the H. sapiens genome three times having one mismatch and two times having zero mismatches will have an ambiguity of "2".

default=5

--min_count

Minimum read count for a wiggle window. Windows having fewer reads will not be reported. default=1

--window_size

The size of the sliding window; default=1000

--trackname

The name of the track. This information may for example be displayed in the genome browser. default=unknown

--tempdir

The temporary directory used; default=/tmp

--help

Display the help pages.
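
The read-level filters described above (--min_length, --max_length, --max_mm, --max_ambiguity and --strand) can be pictured as a single predicate applied to every hit. The following Python sketch is only an illustration and is not taken from the script itself; the function name keep_hit and its argument names are hypothetical, and only the default values come from the option descriptions.

```python
def keep_hit(length, mismatches, ambiguity, strand,
             min_length=15, max_length=100, max_mm=2,
             max_ambiguity=5, strands="RF"):
    """Return True if a hit passes all read-level filters.

    Hypothetical helper: the defaults mirror the documented option
    defaults; the actual implementation in Hit2Wiggle.pl may differ.
    """
    return (
        min_length <= length <= max_length    # --min_length / --max_length
        and mismatches <= max_mm              # --max_mm
        and ambiguity <= max_ambiguity        # --max_ambiguity
        and strand in set(strands)            # --strand: "F", "R" or "RF"
    )
```

With the defaults, a 15-nt read on the forward strand with no mismatches and an ambiguity of 1 is kept, whereas a read with three mismatches is dropped.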


DESCRIPTION

General

The script clusters hits using a sliding window approach into the wiggle format. The wiggle format is supported by most genome browsers.

Input

The input consists of mapping results produced by the script run_Mapping.pl or run_Multimapper.pl. Note that both unambiguous and ambiguous mapping results may be provided.

For example:


 5031||Count=1   TCCCCGCCGGCGGAA chr4    1       0       F       170168086
 5217||Count=1   GACCGTCCAACGCAC chr20   1       0       R       59264663
 5560||Count=1   ATCGGGTGGTAGCAA chr3    1       0       F       16192245
 6184||Count=1   TCCGGGCTACTGCTG chr1    1       0       F       29388851
 6209||Count=1   GCAGCCATCGTTTTT chr10   1       0       F       61351707
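
To make the layout of these lines concrete, here is a minimal Python sketch that splits one line into named fields. The column meanings are assumptions inferred from the example and are not confirmed by this page: column 4 is taken to be the ambiguity, column 5 the number of mismatches, column 6 the strand and column 7 the mapping position. The names Hit and parse_hit are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    read_id: str
    count: int        # value after "Count=" in the first column
    sequence: str
    chrom: str
    ambiguity: int    # assumed meaning of column 4
    mismatches: int   # assumed meaning of column 5
    strand: str       # "F" or "R"
    position: int


def parse_hit(line: str) -> Hit:
    """Parse one whitespace-separated mapping line (assumed layout)."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}")
    name, seq, chrom, amb, mm, strand, pos = fields
    # The first column looks like "5031||Count=1".
    read_id, _, count_part = name.partition("||Count=")
    count = int(count_part) if count_part else 1
    return Hit(read_id, count, seq, chrom, int(amb), int(mm), strand, int(pos))
```

The read length needed for the length filters can then be taken from len(hit.sequence).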

Ambiguity

Ambiguity is an important concept in the MIRO-pipeline, so it is crucial that it is properly understood. In a nutshell, ambiguity is the number of equally good mapping positions for a single Solexa read, where "equally good" refers to the number of mismatches. In the MIRO-pipeline, all unambiguously mapped reads have an ambiguity of "1" and are provided in a separate output file. All ambiguously mapped reads, on the other hand, have an ambiguity of ">=2".

Examples:

A read which could be mapped to the H. sapiens genome only once, having two mismatches, will have an ambiguity of "1".

A read which could be mapped to the H. sapiens genome three times, always having only one mismatch, will have an ambiguity of "3".

A read which could be mapped to the H. sapiens genome three times having one mismatch and one time having zero mismatches will have an ambiguity of "1".

A read which could be mapped to the H. sapiens genome three times having one mismatch and two times having zero mismatches will have an ambiguity of "2".
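
The four examples above follow one rule: count the mapping positions that share the lowest number of mismatches. A minimal Python sketch of that rule is given below; the function name ambiguity and its input (one mismatch count per mapping position of a read) are chosen for illustration only, and returning 0 for a read without any hit is an assumption.

```python
def ambiguity(mismatches_per_hit):
    """Number of mapping positions that share the best (lowest) mismatch count."""
    if not mismatches_per_hit:
        return 0  # assumption: a read without any hit has no ambiguity
    best = min(mismatches_per_hit)
    return sum(1 for mm in mismatches_per_hit if mm == best)


print(ambiguity([2]))              # one hit, two mismatches -> 1
print(ambiguity([1, 1, 1]))        # three hits, one mismatch each -> 3
print(ambiguity([1, 1, 1, 0]))     # one perfect hit beats the others -> 1
print(ambiguity([1, 1, 1, 0, 0]))  # two perfect hits -> 2
```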

Output

A wiggle-formatted file, which is accepted by most genome browsers. The track information can easily be edited by hand directly in the output file.

For example:

 track type=wiggle_0 name="unknown" description="variableStep format" visibility=full
 variableStep chrom=chr7 span=1000
 156001         30
 763001         5
 897001         5
 1029001        154
 1085001        10
 1537001        5
 1850001        10
 1879001        7
 2069001        15
 2116001        5
 2361001        13
 2384001        5
 4817001        11
 variableStep chrom=chr20 span=1000
 255001         13
 336001         7
 762001         6
 930001         11
 1113001        7
 1321001        5
 1810001        9
 2391001        12
 2581001        96
 2582001        6
 variableStep chrom=chr14 span=1000
 18413001       9
 18755001       10
 19854001       5
 19861001       5
 19881001       274
 19895001       6
 19952001       8
 20147001       5
 20151001       186
 20163001       74
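
The example output suggests how hits end up in windows: with span=1000, every reported start position is a multiple of 1000 plus 1. The Python sketch below reproduces that layout under several assumptions that this page does not confirm: positions are 1-based, a hit is assigned to a window by its reported position alone, windows do not overlap, and every hit line counts once regardless of its Count= value. The names window_start and to_wiggle are hypothetical.

```python
from collections import Counter


def window_start(position, window_size=1000):
    """Start of the window containing a 1-based position (assumed layout)."""
    return (position - 1) // window_size * window_size + 1


def to_wiggle(hits, window_size=1000, min_count=1, trackname="unknown"):
    """Build variableStep wiggle text from an iterable of (chrom, position)."""
    counts = {}
    for chrom, pos in hits:
        counts.setdefault(chrom, Counter())[window_start(pos, window_size)] += 1

    lines = [f'track type=wiggle_0 name="{trackname}" '
             'description="variableStep format" visibility=full']
    for chrom, windows in counts.items():
        # Drop windows below --min_count and sort the rest by start position.
        kept = sorted((start, n) for start, n in windows.items() if n >= min_count)
        if not kept:
            continue
        lines.append(f"variableStep chrom={chrom} span={window_size}")
        lines.extend(f"{start}\t{n}" for start, n in kept)
    return "\n".join(lines)
```

For instance, positions 156001, 156999 and 157000 on chr7 all fall into the window starting at 156001, which would be reported with a count of 3.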


REQUIREMENTS

Perl 5.8 or higher


AUTHORS

Robert Kofler

Heinz Himmelbauer


CONTACT

robert.kofler at crg.es