NAME

TrimSequencesBy.pl - Trims all nucleotide sequences at the 3'-end by a given number of nucleotides


SYNOPSIS

 # Minimal argument call, specifying all required parameters.
 TrimSequencesBy.pl --input fastafile.fa --output trimmedsequences.fa
 
 # Maximal argument call, specifying all possible parameters.
 TrimSequencesBy.pl --input fastafile.fa --output trimmedsequences.fa \
                    --trimby 2 --lower_boundary 15


OPTIONS

--input

The input file. Has to be a multiple fasta file. Mandatory parameter

--output

The output file. Mandatory parameter

--trimby

The number of nucleotides which should be removed at the 3'-end of each nucleotide sequence. default=2

--lower_boundary

The minimum sequence length, in nucleotides, that trimming may produce. Sequences are never trimmed below this length. default=15

--help

Display the help pages
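
The options above could be parsed along the following lines. This is only a sketch using the core Getopt::Long module; the sub name parse_args and the error messages are assumptions and are not taken from TrimSequencesBy.pl. The defaults and the mandatory parameters follow the descriptions above.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse_args is an illustrative name, not part of the script.
sub parse_args {
    my @argv = @_;

    # Defaults as documented in OPTIONS.
    my %opt = (trimby => 2, lower_boundary => 15, help => 0);

    GetOptionsFromArray(\@argv, \%opt,
        'input=s', 'output=s', 'trimby=i', 'lower_boundary=i', 'help')
        or die "Invalid arguments\n";

    # --input and --output are mandatory unless only help was requested.
    unless ($opt{help}) {
        for my $required (qw(input output)) {
            die "--$required is mandatory\n" unless defined $opt{$required};
        }
    }
    return \%opt;
}

my $opt = parse_args(qw(--input fastafile.fa --output trimmedsequences.fa));
print "trimby=$opt->{trimby} lower_boundary=$opt->{lower_boundary}\n";
```

With the minimal argument call from the SYNOPSIS, both optional parameters keep their documented defaults.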


DESCRIPTION

General

The script trims each nucleotide sequence of a multiple fasta file at the 3'-end by a given number of nucleotides. Solexa reads usually accumulate mismatches at their 3'-ends, which may impede mapping of the reads. The script may thus be useful, for example, to trim all reads that failed to map (no-matches) by two nucleotides and then repeat the mapping with these trimmed no-matches. This script is especially useful when the nucleotide sequences have varying lengths.
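
The trimming rule can be sketched as follows. The sub name trim_three_prime is hypothetical, and the behaviour at the lower boundary is an assumption: this sketch cuts a sequence to exactly the boundary length when a full trim would go below it, and leaves sequences that are already at or below the boundary unchanged. The actual script may handle these cases differently.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from TrimSequencesBy.pl.
# Assumption: a sequence that would fall below $lower_boundary is cut to
# exactly $lower_boundary; shorter sequences are returned unchanged.
sub trim_three_prime {
    my ($seq, $trimby, $lower_boundary) = @_;
    my $length = length $seq;
    return $seq if $length <= $lower_boundary;

    # Remove $trimby nucleotides from the 3'-end, but stop at the boundary.
    my $target = $length - $trimby;
    $target = $lower_boundary if $target < $lower_boundary;
    return substr($seq, 0, $target);
}

# Sequence 47 from the Input example, with the default parameters.
print trim_three_prime('TTCCTGTTGTCTAGTGGTTAGG', 2, 15), "\n";
# prints TTCCTGTTGTCTAGTGGTTA
```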

Input

A multiple fasta file. For example:

 >43||Count=1
 GAAATTTAAGAAACAATTATAATCCAC
 >44||Count=1
 ATTCGCGTTCAGCTGAGGCAGAGTGATGGT
 >45||Count=2
 TCCCTGTGGTCTATTGTTTATGATTCGGCT
 >46||Count=1
 TCCCGGGGCGTCTAGTGGTTAGGGTTTGGCG
 >47||Count=3
 TTCCTGTTGTCTAGTGGTTAGG

Output

A multiple fasta file, containing the trimmed sequences. For example, the sequences shown above trimmed by two base pairs:

 >43||Count=1
 GAAATTTAAGAAACAATTATAATCC
 >44||Count=1
 ATTCGCGTTCAGCTGAGGCAGAGTGATG
 >45||Count=2
 TCCCTGTGGTCTATTGTTTATGATTCGG
 >46||Count=1
 TCCCGGGGCGTCTAGTGGTTAGGGTTTGG
 >47||Count=3
 TTCCTGTTGTCTAGTGGTTA
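
The transformation from the Input example to the Output example can be reproduced with a short sketch such as the one below. It is not the script's own code: the sub name trim_fasta is hypothetical, each record is assumed to hold its sequence on a single line (as in the examples), and the handling of the lower boundary is an assumption. In-memory filehandles are used so the sketch runs without any files.

```perl
use strict;
use warnings;

# Hypothetical sketch, not taken from TrimSequencesBy.pl.
# Assumes one sequence line per fasta record, as in the examples above.
sub trim_fasta {
    my ($in, $out, $trimby, $lower_boundary) = @_;
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^>/) {
            # Header lines are copied unchanged.
            print {$out} "$line\n";
            next;
        }
        # Assumption: never trim below the boundary, never extend a sequence.
        my $keep = length($line) - $trimby;
        $keep = $lower_boundary if $keep < $lower_boundary;
        $keep = length($line)   if $keep > length($line);
        print {$out} substr($line, 0, $keep), "\n";
    }
    return;
}

# Two records from the Input example.
my $fasta = ">43||Count=1\nGAAATTTAAGAAACAATTATAATCCAC\n"
          . ">47||Count=3\nTTCCTGTTGTCTAGTGGTTAGG\n";
my $trimmed = '';
open my $in,  '<', \$fasta   or die "cannot read from string: $!";
open my $out, '>', \$trimmed or die "cannot write to string: $!";
trim_fasta($in, $out, 2, 15);
close $out;
close $in;
print $trimmed;
```

The printed records match the corresponding entries of the Output example.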


REQUIREMENTS

Perl 5.8 or higher


AUTHORS

Robert Kofler

Heinz Himmelbauer


CONTACT

robert.kofler at crg.es