TrimSequencesBy.pl - Trims all nucleotide sequences at the 3'-end by a given number of nucleotides
    # Minimal argument call, specifying all required parameters.
    TrimSequencesBy.pl --input fastafile.fa --output trimmedsequences.fa

    # Maximal argument call, specifying all possible parameters.
    TrimSequencesBy.pl --input fastafile.fa --output trimmedsequences.fa --trimby 2 --lower_boundary 15
--input: The input file. Has to be a multiple fasta file. Mandatory parameter.
--output: The output file. Mandatory parameter.
--trimby: The number of nucleotides to remove from the 3'-end of each nucleotide sequence. default=2
--lower_boundary: The lower boundary for trimming the nucleotide sequences, in base pairs. Sequences will never be trimmed beyond this boundary. default=15
Display the help pages
The script trims each nucleotide sequence of a multiple fasta file at the 3'-end by a given number of nucleotides. Solexa reads usually accumulate mismatches at their 3'-ends, which may impede mapping of the reads. The script may thus be useful, for example, to trim all no-matches of a mapping step by two base pairs and repeat the mapping with these trimmed no-matches. This script is especially useful when the nucleotide sequences have varying lengths.
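The trimming rule can be sketched in a few lines of Perl. This is only an illustration, not code taken from TrimSequencesBy.pl: the helper name trim_sequence is hypothetical, and it assumes that a sequence which would fall below the lower boundary is shortened down to the boundary, while sequences already at or below the boundary are left unchanged. The script itself may handle these edge cases differently.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of TrimSequencesBy.pl.
# Assumption: remove up to $trimby nucleotides from the 3'-end,
# but never shorten a sequence below $lower_boundary.
sub trim_sequence {
    my ($seq, $trimby, $lower_boundary) = @_;
    my $length = length $seq;

    # Sequences already at or below the boundary are returned unchanged.
    return $seq if $length <= $lower_boundary;

    # Clamp the new length so it never drops below the boundary.
    my $new_length = $length - $trimby;
    $new_length = $lower_boundary if $new_length < $lower_boundary;

    # Keep the 5'-part of the sequence, dropping the 3'-end.
    return substr($seq, 0, $new_length);
}

# Documented defaults: trimby=2, lower_boundary=15.
print trim_sequence("GAAATTTAAGAAACAATTATAATCCAC", 2, 15), "\n";
print trim_sequence("ACGTACGTACGTACGT", 2, 15), "\n";
```

Under these assumptions, the first call removes two nucleotides from the 27 nt read, while the second call removes only one nucleotide from the 16 nt read because of the boundary of 15.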
Multiple fasta files. For example:
    >43||Count=1
    GAAATTTAAGAAACAATTATAATCCAC
    >44||Count=1
    ATTCGCGTTCAGCTGAGGCAGAGTGATGGT
    >45||Count=2
    TCCCTGTGGTCTATTGTTTATGATTCGGCT
    >46||Count=1
    TCCCGGGGCGTCTAGTGGTTAGGGTTTGGCG
    >47||Count=3
    TTCCTGTTGTCTAGTGGTTAGG
A multiple fasta file, containing the trimmed sequences. For example, the sequences shown above trimmed by two base pairs:
    >43||Count=1
    GAAATTTAAGAAACAATTATAATCC
    >44||Count=1
    ATTCGCGTTCAGCTGAGGCAGAGTGATG
    >45||Count=2
    TCCCTGTGGTCTATTGTTTATGATTCGG
    >46||Count=1
    TCCCGGGGCGTCTAGTGGTTAGGGTTTGG
    >47||Count=3
    TTCCTGTTGTCTAGTGGTTA
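To make the input and output formats concrete, the following sketch reads fasta records, trims each sequence, and prints the records again. It is an assumption-laden illustration rather than the script's actual implementation: read_fasta and trim_3prime are hypothetical names, the input is an in-memory string instead of a file given via --input, and the boundary handling (shorten down to the boundary, never below it) is assumed.

```perl
use strict;
use warnings;

# Hypothetical reader: returns a list of [header, sequence] pairs.
# Sequence lines that span several lines are joined into one string.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        if ($line =~ /^>/) {
            push @records, [$line, ''];     # start a new record
        } elsif (@records) {
            $records[-1][1] .= $line;       # append to current sequence
        }
    }
    return @records;
}

# Hypothetical trimming rule (assumed): never shorten below the boundary.
sub trim_3prime {
    my ($seq, $trimby, $lower_boundary) = @_;
    my $length = length $seq;
    return $seq if $length <= $lower_boundary;
    my $keep = $length - $trimby;
    $keep = $lower_boundary if $keep < $lower_boundary;
    return substr($seq, 0, $keep);
}

# Two records from the input example above, held in memory.
my $input = ">43||Count=1\nGAAATTTAAGAAACAATTATAATCCAC\n"
          . ">47||Count=3\nTTCCTGTTGTCTAGTGGTTAGG\n";
open(my $in, '<', \$input) or die "Cannot open in-memory input: $!";
for my $record (read_fasta($in)) {
    my ($header, $seq) = @$record;
    print "$header\n", trim_3prime($seq, 2, 15), "\n";
}
close $in;
```

The header lines are passed through unchanged, so identifiers such as the Count annotation survive the trimming step.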
Perl 5.8 or higher
Robert Kofler
Heinz Himmelbauer
robert.kofler at crg.es