NAME

DGE_BayesSign.pl - Calculates significant differences in gene (miRNA) expression using a Bayesian approach


SYNOPSIS

 # Minimal call specifying all required parameters; at least two different samples must be given
 DGE_BayesSign.pl --output bayes_pairwise.txt \
                   Mapping_day0_1_i_Eland_against_mature_unambiguous.txt \
                   Mapping_day6_1_i_Eland_against_mature_unambiguous.txt
 
 
 # Maximal call specifying all possible parameters; any number of samples may be given
 DGE_BayesSign.pl --output bayestable.txt --significance 0.001 \
                   --min_length 20 --max_length 25 --max_mm 2 --strand RF \
                   --normalisation off --bayes_table \
                   Mapping_day0_1_i_Eland_against_mature_unambiguous.txt \
                   Mapping_day6_1_i_Eland_against_mature_unambiguous.txt \
                   Mapping_day12_1_i_Eland_against_mature_unambiguous.txt \
                   Mapping_day24_1_i_Eland_against_mature_unambiguous.txt


OPTIONS

--output

The output file. Mandatory parameter.

--strand

Only reads mapping to the specified strand will be used for calculating the significance of differential gene (miRNA) expression. Possible values: R (reverse strand), F (forward strand), RF (both strands); default=RF

--min_length

The minimum length of reads. Shorter reads will not enter the calculation. default=15

--max_length

The maximum length of reads. Longer reads will not enter the calculation. default=100

--max_mm

The maximum number of mismatches. Reads with more mismatches will not enter the calculation. default=2

--significance

The significance threshold. Differences in gene (miRNA) expression whose significance value lies above this threshold (i.e. which are less significant) will not be displayed. This parameter only applies to the pairwise significant differences (see --bayes_table). default=0.001

--bayes_table

Flag indicating whether pairwise significant differences between the samples or a significance table should be created. By default the pairwise significant differences are reported; when this flag is set, the significance table is created instead.

--normalisation

MIRO supports several normalisation methods; default=scalelinear. The normalisation method does not affect the significance of differential expression; furthermore, it is only applied to the "pairwise significant differences" output. The following normalisation methods are currently supported:

off

Samples are not normalised; the actually observed read counts are displayed.

scalelinear

The total expression level of each sample is linearly scaled to a constant level, and the individual read counts are adjusted accordingly. This is the most straightforward normalisation method.
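A minimal sketch of linear scaling, assuming each sample is scaled so that its total read count equals the mean total over all samples. The constant target level actually used by the script is not stated here, so both the target and the function name scale_linear are hypothetical.

```python
from typing import Dict, List

Counts = Dict[str, float]  # gene (miRNA) -> read count for one sample


def scale_linear(samples: List[Counts]) -> List[Counts]:
    """Scale every sample so that its total read count equals a common target.

    Assumption (hypothetical): the target is the mean total over all samples.
    """
    totals = [sum(s.values()) for s in samples]
    target = sum(totals) / len(totals)
    scaled = []
    for sample, total in zip(samples, totals):
        # One factor per sample; every count of that sample is multiplied by it.
        factor = target / total if total else 0.0
        scaled.append({gene: count * factor for gene, count in sample.items()})
    return scaled
```

For example, samples with totals 40 and 80 are both brought to a total of 60.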

quantile

Quantile normalisation.
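For reference, a sketch of standard quantile normalisation: each sample's counts are sorted, the mean count at every rank is taken across samples, and that mean is assigned back to the gene holding the rank. It assumes all samples are compared over the same genes (missing genes count as 0) and breaks ties by sort order instead of averaging them; how the script handles ties is not documented here. The function name is hypothetical.

```python
from typing import Dict, List


def quantile_normalise(samples: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Give every sample the same distribution of values (quantile normalisation).

    Missing genes count as 0; ties are broken by sort order, not averaged.
    """
    genes = sorted(set().union(*samples))
    # For each sample: its genes ordered from lowest to highest count.
    ranked = [sorted(genes, key=lambda g: s.get(g, 0.0)) for s in samples]
    n = len(samples)
    # Mean count at each rank across all samples.
    rank_means = [
        sum(s.get(order[i], 0.0) for s, order in zip(samples, ranked)) / n
        for i in range(len(genes))
    ]
    # Replace each gene's count by the mean for the rank it holds in its sample.
    return [{gene: rank_means[i] for i, gene in enumerate(order)} for order in ranked]
```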

housekeep..

Examples: housekeep5, housekeep10, housekeep20;

This normalisation method is a variant of the scalelinear method. Instead of using all genes (miRNAs) to calculate the normalisation factor, only genes with medium expression levels are used; the genes with the highest and the lowest expression levels are ignored (only when calculating the normalisation factor, not during the normalisation itself). The housekeep method must be called with the exact percentage of genes to skip, e.g. housekeep20 ignores the 20% most highly and the 20% most lowly expressed genes. The genes (miRNAs) are weighted by the log2 of their expression level.
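A rough sketch of the housekeep idea under stated assumptions: genes are ranked by their mean log2(count + 1) over all samples (one possible reading of the log2 weighting, which is not specified further), the given percentage is trimmed from both ends (rounded down), and the scaling factors are derived from the totals of the remaining genes only, while every gene is scaled. The function name and the mean-of-trimmed-totals target are hypothetical.

```python
import math
from typing import Dict, List

Counts = Dict[str, float]  # gene (miRNA) -> read count for one sample


def housekeep_normalise(samples: List[Counts], percent: int) -> List[Counts]:
    """Scale like scalelinear, but compute the factors from medium-expressed genes.

    Assumptions (hypothetical): ranking by mean log2(count + 1); the number of
    genes skipped per end is rounded down; target = mean of the trimmed totals.
    """
    genes = sorted(set().union(*samples))
    n = len(samples)

    def mean_log2(gene: str) -> float:
        return sum(math.log2(s.get(gene, 0.0) + 1) for s in samples) / n

    ranked = sorted(genes, key=mean_log2)
    skip = len(ranked) * percent // 100
    kept = ranked[skip:len(ranked) - skip]
    # Totals over the kept ("housekeeping") genes: used for the factors only.
    trimmed = [sum(s.get(g, 0.0) for g in kept) for s in samples]
    target = sum(trimmed) / n
    # All genes are scaled, including the skipped ones.
    return [
        {g: c * (target / t if t else 0.0) for g, c in s.items()}
        for s, t in zip(samples, trimmed)
    ]
```

With housekeep20 and ten genes, the two highest and two lowest ranked genes are left out of the factor calculation, so a single extreme outlier no longer dominates the factor.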

--help

Display the help pages.


DESCRIPTION

General

This script calculates significant differences in gene (miRNA) expression between two or more samples. The differences are calculated from run_Mapping.pl output files. Either pairwise significant differences or a significance table is calculated. The method used for calculating the Bayesian significance was published by Audic and Claverie (1997): "The Significance of Digital Gene Expression Profiles".
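In Audic and Claverie (1997), the probability of observing y reads of a gene in a sample with N2 total reads, given x reads of the same gene in a sample with N1 total reads, is p(y|x) = (N2/N1)^y (x+y)! / (x! y! (1 + N2/N1)^(x+y+1)). The sketch below evaluates this in log space. As an assumption about how the script turns p(y|x) into a significance value, it returns the one-sided tail probability in the direction of the observed change; the function names are hypothetical.

```python
import math


def ac_prob(x: int, y: int, n1: int, n2: int) -> float:
    """Audic-Claverie p(y | x): probability of y reads in sample 2 (total n2)
    given x reads in sample 1 (total n1), computed in log space."""
    r = n2 / n1
    log_p = (y * math.log(r)
             + math.lgamma(x + y + 1)
             - math.lgamma(x + 1)
             - math.lgamma(y + 1)
             - (x + y + 1) * math.log1p(r))
    return math.exp(log_p)


def ac_significance(x: int, y: int, n1: int, n2: int) -> float:
    """One-sided tail probability of a count at least as extreme as y.

    Assumption: the tail is chosen by comparing y with x * n2 / n1.
    """
    if y < x * n2 / n1:
        # Fewer reads than expected: P(Y <= y | x), a finite sum.
        return min(1.0, sum(ac_prob(x, k, n1, n2) for k in range(y + 1)))
    # More reads than expected: P(Y >= y | x). For k >= y the terms decrease
    # strictly, so sum until the next term is negligible (or underflows to 0).
    total, k = 0.0, y
    while True:
        term = ac_prob(x, k, n1, n2)
        total += term
        if term <= total * 1e-16:
            break
        k += 1
    return min(1.0, total)
```

Summing the upper tail directly, rather than computing 1 minus the lower tail, keeps very small values such as 1e-300 from being lost to rounding.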

Input

A list of files containing unambiguously mapped reads. At least two different files must be provided; each file is assumed to represent one sample, e.g. one tissue or one time point. The files must contain reads mapped unambiguously with the script run_Mapping or run_Multimapper.

For example:


 24688||Count=3         TACCCTGTAGATCCGAATTTGT          hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a   1       0       F       1
 128318||Count=2        TACCCTGTAGATCCGAATTTGTG         hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a   1       0       F       1
 150952||Count=1        TACCCTGTAGATCCTAATTTGTGT        hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a   1       2       R       1
 212857||Count=1        TACCCTGTAGATCCAAATTTGT          hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a   1       1       F       1
 317801||Count=1        TACCTTGTAGATCCGAATTTGTG         hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a   1       1       F       1
 389805||Count=1        TACCCTGTATATCCGAATTTGTGG        hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a   1       2       F       1
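A sketch of how such a file could be read and filtered according to --min_length, --max_length, --max_mm and --strand. The column layout is an assumption inferred from the example above: tab-separated fields, with the read count in the "Count=" suffix of column 1, the read sequence in column 2, the reference in column 3, the number of mismatches in column 5 and the strand in column 6. The meaning of columns 4 and 7 is not documented here, so they are ignored. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def count_reads(lines: Iterable[str], min_length: int = 15,
                max_length: int = 100, max_mm: int = 2,
                strand: str = "RF") -> Dict[str, int]:
    """Sum read counts per reference, applying the read filters.

    Assumed (hypothetical) tab-separated layout, 0-based field indices:
      0: <id>||Count=<n>   1: read sequence   2: reference description
      4: number of mismatches   5: strand (F or R)
    """
    counts: Dict[str, int] = defaultdict(int)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        read_count = int(fields[0].split("Count=")[1])
        length = len(fields[1])
        mismatches = int(fields[4])
        if not min_length <= length <= max_length:
            continue
        # strand is "F", "R" or "RF"; a read passes if its strand letter is in it.
        if mismatches > max_mm or fields[5] not in strand:
            continue
        counts[fields[2]] += read_count
    return dict(counts)
```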

Output

The output is either a list of "pairwise significant differences" in gene (miRNA) expression or a significance table.

"pairwise significant differences"

This file is created by default, i.e. when the --bayes_table flag is not set.


 mmu-miR-709 MIMAT0003499 Mus musculus miR-709  5.83612643834978e-08    day0    day12   2.0     58.0
 mmu-miR-425 MIMAT0004750 Mus musculus miR-425  6.48859011835964e-08    day24   day36   62.0    156.0
 mmu-miR-375 MIMAT0000739 Mus musculus miR-375  7.50493843738679e-08    day0    day3    6.0     40.0
 mmu-miR-484 MIMAT0003127 Mus musculus miR-484  7.68847709460979e-08    day3    day24   67.0    1.0

  1. The first column represents the reference sequence for which significant differences in gene (miRNA) expression have been observed.

  2. The second column represents the Bayesian significance of the observed difference in expression levels. The file is sorted by significance, most significant (smallest value) first.

  3. The third column represents the ID of the first sample

  4. The fourth column represents the ID of the second sample

  5. The fifth column represents the read count of the first sample (normalised according to the specified method; default=scalelinear).

  6. The sixth column represents the read count of the second sample (normalised according to the specified method; default=scalelinear).

Note that the significance is not calculated from the values in columns 5 and 6. Instead, the actually observed read count of each gene (miRNA) and the total read count of each sample (summed over all genes) enter the equation. For further details, see the paper by Audic and Claverie.
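The following sketch mirrors the point above: the test uses the raw counts and the per-sample totals, whereas the normalised counts are only displayed. It compares every pair of samples, keeps rows below the threshold (a strict comparison, which is an assumption) and sorts by significance. The significance function is passed in, so any Audic-Claverie style test can be used; all names here are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# significance(x, y, n1, n2) -> p-value; x, y are raw counts, n1, n2 sample totals.
Significance = Callable[[int, int, int, int], float]


def pairwise_differences(raw: Dict[str, Dict[str, int]],
                         shown: Dict[str, Dict[str, float]],
                         significance: Significance,
                         threshold: float = 0.001) -> List[Tuple]:
    """Rows (reference, p, sample1, sample2, shown1, shown2), most significant first.

    raw:   sample ID -> reference -> observed read count (enters the test)
    shown: sample ID -> reference -> normalised count (display only)
    """
    totals = {s: sum(c.values()) for s, c in raw.items()}
    refs = sorted(set().union(*raw.values()))
    rows = []
    for s1, s2 in combinations(raw, 2):
        for ref in refs:
            x, y = raw[s1].get(ref, 0), raw[s2].get(ref, 0)
            p = significance(x, y, totals[s1], totals[s2])
            if p < threshold:
                rows.append((ref, p, s1, s2,
                             shown[s1].get(ref, 0.0), shown[s2].get(ref, 0.0)))
    rows.sort(key=lambda row: row[1])
    return rows
```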

"significance table"

This file will only be created if the --bayes_table flag is set.

 
                                                         day0   day3        day6        day12       day24      day36
  mmu-miR-99b MIMAT0000132 Mus musculus miR-99b         -       1.2e-305    7.4e-231    3.1e-123    0          3.8e-250
  mmu-miR-139-3p MIMAT0004662 Mus musculus miR-139-3p   -       0           6.0e-311    0           0          2.0e-154
  mmu-miR-378 MIMAT0003151 Mus musculus miR-378         -       0           0           3.6e-305    0          0

This table contains the pairwise significance of the differences in gene (miRNA) expression levels. Each significance value refers to the difference relative to the first sample column (here day0, marked "-"), which should usually be a background level or the first time point. This is another example where the feature --idorder is important. For example, the value 7.4e-231 in the day6 column of the mmu-miR-99b row means that the difference in expression level between day6 and day0 is highly significant. The table is sorted in descending order of overall significance, listing the most significantly differentially expressed miRNAs first.
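A sketch of how such a table could be assembled: every sample is compared with the first one, and the first column holds no value (printed as "-"). The text does not define "overall significance", so sorting by the smallest p-value of each row is a hypothetical stand-in. The significance function is passed in as in the pairwise case, and the function name is hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# significance(x, y, n1, n2) -> p-value; x, y are raw counts, n1, n2 sample totals.
Significance = Callable[[int, int, int, int], float]


def significance_table(raw: Dict[str, Dict[str, int]],
                       significance: Significance
                       ) -> List[Tuple[str, List[Optional[float]]]]:
    """Compare every sample with the first one (the reference column).

    Returns (reference, [None, p2, p3, ...]) rows; None marks the first sample.
    Rows are sorted by their smallest p-value (hypothetical sort criterion).
    """
    samples = list(raw)
    first = samples[0]
    totals = {s: sum(c.values()) for s, c in raw.items()}
    refs = sorted(set().union(*raw.values()))
    rows = []
    for ref in refs:
        x = raw[first].get(ref, 0)
        values: List[Optional[float]] = [None] + [
            significance(x, raw[s].get(ref, 0), totals[first], totals[s])
            for s in samples[1:]
        ]
        rows.append((ref, values))
    rows.sort(key=lambda row: min(row[1][1:], default=1.0))
    return rows
```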


REQUIREMENTS

Perl 5.8 or higher


AUTHORS

Robert Kofler

Heinz Himmelbauer


CONTACT

robert.kofler at crg.es