NAME

DGE_Table.pl - Creates a table containing the gene (miRNA) expression levels of different samples


SYNOPSIS

 # Minimal call specifying all required parameters; at least one sample must be specified
 DGE_Table.pl --output table.txt
              Mapping_day0_1_i_Eland_against_mature_unambiguous.txt
 
 
 # Call specifying all available parameters; any number of samples may be specified
 DGE_Table.pl --output table.txt --min_length 20 --max_length 25 --max_mm 2 --strand RF
              --normalisation quantile --format diflog2
              Mapping_day0_1_i_Eland_against_mature_unambiguous.txt
              Mapping_day6_1_i_Eland_against_mature_unambiguous.txt
              Mapping_day12_1_i_Eland_against_mature_unambiguous.txt
              Mapping_day24_1_i_Eland_against_mature_unambiguous.txt


OPTIONS

--output

The output file; mandatory.

--strand

Only reads mapping to the specified strand are used to create the expression table. Possible values: R (reverse strand), F (forward strand), RF (both strands); default=RF

--min_length

The minimum read length. Shorter reads are excluded from the expression table. default=15

--max_length

The maximum read length. Longer reads are excluded from the expression table. default=100

--max_mm

The maximum number of mismatches. Reads with more mismatches are excluded from the expression table. default=2
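The four read filters above (--strand, --min_length, --max_length, --max_mm) can be read as a single predicate applied to every mapped read. The following Python sketch is a hypothetical illustration, not code taken from DGE_Table.pl: the function name `passes_filters` is invented here, and the bounds are assumed to be inclusive, which matches the wording above (only shorter reads, longer reads, or reads with more mismatches are excluded).

```python
def passes_filters(sequence: str, mismatches: int, strand: str,
                   min_length: int = 15, max_length: int = 100,
                   max_mm: int = 2, strand_filter: str = "RF") -> bool:
    """Return True if a mapped read would enter the expression table.

    Hypothetical sketch; the defaults mirror the documented option defaults
    and all bounds are assumed to be inclusive.
    """
    # --strand: "F", "R" or "RF"; keep the read only if its strand is allowed
    if strand not in set(strand_filter):
        return False
    # --min_length / --max_length: inclusive bounds on the read length
    if not (min_length <= len(sequence) <= max_length):
        return False
    # --max_mm: reads with more mismatches than allowed are dropped
    return mismatches <= max_mm
```

For example, `passes_filters("TACCCTGTAGATCCGAATTTGT", 0, "F")` keeps the read, while the same call with `strand_filter="R"` rejects it.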

--format

The format of the values in the expression table; default=count. The following formats are currently supported:

count

The expression level is displayed as a count. With --normalisation off, these values represent the actual observed number of reads mapping to a given gene (miRNA).

log2

The expression level is displayed as the log2 of the count (see count).

diflog2

For each sample, the difference from the first sample is displayed, computed with the following equation:

 x = log2( count / count_sample1)

This is useful for identifying differences relative to a sample that represents a background level of expression, or relative to the first time point of a time-course experiment (t0).
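A minimal sketch of the three formats, applied to one gene's values across all samples. It is written in Python for illustration only; the names `format_row` and `safe_log2` are hypothetical, and how DGE_Table.pl handles zero counts (for which log2 is undefined) is not documented, so this sketch simply maps a zero to -inf, which yields infinite or NaN values for diflog2 in that case.

```python
import math


def safe_log2(value: float) -> float:
    # Assumption: zero counts become -inf; the script's real handling is undocumented
    return math.log2(value) if value > 0 else float("-inf")


def format_row(counts: list, fmt: str = "count") -> list:
    """Convert one gene's per-sample counts into the requested output format."""
    if fmt == "count":
        return list(counts)
    if fmt == "log2":
        return [safe_log2(c) for c in counts]
    if fmt == "diflog2":
        # x = log2(count / count_sample1) = log2(count) - log2(count_sample1)
        base = safe_log2(counts[0])
        return [safe_log2(c) - base for c in counts]
    raise ValueError(f"unsupported format: {fmt}")
```

With the hsa-mir-29a counts from the count example in the Output section (`[1359.4, 2704.2, 24531.5, 4004.6, 3931.7, 44904.4]`), rounding the diflog2 result to one decimal gives 0.0, 1.0, 4.2, 1.6, 1.5, 5.0, the same values as the corresponding diflog2 row.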

--normalisation

MIRO supports several normalisation methods for creating the expression table; default=scalelinear. The following normalisation methods are currently supported:

off

Samples are not normalised; the actual observed read counts are displayed.

scalelinear

The total expression level of each sample is linearly scaled to a constant level, and the individual read counts are adjusted accordingly. This is the most straightforward normalisation method.
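A minimal sketch of linear scaling, assuming a matrix where `matrix[g][s]` holds the count of gene g in sample s. The documentation does not say which constant level the totals are scaled to; this sketch assumes the mean of the sample totals, which is a hypothetical choice. The function name `scale_linear` is also invented for illustration.

```python
def scale_linear(matrix: list) -> list:
    """Scale every sample (column) so that all samples share the same total.

    matrix[g][s] is the count of gene g in sample s.
    Assumption: the common target total is the mean of the sample totals.
    """
    n_samples = len(matrix[0])
    totals = [sum(row[s] for row in matrix) for s in range(n_samples)]
    target = sum(totals) / n_samples
    # One factor per sample; a sample with a zero total would divide by zero
    factors = [target / t for t in totals]
    return [[row[s] * factors[s] for s in range(n_samples)] for row in matrix]
```

For two samples with totals 100 and 200, the target is 150, so the first sample is multiplied by 1.5 and the second by 0.75.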

quantile

Quantile normalisation, which makes the distribution of expression values identical across all samples.
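For reference, the textbook quantile-normalisation procedure is sketched below: each sample's values are sorted, the k-th smallest values are averaged across samples, and every value is replaced by the average for its rank. This is a generic illustration, not necessarily the exact implementation MIRO uses (MIRO requires R, and R implementations may treat ties differently); here ties are simply broken by gene order. The name `quantile_normalise` is hypothetical.

```python
def quantile_normalise(matrix: list) -> list:
    """Textbook quantile normalisation; matrix[g][s] is gene g in sample s."""
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    # For each sample, gene indices ordered from lowest to highest value
    # (sorted() is stable, so ties keep their original gene order)
    orders = [sorted(range(n_genes), key=lambda g: matrix[g][s])
              for s in range(n_samples)]
    # Mean of the k-th smallest value over all samples
    rank_means = [sum(matrix[orders[s][k]][s] for s in range(n_samples)) / n_samples
                  for k in range(n_genes)]
    # Replace every value by the mean belonging to its rank within its sample
    result = [[0.0] * n_samples for _ in range(n_genes)]
    for s in range(n_samples):
        for k, g in enumerate(orders[s]):
            result[g][s] = rank_means[k]
    return result
```

After this step every sample contains exactly the same set of values, only assigned to different genes.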

housekeep..

Examples: housekeep5, housekeep10, housekeep20.

This normalisation method is a derivative of the scalelinear method. Instead of using all genes (miRNAs) to calculate the normalisation factor, only genes with intermediate expression levels are used. To this end, the genes with the highest and the lowest expression levels are ignored (only when calculating the normalisation factor, not during the normalisation itself). The housekeep method must be called with the exact percentage of genes to be skipped; e.g. housekeep20 ignores the 20% most highly and the 20% least expressed genes. The genes (miRNAs) are weighted by the log2 of their expression level.
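A rough sketch of the housekeep idea, reusing the linear-scaling scheme but computing the factors only from the retained genes. Several details are assumptions: genes are ranked by their mean count across samples, the number of skipped genes is rounded down, the target total is the mean of the retained totals, and the log2 weighting described above is not modelled. The function name `housekeep_normalise` is hypothetical.

```python
def housekeep_normalise(matrix: list, percent: float) -> list:
    """Scale samples using only mid-expressed genes; matrix[g][s] is gene g in sample s."""
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    # Assumption: genes are ranked by their mean count across all samples
    order = sorted(range(n_genes), key=lambda g: sum(matrix[g]) / n_samples)
    skip = int(n_genes * percent / 100)  # genes dropped at each end
    kept = order[skip:n_genes - skip]
    if not kept:
        raise ValueError("percent too large: no genes left for the normalisation factor")
    # Normalisation factors are derived from the retained genes only ...
    totals = [sum(matrix[g][s] for g in kept) for s in range(n_samples)]
    target = sum(totals) / n_samples
    factors = [target / t for t in totals]
    # ... but applied to every gene, including the skipped ones
    return [[row[s] * factors[s] for s in range(n_samples)] for row in matrix]
```

With five genes and `percent=20`, one gene is skipped at each end, so an extreme outlier (such as a very highly expressed miRNA) no longer dominates the scaling factor.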

--help

Display the help pages.


DESCRIPTION

General

This script creates a table (matrix) containing the gene (miRNA) expression levels of one or more samples. Several normalisation methods and output formats are available. The expression table is computed from run_Mapping.pl output files.

Input

One or more files of unambiguously mapped reads. Each file is assumed to represent one sample, e.g. one tissue or one time point. The reads must have been mapped unambiguously with the script run_Mapping or run_Multimapper.

For example:


 24688||Count=3         TACCCTGTAGATCCGAATTTGT          hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a   1       0       F       1
 128318||Count=2        TACCCTGTAGATCCGAATTTGTG         hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a   1       0       F       1
 150952||Count=1        TACCCTGTAGATCCTAATTTGTGT        hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a   1       2       R       1
 212857||Count=1        TACCCTGTAGATCCAAATTTGT          hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a   1       1       F       1
 317801||Count=1        TACCTTGTAGATCCGAATTTGTG         hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a   1       1       F       1
 389805||Count=1        TACCCTGTATATCCGAATTTGTGG        hsa-miR-10a MIMAT0000253 Homo sapiens miR-10a   1       2       F       1
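A sketch of parsing one line of such a file. The column layout is inferred from the example alone and is an assumption: fields are taken to be tab-separated (the annotation itself contains spaces), the read count follows `Count=` in the first field, the fifth field is read as the number of mismatches (it appears consistent with the sequence differences in the example and with --max_mm), and the sixth field is the strand. The meaning of the fourth and seventh fields is not documented, so they are ignored. `MappedRead` and `parse_line` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MappedRead:
    read_id: str
    count: int        # number of identical reads collapsed into this line
    sequence: str
    target: str       # annotation of the gene (miRNA) the read maps to
    mismatches: int   # assumed meaning of the fifth field
    strand: str       # "F" or "R"


def parse_line(line: str) -> MappedRead:
    """Parse one tab-separated line of a mapped-read file (assumed layout)."""
    fields = line.rstrip("\n").split("\t")
    read_id, _, count_field = fields[0].partition("||")
    if not count_field.startswith("Count="):
        raise ValueError(f"unexpected read identifier: {fields[0]!r}")
    return MappedRead(
        read_id=read_id,
        count=int(count_field[len("Count="):]),
        sequence=fields[1],
        target=fields[2],
        mismatches=int(fields[4]),
        strand=fields[5],
    )
```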

Output

The output is a table containing the expression level of each gene (miRNA) in each sample. Genes (miRNAs) are given as rows and samples as columns. The actual values depend on the normalisation method and the format. With --normalisation off and --format count, the values in the table represent the actual observed number of reads mapping to a given gene (miRNA).
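A sketch of how per-sample read counts could be collapsed into such a gene-by-sample matrix before normalisation and formatting. It assumes that a gene's raw expression level is the sum of the `Count=` values of all its reads that pass the filters, and that a gene absent from a sample gets 0; both assumptions and the names `build_table` and `to_tsv` are illustrative. Rows are sorted alphabetically here for determinism, which the example tables do not do.

```python
from collections import defaultdict


def build_table(samples: list) -> tuple:
    """samples: list of (sample_name, iterable of (target, count)) pairs."""
    names = [name for name, _ in samples]
    # Every gene gets one slot per sample, initialised to 0
    table = defaultdict(lambda: [0] * len(samples))
    for s, (_, reads) in enumerate(samples):
        for target, count in reads:
            table[target][s] += count
    return names, dict(table)


def to_tsv(names: list, table: dict) -> str:
    """Genes as rows, samples as columns; the header's first cell is empty."""
    lines = ["\t" + "\t".join(names)]
    for target in sorted(table):
        lines.append(target + "\t" + "\t".join(str(v) for v in table[target]))
    return "\n".join(lines) + "\n"
```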

count

The expression level is displayed as a count. For example:

 
                                                        day0    day3    day6    day12   day18   day24   
 hsa-mir-191 MI0000465 Homo sapiens miR-191 stem-loop   16666.9 6826.7  26944.6 8692.7  8851.5  16534.4
 hsa-mir-29a MI0000087 Homo sapiens miR-29a stem-loop   1359.4  2704.2  24531.5 4004.6  3931.7  44904.4
 hsa-mir-142 MI0000458 Homo sapiens miR-142 stem-loop   6518.8  3720.0  14015.7 3319.0  3428.3  23975.9
 hsa-mir-10a MI0000266 Homo sapiens miR-10a stem-loop   2177.2  2980.4  18518.0 2138.8  1880.9  15114.3
 hsa-mir-146b MI0003129 Homo sapiens miR-146b stem-loop 4.4     6.2     3607.6  459.6   8479.3  20975.4

log2

The expression level is displayed as the log2 of the count (see count). For example:

                                                        day0    day3    day6    day12   day18   day24
 hsa-mir-191 MI0000465 Homo sapiens miR-191 stem-loop   14.0    12.7    14.7    13.1    13.1    14.0
 hsa-mir-29a MI0000087 Homo sapiens miR-29a stem-loop   10.4    11.4    14.6    12.0    11.9    15.5
 hsa-mir-142 MI0000458 Homo sapiens miR-142 stem-loop   12.7    11.9    13.8    11.7    11.7    14.5
 hsa-mir-10a MI0000266 Homo sapiens miR-10a stem-loop   11.1    11.5    14.2    11.1    10.9    13.9
 hsa-mir-146b MI0003129 Homo sapiens miR-146b stem-loop 2.1     2.6     11.8    8.8     13.0    14.4

diflog2

For each sample, the difference from the first sample is displayed, computed with the following equation:

 x = log2( count / count_sample1)

This is useful for identifying differences relative to a sample that represents a background level of expression, or relative to the first time point of a time-course experiment (t0). For example:

                                                        day0    day3    day6    day12   day18   day24
 hsa-mir-191 MI0000465 Homo sapiens miR-191 stem-loop   0.0     -1.3    0.7     -0.9    -0.9    -0.0
 hsa-mir-29a MI0000087 Homo sapiens miR-29a stem-loop   0.0     1.0     4.2     1.6     1.5     5.0
 hsa-mir-142 MI0000458 Homo sapiens miR-142 stem-loop   0.0     -0.8    1.1     -1.0    -0.9    1.9
 hsa-mir-10a MI0000266 Homo sapiens miR-10a stem-loop   0.0     0.5     3.1     -0.0    -0.2    2.8
 hsa-mir-146b MI0003129 Homo sapiens miR-146b stem-loop 0.0     0.5     9.7     6.7     10.9    12.2


REQUIREMENTS

Perl 5.8 or higher

R 2.7.0 or higher


AUTHORS

Robert Kofler

Manuela Hummel

Lauro Sumoy

Heinz Himmelbauer


CONTACT

robert.kofler at crg.es