RankProdIt: Interactive Rank Products Analysis

Citation

If you use the data from this tool please cite:

Laing E, Smith CP. RankProdIt: A web interactive Rank products analysis tool. BMC Research Notes 2010, 3:221.

Rank Products and Rank Sum

Rank Products and Rank Sum [1, 2] are methods for identifying the differential expression of objects (typically genes, but could be probes) between two conditions (e.g. "normal environment" versus "environmental perturbation", or wild-type versus mutant). The technique has many advantages over the traditional t-test [1] and has been shown to be robust with noisy data [3]. Rank Products (and Rank Sum) analysis does not rely on calculating measurement variance between replicates and can therefore be performed with as few as two biological replicates for each condition [1]. However, for the reliability of your results, it is advised to have as many biological replicates as possible.
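As a rough illustration of the idea (not the RankProd implementation), the rank product of a gene can be taken as the geometric mean of its ranks across replicate comparisons. The sketch below assumes one fold-change value per gene per replicate, gives rank 1 to the largest fold change, and ignores ties and missing values. The function name rank_products and the toy data are hypothetical.

```python
import math

def rank_products(fold_changes):
    """Return the rank product of each gene.

    fold_changes: dict mapping gene id -> list of fold changes, one per
    replicate comparison (all lists the same length).
    Rank 1 is the largest fold change within a replicate.
    Ties are not handled specially in this sketch.
    """
    genes = list(fold_changes)
    n_reps = len(fold_changes[genes[0]])
    ranks = {g: [] for g in genes}
    for r in range(n_reps):
        # Sort genes by fold change in this replicate, largest first.
        ordered = sorted(genes, key=lambda g: fold_changes[g][r], reverse=True)
        for position, g in enumerate(ordered, start=1):
            ranks[g].append(position)
    # Geometric mean of the ranks across replicates.
    return {g: math.prod(rs) ** (1.0 / n_reps) for g, rs in ranks.items()}

# Toy data: geneA is near the top of every replicate, geneB is always last.
toy = {
    "geneA": [4.0, 3.5, 5.0],
    "geneB": [1.1, 0.9, 1.0],
    "geneC": [2.0, 4.0, 1.5],
}
rp = rank_products(toy)
```

A gene that is consistently ranked near the top receives a small rank product, which is why no variance estimate between replicates is needed.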

This web tool works by calling the R [4] package RankProd [2].

Before using this tool, it is recommended that you familiarise yourself with the process of Rank Products analysis by reading the original paper of Breitling et al. here. For those interested in Rank Sum analysis, please refer to here.

Input data file requirements

An example input file can be seen here.

Columns

The input data file expected by this tool is a tab-delimited text file (.txt file) containing at least one 'gene identifier' column that can be used to identify the rows of the file (e.g. gene names and/or probe names), and multiple columns of numerical data (typically gene expression data) obtained from multiple biological replicates of the two conditions to be compared. The numerical columns can contain either absolute levels (e.g. a measurement for wild-type) OR ratios (e.g. measurement for wild-type / measurement for mutant), in linear or log scale. If the columns within the input file contain absolute level data then there should be at least four numerical data columns: 2 columns for condition 1 (e.g. WT) and 2 columns for condition 2 (e.g. mutant). If the columns within the input file are ratios (condition1/condition2 or condition2/condition1) then there should be at least two numerical data columns. Note: Whether ratios are of condition1/condition2 or condition2/condition1 format is entirely the choice of the user and only affects visualisation of the results, not the calculation. If submitting ratios you should not mix condition1/condition2 and condition2/condition1 formats (i.e. all ratios should be transformed such that the denominator condition is common to all columns).

Note: Different genes (or probes) should be on different rows.

Please note that it is not an absolute requirement to have condition replicates next to each other in the input file: the columns used in the analysis are those selected by the user upon successful file upload. Similarly, columns in the input file can be selected to be ignored in the analysis.
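For those preparing files programmatically, the column rules above can be checked before upload. The following is a minimal sketch in Python, assuming a single header row. The function name check_columns, its parameters and the example column names are hypothetical and are not part of RankProdIt.

```python
import csv
import io

def check_columns(text, id_col, cond1_cols, cond2_cols=(), ratios=False):
    """Parse tab-delimited text with one header row and validate the layout.

    Returns (ids, rows), where rows holds the selected columns as strings.
    Raises ValueError if too few numerical columns are selected.
    """
    selected = list(cond1_cols) + list(cond2_cols)
    if ratios:
        if len(selected) < 2:
            raise ValueError("ratio data needs at least two numerical columns")
    elif len(cond1_cols) < 2 or len(cond2_cols) < 2:
        raise ValueError("absolute data needs at least two columns per condition")
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    ids, rows = [], []
    for record in reader:
        ids.append(record[id_col])
        # Columns not listed in `selected` (e.g. Notes) are simply ignored.
        rows.append([record[c] for c in selected])
    return ids, rows

# Replicates of a condition do not need to be adjacent in the file.
example = (
    "Gene\tWT_1\tMt_1\tWT_2\tMt_2\tNotes\n"
    "geneA\t10.2\t40.1\t11.0\t38.7\tignored\n"
    "geneB\t5.0\t5.1\tNA\t4.9\tignored\n"
)
ids, rows = check_columns(example, "Gene", ["Mt_1", "Mt_2"], ["WT_1", "WT_2"])
```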

Column headers

The use of column headers is optional, but if used there must be only one header row; otherwise an error will be produced.

It is recommended that column headers are descriptive to aid automation. For example, if all data obtained from the wild-type has a column header of WT and all data obtained from the mutant has a column header of Mt, this tool will automatically assign WT to Condition 2 and Mt to Condition 1 (note that header formats are entirely up to the user).

Data values

Data can be in log or linear scale. RankProdIt will automatically recognise the format of your data for you.

Missing data can be represented by NA, na, NaN or nan. Note: Prior to analysis, it is recommended that genes without values for at least two biological replicates for the conditions being tested be filtered out, as no statistics for those genes can be calculated.
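The recommended pre-filtering can be done before upload. The sketch below is one possible way to do it in Python. It assumes absolute-level data and interprets the recommendation as requiring at least two non-missing values in each condition; the helper names and toy rows are hypothetical.

```python
# Missing-data markers accepted by the tool, as listed above.
MISSING = {"NA", "na", "NaN", "nan"}

def parse_value(cell):
    """Convert a cell to float, or None if it is a missing-data marker."""
    cell = cell.strip()
    return None if cell in MISSING else float(cell)

def has_enough_replicates(cond1_cells, cond2_cells, minimum=2):
    """True if each condition has at least `minimum` non-missing values."""
    def count(cells):
        return sum(parse_value(c) is not None for c in cells)
    return count(cond1_cells) >= minimum and count(cond2_cells) >= minimum

# Toy rows: gene id -> (condition 1 cells, condition 2 cells).
rows = {
    "geneA": (["40.1", "38.7"], ["10.2", "11.0"]),
    "geneB": (["5.1", "4.9"], ["5.0", "NA"]),
    "geneC": (["nan", "7.2", "6.9"], ["3.1", "NaN", "2.8"]),
}
kept = [g for g, (c1, c2) in rows.items() if has_enough_replicates(c1, c2)]
```

Here geneB is dropped because only one value remains for the second condition.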

Conducting Analysis

To aid the user, following successful submission of an input file, RankProdIt will attempt to automate the column parameters (selections) for you: it will aim to identify the column most likely to contain row (gene) identifiers, whether the numerical data is absolute level or ratios, which condition a column represents (possible with suitable column header names, as discussed above), whether a particular column should be ignored (columns containing text other than a gene identifier column and/or columns containing text and numerical data), the scale of the data and whether the file contains a header row. All automated selections can be changed by the user.

Once the user is happy with the parameters to be passed they can choose whether to perform RankProducts (by default) or RankSum analysis. Please see [1,5] for methods. Both analytical methods will identify genes whose response between conditions is preserved and likely to be significant.

Upon submission, the selected parameters and input file are passed to the R RankProd package for calculations. For those familiar with Rank Products and/or Rank Sum analysis, 100 permutations are run to assess the significance of a gene ranking and to provide the pfp (probability of false prediction) value.
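To give a feel for what the permutations do, the following Python sketch estimates a pfp for up-regulation in the spirit of the original Rank Products method: the number of genes expected by chance to reach a given rank product, divided by the gene's observed position in the ranked list. It is a simplified illustration, not the code RankProd runs. The function names are hypothetical, ties and missing values are not handled, and the raw estimate is not capped at 1.

```python
import bisect
import math
import random

def rank_column(values):
    """Rank values within one replicate; rank 1 = largest value."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks

def estimate_pfp(columns, n_perm=100, seed=0):
    """Estimate a pfp per gene for up-regulation.

    columns: list of replicate columns, each a list of fold changes
    (one entry per gene, same gene order in every column).
    The plain product of ranks is used; it orders genes exactly as the
    geometric mean does.
    """
    rng = random.Random(seed)
    n_genes = len(columns[0])
    rank_cols = [rank_column(col) for col in columns]
    observed = [math.prod(rc[g] for rc in rank_cols) for g in range(n_genes)]

    # Null distribution: shuffle ranks within each replicate independently.
    null = []
    for _ in range(n_perm):
        shuffled = [rng.sample(rc, n_genes) for rc in rank_cols]
        null.extend(math.prod(rc[g] for rc in shuffled) for g in range(n_genes))
    null.sort()

    pfp = [0.0] * n_genes
    by_rp = sorted(range(n_genes), key=lambda g: observed[g])
    for position, g in enumerate(by_rp, start=1):
        # Expected number of genes with a rank product this small by chance.
        expected = bisect.bisect_right(null, observed[g]) / n_perm
        pfp[g] = expected / position
    return pfp

# Toy data: gene 0 is strongly up in all three replicates, the rest are noise.
data_rng = random.Random(1)
n_genes = 20
columns = [[10.0] + [data_rng.uniform(0.5, 2.0) for _ in range(n_genes - 1)]
           for _ in range(3)]
pfp = estimate_pfp(columns)
```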

Data output and interpretation

The output of this tool is a tab delimited text file that can be downloaded and opened with a text editor or spreadsheet application of choice.

The output file contains the gene identifier and data selected for analysis, along with the associated average rank, p.value, pfp value and average fold change for every submitted gene (or probe), as described for the RankProd tool [2]. A description of columns in the RankProdIt output file can be found here. To identify statistically significant genes that are up- and down-regulated in Condition 2 with respect to Condition 1 (for example), it is suggested to sort the file (e.g. with the sort function of Microsoft Excel) by the columns "Condition 1 < Condition 2 pfp value" and "Condition 1 > Condition 2 pfp value" respectively. A pfp value threshold (typically < 0.15) can then be applied to identify statistically significant differentially expressed genes.
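The same sorting and thresholding can be scripted instead of done in a spreadsheet. The Python sketch below assumes the output file has a header row containing the two pfp column names quoted above, and that every pfp cell is numeric. The identifier column name "Gene", the function name significant_genes and the toy values are hypothetical.

```python
import csv
import io

def significant_genes(text, pfp_column, threshold=0.15):
    """Return rows whose pfp is below threshold, sorted by ascending pfp."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    hits = [r for r in rows if float(r[pfp_column]) < threshold]
    return sorted(hits, key=lambda r: float(r[pfp_column]))

UP = "Condition 1 < Condition 2 pfp value"    # up-regulated in Condition 2
DOWN = "Condition 1 > Condition 2 pfp value"  # down-regulated in Condition 2

# Toy stand-in for a downloaded output file (other columns omitted).
output = (
    f"Gene\t{UP}\t{DOWN}\n"
    "geneA\t0.02\t0.98\n"
    "geneB\t0.60\t0.55\n"
    "geneC\t0.91\t0.10\n"
    "geneD\t0.001\t0.99\n"
)
up_in_cond2 = [r["Gene"] for r in significant_genes(output, UP)]
down_in_cond2 = [r["Gene"] for r in significant_genes(output, DOWN)]
```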

Contact

Please contact e.laing@surrey.ac.uk with any queries or problems.
