Given a data frame of pathway-level p-values across multiple -omics platforms, use the MiniMax technique to assign statistical significance to concordant or cascading pathway-level biological effects.
MiniMax(
  pValues_df,
  pValuesNull_df = NULL,
  orderStat = 2L,
  method = c("parametric", "MLE", "MoM"),
  annotateResults = TRUE,
  ...
)
| Argument | Description |
|---|---|
| pValues_df | A data frame of pathway / gene set p-values under true responses (this data set should contain true biological signal). The rows correspond to gene sets / pathways, and the columns correspond to the data platforms for the disease of interest. |
| pValuesNull_df | A data frame of pathway / gene set p-values under the null hypothesis, most likely constructed from randomly permuting the response and re-estimating all significance levels (this data set should NOT contain any true biological signal). As with |
| orderStat | How many platforms should show a biological signal for a pathway / gene set to have multi-omic "enrichment"? Defaults to 2. See "Details" for more information. |
| method | If |
| annotateResults | Should the platforms driving each result be marked? Defaults to TRUE. |
| ... | Additional arguments passed to the MiniMax_calculateDrivers() function. |
A copy of the pValues_df data frame with two additional
columns: MiniMax (the statistic values for each gene set) and
MiniMaxP (the p-values of these statistics). This data frame
is sorted by ascending MiniMax p-value.
Concerning Parameter Estimation Methods: We currently support three
options to estimate the parameters of the Beta Distribution. The
"parametric" option does not estimate anything from null data, and it is
therefore the only option available if pValuesNull_df is not provided.
Instead, it assumes that the MiniMax statistics will have a
Beta \((k, n + 1 - k)\) distribution, where \(k\) is the value of
orderStat and \(n\) is the number of data platforms (nPlatforms). This
is the distribution of the \(k\)-th smallest of \(n\) independent
uniform p-values.
See https://en.wikipedia.org/wiki/Order_statistic.
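The parametric option can be sketched in a few lines of base R. This is only an illustration, assuming the null p-values are independent and uniform across platforms; `miniMaxParametric` is a hypothetical helper name, not a function of this package, and the internal implementation may differ. The p-values are copied from two rows of the example output on this page.

```r
# Hypothetical helper (not part of the package): parametric MiniMax p-values.
# Assumes the columns of pVals_mat are independent Uniform(0, 1) under the null.
miniMaxParametric <- function(pVals_mat, orderStat = 2L) {
  pVals_mat <- as.matrix(pVals_mat)
  nPlatforms <- ncol(pVals_mat)
  stopifnot(orderStat >= 1L, orderStat <= nPlatforms)

  # k-th smallest p-value in each row (the MiniMax statistic)
  stat <- apply(pVals_mat, 1, function(p) sort(p)[orderStat])

  # The k-th order statistic of n uniforms is Beta(k, n + 1 - k)
  pVal <- pbeta(stat, shape1 = orderStat, shape2 = nPlatforms + 1 - orderStat)

  data.frame(MiniMax = stat, MiniMaxP = pVal)
}

# Three platforms (CNV, RNAseq, protein) for two pathways
pVals_mat <- rbind(
  cluster05 = c(0.213, 0.007, 0.012),
  cluster24 = c(0.030, 0.016, 0.669)
)
res <- miniMaxParametric(pVals_mat, orderStat = 2L)
res
```

These parametric p-values need not match the MiniMaxP column of the example output, which was fitted with method = "MLE".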
The next two estimation options make use of the pValuesNull_df data
frame. To build it, rerun the same statistical tests used on the real data
(for each pathway and data platform) after randomly permuting the outcome
of interest, then record the resulting p-values; more permutations are
better. The "MLE" option
uses the beta.mle function to find the Maximum
Likelihood Estimates of \(\alpha\) and \(\beta\). The "MoM" option uses
the closed-form Method of Moments estimators of \(\alpha\) and
\(\beta\) as shown in
https://en.wikipedia.org/wiki/Beta_distribution#Method_of_moments.
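The Method of Moments estimators can be written out directly. The sketch below is an assumption-laden illustration: `betaMoM` is a hypothetical helper, not a function of this package, and the null statistics are simulated with rbeta() as a stand-in for MiniMax statistics computed from a real pValuesNull_df.

```r
# Hypothetical helper (not part of the package): Method of Moments
# estimates for Beta(alpha, beta) from a vector of values in (0, 1).
betaMoM <- function(x) {
  m <- mean(x)
  v <- var(x)
  # The estimators are only defined when 0 < v < m * (1 - m)
  stopifnot(v > 0, v < m * (1 - m))
  common <- m * (1 - m) / v - 1
  c(alpha = m * common, beta = (1 - m) * common)
}

# Simulated stand-in for null MiniMax statistics (assumption, not real data)
set.seed(1)
nullStats <- rbeta(5000, shape1 = 2, shape2 = 2)
est <- betaMoM(nullStats)
est

# p-value of an observed MiniMax statistic under the fitted null
pbeta(0.012, shape1 = est[["alpha"]], shape2 = est[["beta"]])
```

By construction, the fitted distribution reproduces the sample mean and variance of the null statistics, which makes the estimates cheap to compute but less efficient than maximum likelihood.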
Concerning Appropriate Order Statistics: With the default orderStat = 2,
the MiniMax operation is equivalent to sorting the p-values and taking the
second smallest; in general, it takes the orderStat-th smallest.
In our experience, setting this "order statistic" cutoff to 2 is
appropriate for five or fewer data platforms. Biologically, this is equivalent to
saying "if this pathway is dysregulated in at least two data types for
this disease / condition, it is worthy of additional consideration". In
situations where more than 5 data platforms are available for the disease
of interest, we recommend increasing the orderStat value to 3.
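The equivalence described above can be checked on a single pathway. The sketch below reads "MiniMax" as the minimum, over all groups of orderStat platforms, of the maximum p-value within the group; that reading is an assumption drawn from the name, not a statement of the package's internals. The p-values are those of cluster05 in the example output, and the platform labels are illustrative.

```r
# p-values for one pathway across three platforms (cluster05 in the example)
p <- c(cnv = 0.213, rnaSeq = 0.007, protein = 0.012)
orderStat <- 2L

# Route 1: sort and take the k-th smallest p-value
kthSmallest <- unname(sort(p)[orderStat])

# Route 2: largest p-value within each group of k platforms, then the
# smallest of those maxima
miniMax <- min(combn(p, orderStat, FUN = max))

kthSmallest  # 0.012
miniMax      # 0.012
```

Both routes give the value reported in the MiniMax column for this pathway; the sort-based route is the cheaper one when many platforms are available.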
data("multiOmicsMedSignalResults_df")
data("nullMiniMaxResults_df")

MiniMax(
  pValues_df = multiOmicsMedSignalResults_df,
  pValuesNull_df = nullMiniMaxResults_df[, -5],
  method = "MLE",
  # Passed to the MiniMax_calculateDrivers() function
  drivers_char = c("cnv", "rnaSeq", "protein")
)
#> # A tibble: 50 x 8
#>    terms     treated pVal_CNV pVal_RNAseq pVal_Prot MiniMax MiniMaxP drivers
#>    <chr>     <lgl>      <dbl>       <dbl>     <dbl>   <dbl>    <dbl> <chr>
#>  1 cluster05 TRUE       0.213       0.007     0.012   0.012  0.00105 protein an~
#>  2 cluster24 FALSE      0.03        0.016     0.669   0.03   0.00532 cnv and rn~
#>  3 cluster32 FALSE      0.03        0.037     0.661   0.037  0.00771 cnv and rn~
#>  4 cluster19 TRUE       0.18        0.004     0.044   0.044  0.0105  protein an~
#>  5 cluster40 FALSE      0.103       0.102     0.515   0.103  0.0459  cnv and rn~
#>  6 cluster26 FALSE      0.115       0.019     0.754   0.115  0.0554  cnv and rn~
#>  7 cluster49 FALSE      0.595       0.127     0.158   0.158  0.0947  protein an~
#>  8 cluster35 FALSE      0.034       0.201     0.688   0.201  0.141   cnv and rn~
#>  9 cluster45 FALSE      0.094       0.23      0.249   0.23   0.175   cnv and rn~
#> 10 cluster11 FALSE      0.247       0.159     0.473   0.247  0.197   cnv and rn~
#> # ... with 40 more rows