Mark which Platforms are Driving the MiniMax Statistics — MiniMax

Given a data frame of pathway-level p-values, mark which of the multi-omics platforms are driving the MiniMax statistic.

MiniMax_calculateDrivers(
  res_df,
  orderStat = 2L,
  drivers_char = colnames(res_df),
  sortLabels = TRUE,
  separator = " and "
)

Arguments

res_df	A data frame of p-values. Each row corresponds to a gene set / pathway, and each column corresponds to a data platform for the disease of interest.
orderStat	How many platforms should show a biological signal for a pathway / gene set for it to have multi-omic "enrichment"? Defaults to 2. See "Details" for more information.
drivers_char	What labels should be given to the driving platforms? Defaults to the column names of `res_df`. If you supply custom labels, make sure to match them to the column order of `res_df`.
sortLabels	Should the driver labels be sorted alphabetically before concatenation? Defaults to `TRUE`; that is, a multi-omics result driven first by protein expression and then by DNA methylation will have the same label as a result driven first by DNA methylation and then by protein expression. If you would like the labels ordered by the magnitude of their p-values instead, use `sortLabels = FALSE`.
separator	What character string should be used to separate the names of the driving platforms? Defaults to `" and "`; for example, if the platform driver labels are `"protein"` and `"cnv"`, and if `sortLabels = TRUE`, then the label of drivers would be `"cnv and protein"`.
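To illustrate how `sortLabels` and `separator` interact, here is a minimal sketch of assembling one driver label. The helper `makeDriverLabel()` is hypothetical and is not part of the package; it assumes the driver names arrive ordered from smallest to largest p-value.

```r
# Hypothetical helper (not the package's code): build one driver label.
# `driverNames` is assumed to be ordered by increasing p-value.
makeDriverLabel <- function(driverNames, sortLabels = TRUE, separator = " and ") {
  if (sortLabels) {
    # alphabetical order, so the label no longer reflects p-value order
    driverNames <- sort(driverNames)
  }
  paste(driverNames, collapse = separator)
}

makeDriverLabel(c("protein", "cnv"))
#> [1] "cnv and protein"
makeDriverLabel(c("protein", "cnv"), sortLabels = FALSE)
#> [1] "protein and cnv"
makeDriverLabel(c("protein", "cnv"), separator = " + ")
#> [1] "cnv + protein"
```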

Value

A character vector with one element per row (pathway) of `res_df`, giving the names of the platforms driving the MiniMax statistic for that pathway.

Details

The MiniMax statistic is defined as the minimum of all pairwise maxima of pathway p-values. This operation is arithmetically equivalent to sorting the p-values and taking the second smallest. In our experience, setting this "order statistic" cutoff to 2 is appropriate for five or fewer data platforms. Biologically, this amounts to saying "if this pathway is dysregulated in at least two data types for this disease / condition, it is worthy of additional consideration". In situations where more than five data platforms are available for the disease of interest, we recommend increasing the `orderStat` value to 3.
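The equivalence between "minimum of pairwise maxima" and "second smallest p-value" can be checked numerically. The sketch below uses made-up p-values for a single pathway, and the function names are hypothetical. It also assumes the natural generalization to `orderStat = k` (minimum over all k-platform subsets of the maximum p-value), which equals the k-th smallest p-value.

```r
# Made-up p-values for one pathway across four platforms (illustration only)
pvals <- c(cnv = 0.20, rnaSeq = 0.01, protein = 0.04, methyl = 0.60)

# Definition: take the maximum within every subset of k platforms,
# then the minimum of those maxima (k = 2 gives the MiniMax statistic)
orderStatByDefinition <- function(p, k = 2L) {
  min(combn(p, k, FUN = max))
}

# Shortcut: sort the p-values and take the k-th smallest
orderStatBySorting <- function(p, k = 2L) {
  unname(sort(p)[k])
}

orderStatByDefinition(pvals)
#> [1] 0.04
orderStatBySorting(pvals)
#> [1] 0.04
orderStatByDefinition(pvals, k = 3L)
#> [1] 0.2
```

The sorting shortcut avoids enumerating all `choose(n, k)` subsets, which matters little for a handful of platforms but makes the "order statistic" interpretation explicit.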

NOTE: this result does not depend on the pathway significance level at all. It simply shows which platforms had the smallest p-values for each pathway, even if the MiniMax statistic is not statistically significant for that pathway. Therefore, we recommend using this function only for post-hoc interpretation of results.
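A rough sketch of the driver-labelling logic may help when interpreting the output. `driversSketch()` is a hypothetical re-implementation written only for illustration, not the package source, and `toy_df` holds made-up p-values. It assumes the drivers are the `orderStat` platforms with the smallest p-values in each row, which is why the labels depend only on the ranks of the p-values and not on any significance threshold.

```r
# Hypothetical re-implementation for illustration only; NOT the package source.
driversSketch <- function(res_df, orderStat = 2L,
                          drivers_char = colnames(res_df),
                          sortLabels = TRUE, separator = " and ") {
  pMat <- as.matrix(res_df)
  labels <- apply(pMat, 1, function(p) {
    # positions of the `orderStat` smallest p-values, smallest first
    idx <- order(p)[seq_len(orderStat)]
    labs <- drivers_char[idx]
    if (sortLabels) labs <- sort(labs)
    paste(labs, collapse = separator)
  })
  unname(labels)
}

# Made-up p-values: three pathways, three platforms
toy_df <- data.frame(
  cnv     = c(0.001, 0.300, 0.020),
  rnaSeq  = c(0.500, 0.002, 0.040),
  protein = c(0.030, 0.010, 0.900)
)

driversSketch(toy_df)
# returns c("cnv and protein", "protein and rnaSeq", "cnv and rnaSeq")
driversSketch(toy_df, sortLabels = FALSE)
# second label becomes "rnaSeq and protein" (ordered by p-value)

# Raising every p-value to the power 0.1 makes all of them large (> 0.5)
# but preserves their order, so the driver labels do not change.
identical(driversSketch(toy_df), driversSketch(toy_df^0.1))
```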

Examples

MiniMax_calculateDrivers(
  multiOmicsHighSignalResults_df[, -(1:2)],
  drivers_char = c("cnv", "rnaSeq", "protein")
)
#>  [1] "cnv and protein"    "protein and rnaSeq" "protein and rnaSeq"
#>  [4] "cnv and rnaSeq"     "cnv and rnaSeq"     "cnv and protein"   
#>  [7] "protein and rnaSeq" "protein and rnaSeq" "cnv and rnaSeq"    
#> [10] "cnv and rnaSeq"     "cnv and rnaSeq"     "protein and rnaSeq"
#> [13] "cnv and rnaSeq"     "cnv and rnaSeq"     "cnv and protein"   
#> [16] "cnv and rnaSeq"     "protein and rnaSeq" "protein and rnaSeq"
#> [19] "cnv and rnaSeq"     "cnv and protein"    "protein and rnaSeq"
#> [22] "cnv and rnaSeq"     "cnv and rnaSeq"     "cnv and rnaSeq"    
#> [25] "cnv and protein"    "cnv and rnaSeq"     "cnv and protein"   
#> [28] "cnv and protein"    "cnv and protein"    "cnv and protein"   
#> [31] "cnv and rnaSeq"     "cnv and rnaSeq"     "protein and rnaSeq"
#> [34] "cnv and protein"    "cnv and rnaSeq"     "protein and rnaSeq"
#> [37] "cnv and rnaSeq"     "protein and rnaSeq" "cnv and protein"   
#> [40] "cnv and rnaSeq"     "cnv and protein"    "cnv and protein"   
#> [43] "protein and rnaSeq" "cnv and protein"    "cnv and rnaSeq"    
#> [46] "cnv and rnaSeq"     "cnv and protein"    "cnv and protein"   
#> [49] "protein and rnaSeq" "cnv and rnaSeq"