R/estimate_MiniMax_params.R
MiniMax_estBetaParams.Rd
Given a vector of MiniMax statistic values under the null hypothesis, estimate the parameters of the Beta Distribution which best fits these values.
MiniMax_estBetaParams( MiniMaxNull_num, nPlatforms, orderStat = 2L, method = c("parametric", "MLE", "MoM") )
| Argument | Description |
|---|---|
| MiniMaxNull_num | A numeric vector of MiniMax statistics under the null hypothesis. |
| nPlatforms | An integer stating how many data platforms are in the original data. |
| orderStat | How many platforms should show a biological signal for a pathway / gene set to have multi-omic "enrichment"? Defaults to 2. See "Details" for more information. |
| method | Which estimation method will be used to find the parameters of the Beta Distribution? Options are "parametric" (the default), "MLE", and "MoM". See "Details" for more information. |
A list of 3 components: "alpha" and "beta" hold the parameter estimates of the Beta Distribution, and "method" returns a character string denoting which estimation method was used.
Concerning Parameter Estimation Methods: We currently support 3
options to estimate the parameters of the Beta Distribution. The
"parametric" option does not use the data. Instead, it assumes that the
MiniMax statistics will have a Beta \((k, n + 1 - k)\) distribution,
where \(k\) is the value of orderStat and \(n\) has the value
nPlatforms. See https://en.wikipedia.org/wiki/Order_statistic.
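The order-statistic result behind the "parametric" option can be checked with a small simulation. The sketch below is only an illustration and is not the package's internal code: it assumes the null p-values are independent Uniform(0, 1) draws, and the object names (pNull_mat, kth_num) are hypothetical.

```r
# Illustrative sketch only; assumes independent Uniform(0, 1) null p-values.
set.seed(1)
nPlatforms <- 3L
orderStat  <- 2L

# Hypothetical null p-values: one row per pathway, one column per platform
pNull_mat <- matrix(runif(10000 * nPlatforms), ncol = nPlatforms)

# k-th smallest p-value within each row
kth_num <- apply(pNull_mat, 1, function(p) sort(p)[orderStat])

# Beta parameters implied by the order-statistic result: Beta(k, n + 1 - k)
alpha <- orderStat
beta  <- nPlatforms + 1L - orderStat

# The mean of a Beta(alpha, beta) variable is alpha / (alpha + beta)
c(empirical = mean(kth_num), theoretical = alpha / (alpha + beta))
```

With nPlatforms = 3 and orderStat = 2, the implied parameters are alpha = 2 and beta = 2, which agrees with the "Parametric" output in the examples.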
The next two estimation options make use of the MiniMaxNull_num
vector, which should be calculated by finding the same significance levels
of the statistical tests used on the real data (for each pathway and data
platform), but by using a random permutation of the outcome of interest
instead of the real values; more permutations are better. The "MLE" option
uses the beta.mle function to find the Maximum
Likelihood Estimates of \(\alpha\) and \(\beta\). The "MoM" option uses
the closed-form Method of Moments estimators of \(\alpha\) and
\(\beta\) as shown in
https://en.wikipedia.org/wiki/Beta_distribution#Method_of_moments.
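As a rough sketch of the two data-driven options, the code below computes the closed-form Method of Moments estimators and then a generic numerical maximum likelihood fit. It makes two assumptions: the vector x is simulated stand-in data rather than real permutation results, and the likelihood is maximised with stats::optim purely for illustration, which may differ from how beta.mle works internally.

```r
# Illustrative sketch only; x is a simulated stand-in for MiniMaxNull_num.
set.seed(2)
x <- rbeta(5000, shape1 = 2, shape2 = 2)

xBar <- mean(x)
s2   <- var(x)

# Method of Moments estimators (require s2 < xBar * (1 - xBar))
common   <- xBar * (1 - xBar) / s2 - 1
alphaMoM <- xBar * common
betaMoM  <- (1 - xBar) * common

# Generic numerical MLE: minimise the negative Beta log-likelihood.
# Parameters are optimised on the log scale so they stay positive.
negLogLik <- function(logPar, x) {
  -sum(dbeta(x, shape1 = exp(logPar[1]), shape2 = exp(logPar[2]), log = TRUE))
}
fit <- optim(par = log(c(alphaMoM, betaMoM)), fn = negLogLik, x = x)
mle <- exp(fit$par)

rbind(MoM = c(alpha = alphaMoM, beta = betaMoM),
      MLE = c(alpha = mle[1], beta = mle[2]))
```

Starting the optimiser at the Method of Moments estimates is a common convenience, since they are usually already close to the likelihood maximum.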
Concerning Appropriate Order Statistics: With the default orderStat
value of 2, the MiniMax operation is equivalent to sorting the p-values
and taking the second smallest. In our experience, setting this "order
statistic" cutoff to 2 is appropriate for five or fewer data platforms.
Biologically, this is equivalent to
saying "if this pathway is dysregulated in at least two data types for
this disease / condition, it is worthy of additional consideration". In
situations where more than 5 data platforms are available for the disease
of interest, we recommend increasing the orderStat value to 3.
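The sort-and-select view of the MiniMax operation can be sketched in a few lines. The p-value matrix and pathway names below are made up for illustration, and the code is not the package's own implementation.

```r
# Illustrative sketch only; the p-values below are hypothetical.
# Rows are pathways, columns are data platforms.
pVals_mat <- rbind(
  pathwayA = c(0.001, 0.040, 0.700),
  pathwayB = c(0.300, 0.002, 0.900),
  pathwayC = c(0.010, 0.020, 0.030)
)
orderStat <- 2L

# Sort each pathway's p-values and keep the orderStat-th smallest
miniMax_num <- apply(pVals_mat, 1, function(p) sort(p)[orderStat])
miniMax_num
```

Here pathwayB has one very small p-value but no second platform in support, so its statistic (0.300) is much larger than those of pathwayA (0.040) and pathwayC (0.020). Raising orderStat to 3 would instead require support from three platforms.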
miniMax_num <- nullMiniMaxResults_df$MiniMax

MiniMax_estBetaParams(miniMax_num, nPlatforms = 3L)
#> $alpha
#> [1] 2
#>
#> $beta
#> [1] 2
#>
#> $method
#> [1] "Parametric"
#>
#> attr(,"class")
#> [1] "MiniMaxParams" "list"

MiniMax_estBetaParams(miniMax_num, nPlatforms = 3L, method = "MoM")
#> $alpha
#> [1] 1.908401
#>
#> $beta
#> [1] 2.056421
#>
#> $method
#> [1] "Method of Moments"
#>
#> attr(,"class")
#> [1] "MiniMaxParams" "list"

MiniMax_estBetaParams(miniMax_num, nPlatforms = 3L, method = "MLE")
#> $alpha
#> [1] 1.787027
#>
#> $beta
#> [1] 2.03405
#>
#> $method
#> [1] "Maximum Likelihood"
#>
#> attr(,"class")
#> [1] "MiniMaxParams" "list"