Remove covariate effects from methylation values by fitting probe-specific linear models

GetResiduals(
  dnam,
  betaToM = TRUE,
  epsilon = 1e-08,
  pheno_df,
  covariates_char,
  nCores_int = 1L,
  ...
)

Arguments

dnam

data frame or matrix of methylation values with row names = CpG IDs and column names = sample IDs. This is often the genome-wide array data.

betaToM

logical; should methylation beta values (in [0, 1]) be converted to M values (in (-Inf, Inf))? Set betaToM = TRUE if dnam contains beta values, and FALSE if dnam already contains M values.

epsilon

Small offset used when transforming beta values to M values. Beta values exactly equal to 0 or 1 would transform to -Inf or Inf, which causes issues in the statistical model. We therefore replace all values exactly equal to 0 with 0 + epsilon and all values exactly equal to 1 with 1 - epsilon. Defaults to epsilon = 1e-08.

pheno_df

a data frame of phenotype and covariate data, with a variable named Sample that holds the sample IDs.

covariates_char

character vector of names of the covariate variables in pheno_df

nCores_int

Number of computing cores to be used when executing code in parallel. Defaults to 1 (serial computing).

...

Dots for additional arguments passed to the cluster constructor. See CreateParallelWorkers for more information.
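
As an illustration of how betaToM and epsilon interact, here is a minimal sketch of a beta-to-M conversion. It assumes the conventional definition M = log2(beta / (1 - beta)); the helper name beta_to_m is hypothetical and is not part of this package, so the package's internal code may differ.

```r
# Hypothetical helper (not a package function): convert beta values to
# M values, assuming the conventional M = log2(beta / (1 - beta)).
beta_to_m <- function(beta, epsilon = 1e-08) {
  # Move exact 0s and 1s off the boundary so the result stays finite
  beta[beta == 0] <- epsilon
  beta[beta == 1] <- 1 - epsilon
  log2(beta / (1 - beta))
}

# Toy values, including both boundary cases
m_values <- beta_to_m(c(0, 0.25, 0.5, 1))
m_values
```

Without the epsilon adjustment, the first and last toy values would map to -Inf and Inf.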

Value

A matrix of residual values with the same dimensions as dnam.

Details

This function fits an ordinary linear model for each probe, predicting that probe's methylation values from the specified covariates, and returns the residuals. This is useful when methylation values in a region or at an individual probe are known a priori to be differentially methylated for reasons independent of the disease or condition of interest.
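
The per-probe fit described here can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the data are simulated toy values, and the names pheno, mvals, covariates, and fit_one are hypothetical.

```r
set.seed(1)
n <- 8

# Simulated phenotype data with a Sample column and two covariates
pheno <- data.frame(
  Sample = paste0("S", seq_len(n)),
  age    = rnorm(n, mean = 70, sd = 5),
  sex    = factor(rep(c("F", "M"), length.out = n))
)

# Simulated M values: rows are probes, columns are samples
mvals <- matrix(
  rnorm(3 * n), nrow = 3,
  dimnames = list(paste0("cg", 1:3), pheno$Sample)
)

covariates <- c("age", "sex")

# Fit one linear model for a single probe and return its residuals
fit_one <- function(y) {
  dat <- cbind(pheno, y = y)
  unname(residuals(lm(reformulate(covariates, response = "y"), data = dat)))
}

# apply() returns samples in rows, so transpose back to probes x samples
resid_mat <- t(apply(mvals, 1, fit_one))
colnames(resid_mat) <- colnames(mvals)
dim(resid_mat)
```

Because each model includes an intercept, the residuals for every probe sum to zero up to floating-point error.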

Examples

   data(betasChr22_df)

   data(pheno_df)

   GetResiduals(
     dnam = betasChr22_df[1:10, 1:10],
     betaToM = TRUE,
     pheno_df = pheno_df,
     covariates_char = c("age.brain", "sex", "slide")
   )
#> Phenotype data is not in the same order as methylation data. We used column Sample in phenotype data to put these two files in the same order.
#>             GSM1443279  GSM1443326  GSM1443389   GSM1443434  GSM1443475
#> cg00004192  0.58640646 -0.49767945 -0.06457134 -0.212505570  0.27933449
#> cg00004775 -0.21730195  0.49050104  0.15860444  0.576896507 -0.55293114
#> cg00012194  0.05144935 -0.12025395 -0.08577237 -0.174337410  0.25759590
#> cg00013618 -0.20931589 -0.41968875 -0.09221866  0.134788358  0.87188266
#> cg00014104 -0.19491518  0.01044988  0.05291720 -0.205485110  0.75532381
#> cg00014733  0.09079996  0.02703190 -0.13772317 -0.003066361 -0.07431340
#> cg00017461 -0.23998567  0.15939361  0.03531002 -0.024738248 -0.03561947
#> cg00021762 -0.43595886 -0.26037770  0.19643290 -0.082914671  0.08327474
#> cg00022145  0.06637352 -0.38586662 -0.60636590  0.355595751  0.11474364
#> cg00024416  0.25931864 -0.20265774 -0.35289785  0.596480609 -0.86818331
#>              GSM1443547  GSM1443573  GSM1443577  GSM1443640  GSM1443663
#> cg00004192  0.063698756  0.06457134  0.15377362 -0.24765878 -0.12536953
#> cg00004775 -0.293663232 -0.15860444  0.22103453 -0.16411811 -0.06041765
#> cg00012194  0.149947780  0.08577237 -0.22311432 -0.00409128  0.06280393
#> cg00013618  0.128276611  0.09221866 -0.54441256  0.06534103 -0.02687146
#> cg00014104  0.037841256 -0.05291720 -0.71285649  0.10681760  0.20282424
#> cg00014733 -0.106690470  0.13772317 -0.09044167  0.22118393 -0.06450389
#> cg00017461 -0.012415153 -0.03531002 -0.08846412  0.16945040  0.07237864
#> cg00021762  0.083745738 -0.19643290  0.37353586  0.26066407 -0.02196918
#> cg00022145 -0.005477893  0.60636590 -0.33524293  0.52130004 -0.33142552
#> cg00024416  0.066775059  0.35289785  0.71794320 -0.17137536 -0.39830109
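
The message at the top of this output says the phenotype rows were reordered using the Sample column. A minimal sketch of that kind of alignment with base R's match() is shown below; the toy objects are made up for illustration, and the package's internal approach may differ.

```r
# Toy phenotype data whose rows are not in the column order of dnam
pheno <- data.frame(
  Sample = c("S3", "S1", "S2"),
  age    = c(70, 65, 80),
  stringsAsFactors = FALSE
)

# Toy methylation matrix: rows are probes, columns are samples
dnam <- matrix(
  1:6, nrow = 2,
  dimnames = list(c("cg1", "cg2"), c("S1", "S2", "S3"))
)

# Reorder phenotype rows to follow the column order of dnam
pheno_ordered <- pheno[match(colnames(dnam), pheno$Sample), ]
pheno_ordered
```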