Linear Mixed Models by Region — lmmTestAllRegions • coMethDMR

Fit mixed model to test association between a continuous phenotype and methylation values in a list of genomic regions

lmmTestAllRegions(
  betas,
  region_ls,
  pheno_df,
  contPheno_char,
  covariates_char,
  modelType = c("randCoef", "simple"),
  genome = c("hg19", "hg38"),
  arrayType = c("450k", "EPIC"),
  ignoreStrand = TRUE,
  outFile = NULL,
  outLogFile = NULL,
  nCores_int = 1L,
  ...
)

Arguments

betas: data frame or matrix of beta values for all genomic regions, with row names = CpG IDs and column names = sample IDs. This is often the genome-wide array data.
region_ls: a list of genomic regions; each item is a vector of CpG IDs within a genomic region. The co-methylated regions can be obtained by function CoMethAllRegions.
pheno_df: a data frame with phenotype and covariates, with variable Sample indicating sample IDs.
contPheno_char: character string of the main effect (a continuous phenotype) to be tested for association with methylation values in each region
covariates_char: character vector for names of the covariate variables
modelType: type of mixed model; can be randCoef for random coefficient mixed model or simple for simple linear mixed model.
genome: Human genome of reference: hg19 or hg38
arrayType: Type of array: "450k" or "EPIC"
ignoreStrand: Whether strand can be ignored, default is TRUE
outFile: output .csv file with the results for the mixed model analysis
outLogFile: log file for mixed models analysis messages
nCores_int: Number of computing cores to be used when executing code in parallel. Defaults to 1 (serial computing).
...: Dots for additional arguments passed to the cluster constructor. See CreateParallelWorkers for more information.

Value

If outFile is NULL, this function returns a data frame as described below. If outFile is specified, this function writes a data frame in .csv format with the following information to the disk: location of the genomic region (chrom, start, end), number of CpGs (nCpGs), Estimate, Standard error (StdErr) of the test statistic, p-value and False Discovery Rate (FDR) for association between methylation values in each genomic region with phenotype (pValue). If outLogFile is specified, this function also writes a .txt file that includes messages for mixed model fitting to the disk.

Details

This function implements a mixed model to test association between methylation M values in a genomic region with a continuous phenotype. In our simulation studies, we found both models shown below are conservative, so p-values are estimated from normal distributions instead of Student's t distributions.

When modelType = "randCoef", the model is:

M ~ contPheno_char + covariates_char + (1|Sample) + (contPheno_char|CpG).

The last term specifies random intercept and slope for each CpG. When modelType = "simple", the model is

M ~ contPheno_char + covariates_char + (1|Sample).

For the results of mixed models, note that if the mixed model failed to converge, p-value for mixed model is set to 1. Also, if the mixed model is singular, at least one of the estimated variance components for intercepts or slopes random effects is 0, because there isn't enough variability in the data to estimate the random effects. In this case, the mixed model reduces to a fixed effects model. The p-values for these regions are still valid.

Examples

   data(betasChr22_df)
   data(pheno_df)

   CpGisland_ls <- readRDS(
     system.file(
       "extdata",
       "CpGislandsChr22_ex.rds",
       package = 'coMethDMR',
       mustWork = TRUE
     )
   )

   coMeth_ls <- CoMethAllRegions(
     dnam = betasChr22_df,
     betaToM = TRUE,
     CpGs_ls = CpGisland_ls,
     arrayType = "450k",
     rDropThresh_num = 0.4,
     returnAllCpGs = FALSE
   )
#> snapshotDate(): 2021-05-18
#> see ?sesameData and browseVignettes('sesameData') for documentation


   results_df <- lmmTestAllRegions(
     betas = betasChr22_df,
     region_ls = coMeth_ls,
     pheno_df = pheno_df,
     contPheno_char = "stage",
     covariates_char = "age.brain",
     modelType = "randCoef",
     arrayType = "450k",
     ignoreStrand = TRUE,
     # generates a log file in the current directory
     # outLogFile = paste0("lmmLog_", Sys.Date(), ".txt")
   )
#> snapshotDate(): 2021-05-18
#> see ?sesameData and browseVignettes('sesameData') for documentation
#> Analyzing region chr22:18268062-18268249. 
#> Analyzing region chr22:18324579-18324769. 
#> Analyzing region chr22:18531243-18531447. 
#> For future calls to this function, perhaps specify a log file.
#>       Set the file name of the log file with the outLogFile argument.