Extract contiguous co-methylated genomic regions from a list of pre-defined genomic regions

CoMethAllRegions(
  dnam,
  betaToM = FALSE,
  method = c("pearson", "spearman"),
  rDropThresh_num = 0.4,
  minCpGs = 3,
  genome = c("hg19", "hg38"),
  arrayType = c("450k", "EPIC"),
  CpGs_ls,
  file = NULL,
  returnAllCpGs = FALSE,
  output = c("CpGs", "dataframe"),
  nCores_int = 1L,
  ...
)

Arguments

dnam

matrix (or data frame) of beta values, with row names = CpG IDs, column names = sample IDs. This is typically genome-wide methylation beta values.

betaToM

indicates if converting methylation beta values to mvalues

method

method for computing correlation, can be "spearman" or "pearson"

rDropThresh_num

threshold for min correlation between a cpg with sum of the rest of the CpGs

minCpGs

minimum number of CpGs to be considered a "region". Only regions with more than minCpGs will be returned.

genome

Human genome of reference hg19 or hg38

arrayType

Type of array, can be "450k" or "EPIC"

CpGs_ls

list where each item is a character vector of CpGs IDs. This should be CpG probes located closely on the array.

file

an RDS file with clusters of CpG locations (i.e. CpGs located closely to each other on the genome). This file can be generated by the WriteCloseByAllRegions function.

returnAllCpGs

When there is not a contiguous comethylated region in the inputting pre-defined region, returnAllCpGs = TRUE indicates outputting all the CpGs in the input regions (regardless of statistical significance), while returnAllCpGs = FALSE indicates not returning any CpGs not contained in comethylated clusters. Defaults to FALSE, and we provide this option for debugging purposes only.

output

a character vector of CpGs or a dataframe of CpGs along with rDrop info

nCores_int

Number of computing cores to be used when executing code in parallel. Defaults to 1 (serial computing).

...

Dots for additional arguments passed to the cluster constructor. See CreateParallelWorkers for more information.

Value

When output = "dataframe" is selected, returns a list of data frames, each with CpG (CpG name), Chr (chromosome number), MAPINFO (genomic position), r_drop (correlation between the CpG with rest of the CpGs), keep (indicator for co-methylated CpG), keep_contiguous (index for contiguous comethylated subregions). When output = "CpGs" is selected, returns a list, each item is a list of CpGs in the contiguous co-methylated subregion.

Details

There are two ways to input genomic regions for this function: (1) use CpGs_ls argument, or (2) use file argument. Examples of these files are in /inst/extdata/ folder of the package.

Examples

   data(betasChr22_df)


   CpGisland_ls <- readRDS(
     system.file(
       "extdata",
       "CpGislandsChr22_ex.rds",
       package = 'coMethDMR',
       mustWork = TRUE
     )
   )

   coMeth_ls <- CoMethAllRegions (
     dnam = betasChr22_df,
     betaToM = TRUE,
     method = "pearson",
     CpGs_ls = CpGisland_ls,
     arrayType = "450k",
     returnAllCpGs = FALSE,
     output = "CpGs"
   )
#> snapshotDate(): 2021-05-18
#> see ?sesameData and browseVignettes('sesameData') for documentation