Mark CpGs in a contiguous and co-methylated region

MarkComethylatedCpGs(
  betaCluster_mat,
  betaToM = TRUE,
  epsilon = 1e-08,
  rDropThresh_num = 0.4,
  method = c("pearson", "spearman"),
  use = "complete.obs"
)

Arguments

betaCluster_mat

matrix of beta values, with row names = sample ids and column names = CpG ids. Note that the CpGs need to be ordered by their genomic positions; this can be accomplished with the OrderCpGbyLocation function.

betaToM

indicates if beta values should be converted to M values before computing correlations. Defaults to TRUE.

epsilon

When transforming beta values to M values, what should be done to values exactly equal to 0 or 1? The M value transformation would yield -Inf or Inf for these values, which causes issues in the statistical model. We thus replace all values exactly equal to 0 with 0 + epsilon, and all values exactly equal to 1 with 1 - epsilon. Defaults to epsilon = 1e-08.
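The effect of epsilon can be illustrated with a minimal sketch in base R. It assumes the usual definition of the M value, M = log2(beta / (1 - beta)); the helper name beta_to_m is hypothetical and is not part of the package, whose internal conversion may differ in detail.

```r
# Hypothetical helper, for illustration only (not a package function).
# Assumes the standard M value definition: M = log2(beta / (1 - beta)).
beta_to_m <- function(beta, epsilon = 1e-08) {
  # Shift exact 0s and 1s so the transformation stays finite.
  # which() skips NA entries, so missing values pass through unchanged.
  beta[which(beta == 0)] <- epsilon
  beta[which(beta == 1)] <- 1 - epsilon
  log2(beta / (1 - beta))
}

# Without the shift, the first and last values would map to -Inf and Inf.
m_values <- beta_to_m(c(0, 0.25, 0.5, 1))
print(m_values)
```

A smaller epsilon pushes the shifted values further into the tails of the M scale, so the choice trades numerical stability against how extreme these samples appear.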

rDropThresh_num

threshold for the minimum correlation between a CpG and the rest of the CpGs. Defaults to 0.4.

method

correlation method; can be "pearson" or "spearman"

use

method for handling missing values when calculating the correlation. Defaults to "complete.obs" because the option "pairwise.complete.obs" only works for Pearson correlation.

Value

A data frame with the following columns:

  • CpG : CpG ID

  • keep : The CpGs with keep = 1 belong to the contiguous and co-methylated region

  • ind : Index for the CpGs

  • r_drop : The correlation between each CpG and the sum of the rest of the CpGs

Details

An outlier CpG in a genomic region will typically have low correlation with the rest of the CpGs in that region. In contrast, in a cluster of co-methylated CpGs, we expect each CpG to have high correlation with the rest of the CpGs. The r.drop statistic, the correlation between each CpG and the sum of the remaining CpGs, is used here to identify these co-methylated CpGs.
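As a rough illustration of the r.drop idea, the sketch below computes, for each column of a simulated matrix, the correlation between that column and the row sums of the remaining columns, using base R only. The simulated data, the object names, and the use of >= when comparing against the threshold are all assumptions made for this example; this is not the package's implementation.

```r
# Simulated data for illustration: three columns share a common signal,
# the fourth is independent noise (an "outlier" CpG).
set.seed(1)
n <- 200
shared <- rnorm(n)
x <- cbind(
  cpg1 = shared + rnorm(n, sd = 0.3),
  cpg2 = shared + rnorm(n, sd = 0.3),
  cpg3 = shared + rnorm(n, sd = 0.3),
  cpg4 = rnorm(n)
)

# r.drop for column j: correlation of column j with the sum of all other columns.
r_drop <- vapply(seq_len(ncol(x)), function(j) {
  rest_sum <- rowSums(x[, -j, drop = FALSE])
  cor(x[, j], rest_sum, method = "pearson", use = "complete.obs")
}, numeric(1))
names(r_drop) <- colnames(x)

# Flag columns at or above the threshold (the comparison operator is an assumption).
keep <- as.integer(r_drop >= 0.4)
print(data.frame(CpG = colnames(x), keep = keep, r_drop = r_drop))
```

Because the sum excludes the CpG being tested, a CpG cannot inflate its own statistic, which is why an outlier stands out with a low value.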

Examples

   data(betaMatrix_ex1)
   
   MarkComethylatedCpGs(
     betaCluster_mat = betaMatrix_ex1,
     betaToM = FALSE,
     method = "pearson"
   )
#>          CpG keep ind    r_drop
#> 1 cg10170214    1   1 0.7359922
#> 2 cg06518233    1   2 0.8936653
#> 3 cg18326783    1   3 0.8901418
#> 4 cg05229649    0   4 0.1862278