R/MarkComethylatedCpGs.R
MarkComethylatedCpGs.Rd
Mark CpGs in contiguous and co-methylated region
MarkComethylatedCpGs(
  betaCluster_mat,
  betaToM = TRUE,
  epsilon = 1e-08,
  rDropThresh_num = 0.4,
  method = c("pearson", "spearman"),
  use = "complete.obs"
)
betaCluster_mat: matrix of beta values, with row names = sample IDs and column names = CpG IDs. The CpGs must be ordered by their genomic positions; the OrderCpGbyLocation function can do this.
betaToM: indicates whether beta values should be converted to M values before computing correlations. Defaults to TRUE.
epsilon: When transforming beta values to M values, what should be done to values exactly equal to 0 or 1? The M value transformation would yield -Inf or Inf, which causes issues in the statistical model. We thus replace all values exactly equal to 0 with 0 + epsilon, and all values exactly equal to 1 with 1 - epsilon. Defaults to epsilon = 1e-08.
rDropThresh_num: threshold for the minimum correlation between a CpG and the rest of the CpGs. Defaults to 0.4.
method: correlation method; can be "pearson" or "spearman".
use: method for handling missing values when calculating the correlation. Defaults to "complete.obs" because the option "pairwise.complete.obs" only works for Pearson correlation.
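The epsilon adjustment described for the beta-to-M conversion can be sketched as follows. This is a minimal illustration, assuming the usual base-2 logit definition of the M value, M = log2(beta / (1 - beta)); the helper name beta_to_m and the toy matrix are hypothetical and are not part of the package.

```r
# Hypothetical helper for illustration only; not the package's internal code.
beta_to_m <- function(beta_mat, epsilon = 1e-08) {
  # Move exact 0s and 1s just inside the open interval (0, 1).
  # which() skips missing values, so NAs are left untouched.
  beta_mat[which(beta_mat == 0)] <- epsilon
  beta_mat[which(beta_mat == 1)] <- 1 - epsilon
  # Base-2 logit transformation from beta values to M values
  log2(beta_mat / (1 - beta_mat))
}

# Toy beta matrix: rows are samples, columns are CpGs
beta_ex <- matrix(
  c(0, 0.25, 0.5, 1),
  nrow = 2,
  dimnames = list(c("sample1", "sample2"), c("cpgA", "cpgB"))
)
m_ex <- beta_to_m(beta_ex)
```

Without the adjustment, the entries equal to 0 and 1 would map to -Inf and Inf; with it, every entry of m_ex is finite.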
A data frame with the following columns:
CpG: CpG ID
keep: the CpGs with keep = 1 belong to the contiguous and co-methylated region
ind: index for the CpGs
r_drop: the correlation between each CpG and the sum of the rest of the CpGs
An outlier CpG in a genomic region will typically have low correlation with the rest of the CpGs in that region. On the other hand, in a cluster of co-methylated CpGs, we expect each CpG to have high correlation with the rest of the CpGs. The r.drop statistic is used here to identify these co-methylated CpGs.
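To make the statistic concrete, here is a minimal sketch of the r.drop idea: correlate each CpG with the sum of all the other CpGs, then flag those above the threshold. The helper r_drop_sketch, the toy data, and the strict ">" comparison are assumptions made for illustration; the package's own implementation may differ, and this sketch does not check contiguity.

```r
# Hypothetical helper for illustration only; not the package's internal code.
r_drop_sketch <- function(data_mat, rDropThresh_num = 0.4,
                          method = "pearson", use = "complete.obs") {
  r_drop <- vapply(
    seq_len(ncol(data_mat)),
    function(j) {
      # Sum of all other CpGs, computed sample by sample
      rest_sum <- rowSums(data_mat[, -j, drop = FALSE])
      cor(data_mat[, j], rest_sum, method = method, use = use)
    },
    numeric(1)
  )
  data.frame(
    CpG = colnames(data_mat),
    # Assumption: a CpG is kept when r_drop is strictly above the threshold
    keep = as.integer(r_drop > rDropThresh_num),
    ind = seq_len(ncol(data_mat)),
    r_drop = r_drop
  )
}

# Toy data: three CpGs follow a shared trend, the fourth does not
shared <- c(1, 2, 3, 4, 5, 6)
toy_mat <- cbind(
  cpgA = shared + c(0.1, -0.1, 0.1, -0.1, 0.1, -0.1),
  cpgB = 2 * shared,
  cpgC = shared,
  cpgD = c(2, 5, 1, 1, 5, 2)
)
rownames(toy_mat) <- paste0("sample", 1:6)
toy_res <- r_drop_sketch(toy_mat)
```

In this toy matrix the first three CpGs track the shared trend and receive keep = 1, while cpgD is uncorrelated with the sum of the others and receives keep = 0.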
data(betaMatrix_ex1)
MarkComethylatedCpGs(
  betaCluster_mat = betaMatrix_ex1,
  betaToM = FALSE,
  method = "pearson"
)
#> CpG keep ind r_drop
#> 1 cg10170214 1 1 0.7359922
#> 2 cg06518233 1 2 0.8936653
#> 3 cg18326783 1 3 0.8901418
#> 4 cg05229649 0 4 0.1862278