Extract clusters of CpGs located closely in a genomic region.

CloseBySingleRegion(
  CpGs_char,
  genome = c("hg19", "hg38"),
  arrayType = c("450k", "EPIC"),
  manifest_gr = NULL,
  maxGap = 200,
  minCpGs = 3
)

Arguments

CpGs_char

A character vector of CpG IDs

genome

Human reference genome build: hg19 or hg38

arrayType

Type of array, 450k or EPIC

manifest_gr

A GRanges object with the genome manifest (as returned by ExperimentHub or by ImportSesameData). By default, the function ignores this argument in favour of the genome and arrayType arguments.

maxGap

An integer. Genomic locations within maxGap of each other are placed into the same cluster.

minCpGs

An integer. The minimum number of CpGs required in a resulting CpG cluster.

Value

A list. Each item is a character vector of CpG IDs that are located close together (i.e., in the same cluster).

Details

Note that this function depends only on CpG locations, and not on any methylation data. The algorithm is based on the clusterMaker function in the bumphunter R package. Each cluster is essentially a group of CpG locations such that two consecutive locations in the cluster are separated by less than maxGap.
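The gap-based grouping described above can be illustrated with a minimal base R sketch. This is not the coMethDMR or bumphunter implementation: cluster_by_gap is a hypothetical helper, the probe names and coordinates are made up, and all positions are assumed to lie on a single chromosome. How a gap exactly equal to maxGap is treated is also an assumption of this sketch (here, only a gap strictly greater than maxGap starts a new cluster), so the example data avoids that boundary case.

```r
# Hypothetical helper for illustration only; not part of coMethDMR or bumphunter.
# `pos` is a named numeric vector of genomic positions on ONE chromosome.
cluster_by_gap <- function(pos, maxGap = 200, minCpGs = 3) {
  # Sort so that "consecutive" means adjacent along the chromosome
  sorted_pos <- pos[order(pos)]

  # Assumption: a gap strictly greater than maxGap opens a new cluster
  starts_new <- c(TRUE, diff(sorted_pos) > maxGap)
  cluster_id <- cumsum(starts_new)

  # Group probe names by cluster, then drop clusters that are too small
  clusters <- split(names(sorted_pos), cluster_id)
  clusters[lengths(clusters) >= minCpGs]
}

# Made-up probe names and coordinates
pos <- c(cpgA = 1000, cpgB = 1050, cpgC = 1120, cpgD = 5000, cpgE = 5040)
result <- cluster_by_gap(pos, maxGap = 100, minCpGs = 3)
print(result)
```

With these made-up coordinates, the three probes near position 1000 form one cluster, while the two probes near position 5000 are grouped together but then dropped because they fall short of minCpGs = 3.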

Examples


   CpGs_char <- c(
     "cg02505293", "cg03618257", "cg04421269", "cg17885402", "cg19890033",
     "cg20566587", "cg27505880"
   )

   cluster_ls <- CloseBySingleRegion(
     CpGs_char,
     genome = "hg19",
     arrayType = "450k",
     maxGap = 100,
     minCpGs = 3
   )
#> snapshotDate(): 2021-05-18
#> see ?sesameData and browseVignettes('sesameData') for documentation