Test associations of individual CpGs in a genomic region with a continuous phenotype

CpGsInfoOneRegion(
  regionName_char,
  region_gr = NULL,
  betas_df,
  pheno_df,
  contPheno_char,
  covariates_char = NULL,
  genome = c("hg19", "hg38"),
  arrayType = c("450k", "EPIC"),
  manifest_gr = NULL
)

Arguments

regionName_char

character string giving the location of a genomic region, in the format "chrxx:xxxxxx-xxxxxx" (for example, "chr22:18267969-18268249")

region_gr

An object of class GRanges with location information for one region. If this argument is NULL, then the region in regionName_char is used.

betas_df

data frame of beta values with row names = CpG IDs, column names = sample IDs

pheno_df

a data frame with phenotype and covariate variables, with variable "Sample" for sample IDs.

contPheno_char

character string of the continuous phenotype to be tested against methylation values

covariates_char

character vector of covariate variable names

genome

human reference genome: "hg19" (default) or "hg38"

arrayType

Type of array, can be "450k" or "EPIC"

manifest_gr

A GRanges object with the genome manifest (as returned by ExperimentHub or by ImportSesameData). This function by default ignores this argument in favour of the genome and arrayType arguments.
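As an illustration of the input shapes described above, the following minimal sketch parses a region name with base R and builds toy versions of betas_df and pheno_df. The CpG IDs, sample IDs, and values are hypothetical, and the regular-expression parsing is only a stand-in to show the expected string format; it is not how the package itself reads regions. The check that sample IDs agree between the two data frames is an assumption about sensible input, not a documented requirement.

```r
# Parse a region name of the form "chr<name>:<start>-<end>" (base R only).
regionName_char <- "chr22:18267969-18268249"
parts <- regmatches(
  regionName_char,
  regexec("^(chr[^:]+):([0-9]+)-([0-9]+)$", regionName_char)
)[[1]]
region_ls <- list(
  chr   = parts[2],
  start = as.integer(parts[3]),
  end   = as.integer(parts[4])
)

# Toy beta values: row names = CpG IDs, column names = sample IDs.
# (Hypothetical IDs and values, for illustration only.)
betas_df <- data.frame(
  S1 = c(0.81, 0.12),
  S2 = c(0.77, 0.15),
  S3 = c(0.85, 0.10),
  row.names = c("cg00000001", "cg00000002")
)

# Toy phenotype data: one row per sample, with a "Sample" column of IDs.
pheno_df <- data.frame(
  Sample = c("S1", "S2", "S3"),
  stage  = c(0, 3, 6),
  sex    = c("F", "M", "F")
)

# Assumed sanity check: the same samples appear in both inputs.
stopifnot(setequal(colnames(betas_df), pheno_df$Sample))
```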

Value

a data frame with location of the genomic region (Region), CpG ID (cpg), chromosome (chr), position (pos), results for testing association of methylation in individual CpGs with continuous phenotype (slopeEstimate, slopePval) and annotations for the region.

Details

This function fits linear models that test the association between the methylation value of each CpG in a genomic region and a continuous phenotype. Note that methylation M values are used as the regression outcomes in these models. The model for each CpG is:

methylation M value ~ contPheno_char + covariates_char
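The per-CpG model above can be sketched in base R as follows. This is a minimal illustration on simulated data, not the package's internal code: it assumes the usual definition of the M value as log2(beta / (1 - beta)), and the helper fitOneCpG, the CpG names, and all data values are hypothetical.

```r
# Simulated inputs (illustration only).
set.seed(1)
n <- 20
pheno_df <- data.frame(
  Sample = paste0("S", seq_len(n)),
  stage  = sample(0:6, n, replace = TRUE),
  age    = round(rnorm(n, mean = 80, sd = 5)),
  sex    = factor(rep(c("F", "M"), length.out = n))
)
betas_df <- as.data.frame(
  matrix(
    runif(3 * n, min = 0.05, max = 0.95),
    nrow = 3,
    dimnames = list(c("cpgA", "cpgB", "cpgC"), pheno_df$Sample)
  )
)

# Convert beta values to M values (assumed definition: base-2 logit).
betas_mat   <- as.matrix(betas_df)
mvalues_mat <- log2(betas_mat / (1 - betas_mat))

# Hypothetical helper: fit M value ~ phenotype + covariates for one CpG
# and return the phenotype slope and its p-value.
fitOneCpG <- function(mvalue_num) {
  fit_df <- cbind(pheno_df, mvalue = mvalue_num)
  fit    <- lm(mvalue ~ stage + age + sex, data = fit_df)
  coefs  <- summary(fit)$coefficients
  c(
    slopeEstimate = coefs["stage", "Estimate"],
    slopePval     = coefs["stage", "Pr(>|t|)"]
  )
}

# One row per CpG, mirroring the slopeEstimate and slopePval columns.
results_df <- as.data.frame(t(apply(mvalues_mat, 1, fitOneCpG)))
results_df
```

Here the columns of the simulated betas_df are already in the same order as the rows of pheno_df; with real data the samples would need to be matched by ID before fitting.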

Examples

   data(betasChr22_df)
   data(pheno_df)
   myRegion_gr <- RegionsToRanges("chr22:18267969-18268249")

   CpGsInfoOneRegion(
     region_gr = myRegion_gr,
     betas_df = betasChr22_df,
     pheno_df = pheno_df,
     contPheno_char = "stage",
     covariates_char = c("age.brain", "sex"),
     arrayType = "450k"
   )
#> snapshotDate(): 2021-05-18
#> see ?sesameData and browseVignettes('sesameData') for documentation
#>                    Region        cpg   chr      pos slopeEstimate slopePval
#> 1 chr22:18267969-18268249 cg18370151 chr22 18267969       -0.0078    0.6911
#> 2 chr22:18267969-18268249 cg12460175 chr22 18268062       -0.0393    0.3230
#> 3 chr22:18267969-18268249 cg14086922 chr22 18268239       -0.0779    0.0586
#> 4 chr22:18267969-18268249 cg21463605 chr22 18268249       -0.1005    0.0264
#>   UCSC_RefGene_Name UCSC_RefGene_Accession UCSC_RefGene_Group
#> 1                                                            
#> 2                                                            
#> 3                                                            
#> 4