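The code below computes hypergeometric p values for pairs of gene lists. However, it fails with an error saying that a binary operator has a non-numeric argument (`non-numeric argument to binary operator`). Any ideas what needs to change?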
```r
library(magrittr)  # provides %>%
#-------------------------------------------------------------------------------
# Hypergeometric p values
#-------------------------------------------------------------------------------
# Set up empty matrix
# ....
hypergeo <- function(white.drawn, white, black, drawn, do.log = FALSE) {
  # Info: http://digitheadslabnotebook.blogspot.com/2011/02/using-r-for-introductory-statistics_21.html
  # dhyper(q, m, n, k, log = FALSE)
  # q = number of successes; "white balls drawn" (here: number of genes that overlap)
  # m + n = N ; N = total number of genes
  # m = "white balls in urn"; total number of TF-bound genes
  # n = "black balls in urn"; total number of genes NOT bound by the TF
  # k = "number of balls drawn from urn"; sample size
  if (white < 1) {return(NA)}
  # Upper tail: P(X >= white.drawn) = P(X > white.drawn - 1)
  p <- phyper(white.drawn - 1, white, black, drawn, lower.tail = FALSE, log.p = do.log)
  return(p)
} # end: hypergeo
y <- rep(NA, x)
mx.p <- matrix(y, ncol = length(gene.lists))
mx.p
row.names(mx.p) <- sapply(filelist, basename) %>% stringr::str_remove('\\.txt$')
colnames(mx.p) <- sapply(filelist, basename) %>% stringr::str_remove('\\.txt$')
mx.p
#-------------------------------------------------------------------------------
# loop to work out the hypergeometric p values
#-------------------------------------------------------------------------------
for (i in seq_along(gene.lists)) {
  g1 <- gene.lists[[i]]
  for (j in seq_along(gene.lists)) {
    g2 <- gene.lists[[j]]
    a <- intersect(g1, g2)
    b <- length(a)
    balls.white <- length(g1)
    balls.black <- 31253 - length(g1)
    balls.white.drawn <- length(intersect(g1, g2))
    balls.drawn <- length(g2)
    balls.total <- 31253
    p <- hypergeo(white.drawn = balls.white.drawn,
                  white = balls.white,
                  black = balls.black,
                  drawn = balls.drawn, do.log = FALSE)
    mx.p[i, j] <- p  # store the result; otherwise p is overwritten on every iteration
  }
}
```
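`hypergeo()` relies on the identity `phyper(q - 1, m, n, k, lower.tail = FALSE)` = P(X ≥ q), i.e. the probability of seeing at least `q` overlapping genes. A quick sanity check with made-up urn sizes (not taken from the question) compares it with summing `dhyper()` term by term:

```r
# Hypothetical toy urn: 20 white (TF-bound) genes, 80 black genes,
# 10 genes drawn, 5 of them overlapping. Numbers are illustrative only.
m <- 20; n <- 80; k <- 10; q <- 5

upper_tail <- phyper(q - 1, m, n, k, lower.tail = FALSE)  # P(X >= q)
by_sum     <- sum(dhyper(q:min(m, k), m, n, k))           # same probability, summed term by term

all.equal(upper_tail, by_sum)  # TRUE
```

Subtracting 1 is what turns the strict upper tail P(X > q - 1) into the inclusive P(X ≥ q).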
Answer:
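In your code you define `balls.white.drawn <- intersect(g1, g2)`. The intersection of two sets is generally not a single number but a vector (in your case, a vector of genes). Inside `hypergeo()`, `white.drawn - 1` then tries to subtract 1 from a character vector, which is exactly what raises `non-numeric argument to binary operator`. I think you want the *number* of elements in the intersection of `g1` and `g2`, so you want `balls.white.drawn <- length(intersect(g1, g2))`. Note that you already store the intersection in `a` and its size in `b`, so `balls.white.drawn <- b` works too.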
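A minimal reproduction of the error and the fix, assuming the gene lists are character vectors of gene IDs. The gene names and the two-list `gene.lists` below are hypothetical; only the universe size of 31253 comes from the question. The matrix is filled in the same row `i` / column `j` orientation as the question's loop:

```r
# Hypothetical gene lists (character vectors of gene IDs).
g1 <- c("geneA", "geneB", "geneC", "geneD")
g2 <- c("geneB", "geneC", "geneE")

overlap <- intersect(g1, g2)  # c("geneB", "geneC"): a character vector, not a count

# This is what hypergeo() effectively does with white.drawn - 1:
res <- tryCatch(overlap - 1, error = function(e) e)
inherits(res, "error")  # TRUE: "non-numeric argument to binary operator"

n.overlap <- length(overlap)  # 2: the count phyper() actually needs

# Fill the p-value matrix using the count instead of the vector.
N <- 31253                                    # total number of genes, as in the question
gene.lists <- list(list1 = g1, list2 = g2)    # hypothetical
n.lists <- length(gene.lists)
mx.p <- matrix(NA_real_, nrow = n.lists, ncol = n.lists,
               dimnames = list(names(gene.lists), names(gene.lists)))
for (i in seq_len(n.lists)) {
  for (j in seq_len(n.lists)) {
    white       <- length(gene.lists[[i]])                              # TF-bound genes
    white.drawn <- length(intersect(gene.lists[[i]], gene.lists[[j]]))  # overlap count
    drawn       <- length(gene.lists[[j]])                              # sample size
    mx.p[i, j]  <- phyper(white.drawn - 1, white, N - white, drawn, lower.tail = FALSE)
  }
}
mx.p
```

Preallocating with `NA_real_` and explicit `dimnames` avoids relying on an external `x` to size the matrix.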