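I currently have some code that computes the overlap between 8 different gene lists, to see how many genes any two lists have in common. I would like to modify the code so that I get the same results, but expressed as percentages. The code is below. The innermost loop needs to be changed as follows:
1. Get the larger of the lengths of g1 and g2
2. Divide the overlap by that maximum
3. Calculate the %
I understand the steps I need to take, but I am having trouble modifying the code because I keep running into error messages. If anyone could help, that would be great!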
filelist <- list(data.file1, data.file2, data.file3, data.file4, data.file5, data.file6, data.file7, data.file8)
all(sapply(filelist, file.exists))
# read files: each file becomes a character vector of gene names
gene.lists <- lapply(filelist, function(f) {
  scan(file=f, what=character())
})
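# set up an empty (all-NA) square matrix: one row and one column per gene list
n <- length(gene.lists)
mx <- matrix(NA, nrow=n, ncol=n)
mx
# label rows and columns with the file names, minus the .txt extension
# (%>% comes from magrittr, so that package has to be attached first)
library(magrittr)
list.names <- sapply(filelist, basename) %>% stringr::str_remove('\\.txt$')
row.names(mx) <- list.names
colnames(mx) <- list.names
mx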
mx.overlap.count <- mx
# seq_along(gene.lists) # 1 2 3 4 5 6 7 8
for (i in seq_along(gene.lists)) {
  g1 <- gene.lists[[i]]
  for (j in seq_along(gene.lists)) {
    g2 <- gene.lists[[j]]
    # genes present in both lists
    a <- intersect(g1, g2)
    b <- length(a)
    mx.overlap.count[j,i] <- b
  }
}
mx.overlap.count
View(mx.overlap.count)
#----------------------------------------------------------------------
# looking at gene overlaps in terms of %
#----------------------------------------------------------------------
# modify the code written above by adding the following to the innermost loop:
# get max of g1 and g2
# divide overlap by the max
# calculate the %
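For illustration, here are the three steps above applied to two made-up gene lists. This is only a sketch: the gene names are placeholders and do not come from the actual data files, and the variable names `overlap`, `maxN` and `pct` are chosen just for this example.

```r
# Toy gene lists (hypothetical identifiers, not from the real files)
g1 <- c("BRCA1", "TP53", "EGFR", "MYC")
g2 <- c("TP53", "MYC", "KRAS", "PTEN", "AKT1")

overlap <- length(intersect(g1, g2))    # number of genes present in both lists
maxN    <- max(length(g1), length(g2))  # step 1: size of the larger list
pct     <- 100 * overlap / maxN         # steps 2 and 3: overlap as a % of the larger list
pct
```

Here two genes are shared and the larger list has five entries, so `pct` comes out as 40.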
Below is the code I used to try to calculate the values as percentages:
# seq_along(gene.lists) # 1 2 3 4 5 6 7 8
mx.overlap <- mx
for (i in seq_along(gene.lists)) {
  g1 <- gene.lists[[i]]
  for (j in seq_along(gene.lists)) {
    g2 <- gene.lists[[j]]
    a <- intersect(g1, g2)
    b <- length(a)
    # size of the larger of the two lists
    maxN <- max(length(g1), length(g2))
    # overlap as a percentage of the larger list
    mx.overlap[j,i] <- 100 * b / maxN
    mx.overlap.count[j,i] <- b
  }
}
# the percentages are stored in mx.overlap, not in mx.overlap.count
mx.overlap
View(mx.overlap)
Answer 0 (score: 0)
I would use something like this:
for (i in seq_along(gene.lists)) {
  g1 <- gene.lists[[i]]
  for (j in seq_along(gene.lists)) {
    g2 <- gene.lists[[j]]
    a <- intersect(g1, g2)
    b <- length(a)
    maxN <- max(length(g1), length(g2))
    mx.overlap[j,i] <- 100 * b / maxN
    mx.overlap.count[j,i] <- b
  }
}
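That should work, provided mx.overlap has been initialized beforehand (for example with mx.overlap <- mx, as in the question). The percentages end up in mx.overlap, while mx.overlap.count still holds the raw counts.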
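To try the idea without the real files, the same loop can be wrapped in a small function and run on toy data. This is a minimal sketch, not a definitive implementation: `overlap_percent` and the list `toy` are hypothetical names introduced here, and the gene identifiers are made up.

```r
# Hypothetical helper: percentage overlap for every pair of gene lists,
# measured relative to the larger list of each pair.
overlap_percent <- function(gene.lists) {
  n <- length(gene.lists)
  pct <- matrix(NA_real_, nrow = n, ncol = n,
                dimnames = list(names(gene.lists), names(gene.lists)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      shared <- length(intersect(gene.lists[[i]], gene.lists[[j]]))
      larger <- max(length(gene.lists[[i]]), length(gene.lists[[j]]))
      pct[j, i] <- 100 * shared / larger
    }
  }
  pct
}

# Toy input (made-up gene names, not from the real files)
toy <- list(A = c("g1", "g2", "g3", "g4"),
            B = c("g3", "g4", "g5"),
            C = c("g6"))
m <- overlap_percent(toy)
round(m, 1)
```

Two caveats: `intersect()` drops duplicates while `length()` counts them, so if a file may list the same gene twice it is probably worth wrapping each list in `unique()` first, otherwise even the diagonal can fall below 100. A pair of empty lists would also give `NaN`, because the calculation becomes 0 / 0.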