In R, I would like to do the following:
I have a gene.list containing 5 data frames, where each data frame looks like this:
col1
name1
name2
name3
...
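First, I want to extract the overlap of these five data frames. The result must be a new data frame: output
I have another list, named coverage.list, containing 11 data frames. Each data frame looks like this: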
col1 col2 col3
name1-a 1 2
name2-c 3 4
name3-d 5 6
name4-e 7 8
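Now, from each data frame in coverage.list, I want to extract the rows whose col1 value starts with one of the values present in the output data frame created in the previous step. The result should be a new list named coverage.new.list.
For the first step, extracting the overlap of the 5 data frames, I tried
Reduce(intersect, coverage.list)
but I get the message 'data frame with 0 columns and 0 rows'. However, when I use the venn function on this list, I get the correct overlap counts.
Could you point me to the right solution?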
Answer 0 (score: 1)
I think this is what you are looking for:
library(dplyr)
library(tidyr)
# Inner-join the gene.list tables in turn (1 with 2, then 3, 4 and 5),
# so only the col1 values present in all five tables remain.
output <- inner_join(gene.list[[1]], gene.list[[2]], by = "col1") %>%
  inner_join(gene.list[[3]], by = "col1") %>%
  inner_join(gene.list[[4]], by = "col1") %>%
  inner_join(gene.list[[5]], by = "col1")
# Split col1 at "-", keep rows whose prefix is in output$col1, then restore col1.
coverage.list.new <- lapply(coverage.list, function(x) {
  x %>%
    mutate(backup = col1) %>%
    separate(col1, c("col1", "col1_2"), sep = "-") %>%
    filter(col1 %in% output$col1) %>%
    mutate(col1 = backup) %>%
    select(-c(backup, col1_2))
})
Update
coverage.list.new <- lapply(coverage.list, function(x) {
  x %>%
    mutate(backup = col1, col1 = sub("-", "@", col1)) %>%
    separate(col1, c("col1", "col1_2"), sep = "@") %>%
    filter(col1 %in% output$col1) %>%
    mutate(col1 = backup) %>%
    select(-c(backup, col1_2))
})
# In mutate(), col1 = sub("-", "@", col1) replaces only the first "-" with "@",
# so that col1 can then be split at the "@". If "@" already occurs in your col1,
# pick a symbol that does not occur there and use it in place of "@" in the code above.
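If you would rather not pick a placeholder symbol, the same filter can be sketched in base R by stripping everything from the first "-" onward and comparing what is left. This is only a sketch: keep_matching is a hypothetical helper name, the small output and coverage.list below are toy stand-ins, and it assumes (like the code above) that the gene names themselves contain no hyphen.

```r
# Toy stand-ins for the real objects (illustrative values only)
output <- data.frame(col1 = c("name1", "name3"), stringsAsFactors = FALSE)
coverage.list <- list(
  data.frame(col1 = c("name1-a", "name2-c", "name3-d-x"),
             col2 = c(1, 3, 5), col3 = c(2, 4, 6), stringsAsFactors = FALSE),
  data.frame(col1 = c("name4-e", "name3-a"),
             col2 = c(7, 1), col3 = c(8, 2), stringsAsFactors = FALSE)
)

# Hypothetical helper: keep the rows of x whose col1 prefix is one of `genes`
keep_matching <- function(x, genes) {
  # Drop the first "-" and everything after it, leaving only the prefix
  prefix <- sub("-.*$", "", as.character(x$col1))
  x[prefix %in% genes, , drop = FALSE]
}

coverage.list.new <- lapply(coverage.list, keep_matching, genes = output$col1)
print(coverage.list.new)
```

Because only the text before the first hyphen is compared, a value such as "name3-d-x" is still matched against "name3", and the original col1 is never modified, so no backup column is needed.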
Sample data
gene.list <- list(
  data.frame(col1 = c("name1", "name2", "name3")),
  data.frame(col1 = c("name1", "name3", "name4")),
  data.frame(col1 = c("name1", "name3", "name4")),
  data.frame(col1 = c("name1", "name3", "name4")),
  data.frame(col1 = c("name1", "name3", "name4"))
)
coverage.list <- list(
  data.frame(col1 = c("name1-a", "name2-c", "name3-d", "name4-e"),
             col2 = c(1, 3, 5, 7), col3 = c(2, 4, 6, 8)),
  data.frame(col1 = c("name3-a", "name4-c", "name3-d", "name4-e"),
             col2 = c(1, 3, 5, 7), col3 = c(2, 4, 6, 8))
)
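As for the Reduce(intersect, ...) attempt in the question: base R's intersect() is meant for vectors, and a data frame handed to it is treated as a list of columns rather than as a set of names, which is probably why the result came back empty. A minimal base-R sketch, assuming the names sit in a column called col1 as in the sample data, intersects the col1 vectors instead:

```r
# Sample gene.list from above, repeated so the snippet runs on its own
gene.list <- list(
  data.frame(col1 = c("name1", "name2", "name3")),
  data.frame(col1 = c("name1", "name3", "name4")),
  data.frame(col1 = c("name1", "name3", "name4")),
  data.frame(col1 = c("name1", "name3", "name4")),
  data.frame(col1 = c("name1", "name3", "name4"))
)

# Extract col1 from each data frame as a character vector,
# then intersect those vectors one after another
common <- Reduce(intersect, lapply(gene.list, function(d) as.character(d$col1)))

# Wrap the shared names in a one-column data frame, as requested for `output`
output <- data.frame(col1 = common, stringsAsFactors = FALSE)
print(output)
```

On the sample data this leaves name1 and name3, the same names the chained inner_join keeps.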