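I have three character vectors. list1 contains all the unique names; list2 and list3 contain only subsets of the names in list1. Names can appear more than once in list2 and list3.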
list1 <- c("Jane","Michael","Zach","Zoey","Mary","Joe","Samantha","Eva","Chris","David","James","Kim","John")
list2 <- c("Jane","Jane","Zoey","Joe","Joe","Samantha","Eva","David","Kim","Kim","Kim")
list3 <- c("Michael","Michael","Zach","Mary","Mary","Joe","Eva","Eva","Chris","Chris","James","John","John")
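I would like to end up with a data frame whose first column contains list1, and whose second and third columns contain the number of times each name from list1 appears in list2 and list3.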
Jane     2 0
Michael  0 2
Zach     0 1
Zoey     1 0
Mary     0 2
Joe      2 1
Samantha 1 0
Eva      1 2
Chris    0 2
David    1 0
James    0 1
Kim      3 0
John     0 2
I know how to do this in Excel, but my list1 has more than 100,000 entries, and doing it in Excel is very slow. Is there a way to do this in R?
Answer 0 (score: 0)
Here is an approach using data.table:
list1 <- c("Jane","Michael","Zach","Zoey","Mary","Joe","Samantha","Eva","Chris","David","James","Kim","John")
list2 <- c("Jane","Jane","Zoey","Joe","Joe","Samantha","Eva","David","Kim","Kim","Kim")
list3 <- c("Michael","Michael","Zach","Mary","Mary","Joe","Eva","Eva","Chris","Chris","James","John","John")
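library(data.table)
dt = data.table(list1)
# Give every row its own group id so each name is handled on its own
dt[ , "row" := 1:.N ]
# Within each single-row group, count the entries of list2 (then list3) equal to that name
dt[ , "list2count" := sum(list1 == list2), by = row]
dt[ , "list3count" := sum(list1 == list3), by = row]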
> dt
       list1 row list2count list3count
 1:     Jane   1          2          0
 2:  Michael   2          0          2
 3:     Zach   3          0          1
 4:     Zoey   4          1          0
 5:     Mary   5          0          2
 6:      Joe   6          2          1
 7: Samantha   7          1          0
 8:      Eva   8          1          2
 9:    Chris   9          0          2
10:    David  10          1          0
11:    James  11          0          1
12:      Kim  12          3          0
13:     John  13          0          2
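The per-row grouping above compares each name in list1 against all of list2 and list3, so the work grows with the product of the vector lengths. A rough sketch of an alternative, assuming list1 has no duplicates: count each list once, then attach the counts with an update join. The column name `name` and the helper tables `c2` and `c3` are my own choices, not part of the answer above.

```r
library(data.table)

list1 <- c("Jane","Michael","Zach","Zoey","Mary","Joe","Samantha","Eva","Chris","David","James","Kim","John")
list2 <- c("Jane","Jane","Zoey","Joe","Joe","Samantha","Eva","David","Kim","Kim","Kim")
list3 <- c("Michael","Michael","Zach","Mary","Mary","Joe","Eva","Eva","Chris","Chris","James","John","John")

dt <- data.table(name = list1)

# One pass over each list: number of occurrences per distinct name
c2 <- data.table(name = list2)[, .(list2count = .N), by = name]
c3 <- data.table(name = list3)[, .(list3count = .N), by = name]

# Update joins: copy each count onto the matching row of dt
dt[c2, on = "name", list2count := i.list2count]
dt[c3, on = "name", list3count := i.list3count]

# Names that never occur in a list are left as NA by the join; set them to 0
dt[is.na(list2count), list2count := 0L]
dt[is.na(list3count), list3count := 0L]

dt
```

The rows stay in list1 order because dt itself is never re-sorted; only new columns are added by reference.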
Answer 1 (score: 0)
Using dplyr:
list1 <- c("Jane","Michael","Zach","Zoey","Mary","Joe","Samantha","Eva","Chris","David","James","Kim","John")
list2 <- data.frame(name = c("Jane","Jane","Zoey","Joe","Joe","Samantha","Eva","David","Kim","Kim","Kim"))
list3 <-data.frame(name = c("Michael","Michael","Zach","Mary","Mary","Joe","Eva","Eva","Chris","Chris","James","John","John"))
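# Tag every row with its source list (nrow(), because length() of a data frame is its column count)
list2$listNumber <- rep("list2", nrow(list2))
list3$listNumber <- rep("list3", nrow(list3))
combList <- rbind(list2, list3)
library(dplyr)
# Count each name within each list, keeping only names that occur in list1
combList %>% group_by(listNumber) %>% count(name) %>% filter(name %in% list1)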
Output:
# A tibble: 15 x 3
   listNumber     name     n
        <chr>   <fctr> <int>
 1      list2    David     1
 2      list2      Eva     1
 3      list2     Jane     2
 4      list2      Joe     2
 5      list2      Kim     3
 6      list2 Samantha     1
 7      list2     Zoey     1
 8      list3      Eva     2
 9      list3      Joe     1
10      list3    Chris     2
11      list3    James     1
12      list3     John     2
13      list3     Mary     2
14      list3  Michael     2
15      list3     Zach     1
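The result above is in long form and has no rows for zero counts, whereas the question asks for one row per name in list1. One possible way to get there, assuming tidyr is available alongside dplyr (tidyr is not used in the answer above, and the column names `list2count` and `list3count` are mine):

```r
library(dplyr)
library(tidyr)

list1 <- c("Jane","Michael","Zach","Zoey","Mary","Joe","Samantha","Eva","Chris","David","James","Kim","John")
list2 <- c("Jane","Jane","Zoey","Joe","Joe","Samantha","Eva","David","Kim","Kim","Kim")
list3 <- c("Michael","Michael","Zach","Mary","Mary","Joe","Eva","Eva","Chris","Chris","James","John","John")

# Long table of (name, source list), counted and spread to one column per list
counts <- bind_rows(
  tibble(name = list2, listNumber = "list2count"),
  tibble(name = list3, listNumber = "list3count")
) %>%
  count(name, listNumber) %>%
  pivot_wider(names_from = listNumber, values_from = n, values_fill = 0L)

# Start from list1 so every name gets a row, in list1 order
result <- tibble(name = list1) %>%
  left_join(counts, by = "name") %>%
  # Names found in neither list come back as NA from the join
  mutate(across(c(list2count, list3count), ~ coalesce(.x, 0L))) %>%
  select(name, list2count, list3count)

result
```

Starting the join from list1 is what keeps names with no occurrences in the output.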
Answer 2 (score: 0)
In base R, you can use factor, setting its levels to the values of list1, then use table to get the counts, and data.frame to put everything together:
data.frame(list1,
l2=c(table(factor(list2, levels=list1))),
l3=c(table(factor(list3, levels=list1))))
This returns
            list1 l2 l3
Jane         Jane  2  0
Michael   Michael  0  2
Zach         Zach  0  1
Zoey         Zoey  1  0
Mary         Mary  0  2
Joe           Joe  2  1
Samantha Samantha  1  0
Eva           Eva  1  2
Chris       Chris  0  2
David       David  1  0
James       James  0  1
Kim           Kim  3  0
John         John  0  2
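Note that the names show up twice in the output above, once as row names and once in the list1 column, because c() keeps the names of the table. A small variation on the same idea, sketched with match and tabulate instead of factor and table; `count_in` is a hypothetical helper name, and the sketch assumes list1 has no duplicates:

```r
list1 <- c("Jane","Michael","Zach","Zoey","Mary","Joe","Samantha","Eva","Chris","David","James","Kim","John")
list2 <- c("Jane","Jane","Zoey","Joe","Joe","Samantha","Eva","David","Kim","Kim","Kim")
list3 <- c("Michael","Michael","Zach","Mary","Mary","Joe","Eva","Eva","Chris","Chris","James","John","John")

# Hypothetical helper: how often does each element of ref occur in x?
count_in <- function(x, ref) {
  # match() gives each element's position in ref;
  # tabulate() counts how often each position 1..length(ref) occurs
  tabulate(match(x, ref), nbins = length(ref))
}

res <- data.frame(list1,
                  l2 = count_in(list2, list1),
                  l3 = count_in(list3, list1))
res
```

Since tabulate returns a plain integer vector, the data frame gets the default numeric row names.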
Answer 3 (score: 0)
Here is a base R solution that extends to any number of lists:
list0 <- list(list1, list2, list3)
# Tabulate each list, then merge the frequency tables on the name column
res <- Reduce(function(...) merge(..., by = 1, all = TRUE),
              lapply(list0, function(x) as.data.frame(table(x))))
colnames(res) <- c("Name","L1","L2","L3")
res
#        Name L1 L2 L3
# 1     Chris  1 NA  2
# 2     David  1  1 NA
# 3       Eva  1  1  2
# 4     James  1 NA  1
# 5      Jane  1  2 NA
# 6       Joe  1  2  1
# 7      John  1 NA  2
# 8       Kim  1  3 NA
# 9      Mary  1 NA  2
# 10  Michael  1 NA  2
# 11 Samantha  1  1 NA
# 12     Zach  1 NA  1
# 13     Zoey  1  1 NA
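The merged result has NA where a name never occurs, and it is sorted alphabetically rather than in list1 order. If zeros and the original order are wanted, something along these lines should work. This is only a sketch; it repeats the merge so that it runs on its own, and it assumes list1 has no duplicates.

```r
list1 <- c("Jane","Michael","Zach","Zoey","Mary","Joe","Samantha","Eva","Chris","David","James","Kim","John")
list2 <- c("Jane","Jane","Zoey","Joe","Joe","Samantha","Eva","David","Kim","Kim","Kim")
list3 <- c("Michael","Michael","Zach","Mary","Mary","Joe","Eva","Eva","Chris","Chris","James","John","John")

list0 <- list(list1, list2, list3)
res <- Reduce(function(...) merge(..., by = 1, all = TRUE),
              lapply(list0, function(x) as.data.frame(table(x))))
colnames(res) <- c("Name", "L1", "L2", "L3")

# A missing count means the name does not occur in that list
res[is.na(res)] <- 0L

# Put the rows back in list1 order and renumber them
res <- res[match(list1, res$Name), ]
rownames(res) <- NULL
res
```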