Question

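I have three character vectors. List 1 contains all the unique names; lists 2 and 3 contain only subsets of the names in list 1. A name can appear multiple times in lists 2 and 3.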

list1 <- c("Jane","Michael","Zach","Zoey","Mary","Joe","Samantha","Eva","Chris","David","James","Kim","John")
list2 <- c("Jane","Jane","Zoey","Joe","Joe","Samantha","Eva","David","Kim","Kim","Kim")    
list3 <- c("Michael","Michael","Zach","Mary","Mary","Joe","Eva","Eva","Chris","Chris","James","John","John")

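In the end I want a data frame whose first column holds list 1 and whose second and third columns hold the number of times each name from list 1 appears in lists 2 and 3: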

Jane      2   0
Michael   0   2
Zach      0   1
Zoey      1   0
Mary      0   2
Joe       2   1
Samantha  1   0
Eva       1   2
Chris     0   2
David     1   0
James     0   1
Kim       3   0
John      0   2

I know how to do this in Excel, but my list1 has more than 100,000 entries, and doing it in Excel is very slow. Is there a way to do this in R?

Answer 1

Here is an approach using data.table:

list1 <- c("Jane","Michael","Zach","Zoey","Mary","Joe","Samantha","Eva","Chris","David","James","Kim","John")
list2 <- c("Jane","Jane","Zoey","Joe","Joe","Samantha","Eva","David","Kim","Kim","Kim")    
list3 <- c("Michael","Michael","Zach","Mary","Mary","Joe","Eva","Eva","Chris","Chris","James","John","John")

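library(data.table)

# One row per name in list1, plus a row id to group by
dt = data.table(list1)
dt[ , "row" := 1:.N ]

# For each row (i.e. each name), count how many elements of list2 / list3 equal that name
dt[ , "list2count" := sum(list1 == list2), by = row]
dt[ , "list3count" := sum(list1 == list3), by = row]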

> dt
       list1 row list2count list3count
 1:     Jane   1          2          0
 2:  Michael   2          0          2
 3:     Zach   3          0          1
 4:     Zoey   4          1          0
 5:     Mary   5          0          2
 6:      Joe   6          2          1
 7: Samantha   7          1          0
 8:      Eva   8          1          2
 9:    Chris   9          0          2
10:    David  10          1          0
11:    James  11          0          1
12:      Kim  12          3          0
13:     John  13          0          2
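The per-row version above compares every name in list1 against the whole of list2 and list3, so the work grows roughly as length(list1) × length(list2), which may matter with 100,000+ names. A sketch of an alternative, assuming data.table is installed, counts each list once with `.N` and then copies the counts onto list1 with an update join (`on =`, `i.N`). The names `cnt2`, `cnt3` and the `name` column are illustrative, not part of the original answer.

```r
library(data.table)

list1 <- c("Jane","Michael","Zach","Zoey","Mary","Joe","Samantha","Eva","Chris","David","James","Kim","John")
list2 <- c("Jane","Jane","Zoey","Joe","Joe","Samantha","Eva","David","Kim","Kim","Kim")
list3 <- c("Michael","Michael","Zach","Mary","Mary","Joe","Eva","Eva","Chris","Chris","James","John","John")

# One row per name in list1, in the original order
dt <- data.table(name = list1)

# Count each list once: .N is the number of rows per group (illustrative names cnt2/cnt3)
cnt2 <- data.table(name = list2)[, .N, by = name]
cnt3 <- data.table(name = list3)[, .N, by = name]

# Update join: copy the count column N from the lookup table onto matching names in dt
dt[cnt2, list2count := i.N, on = "name"]
dt[cnt3, list3count := i.N, on = "name"]

# Names that never occur are left as NA by the join; treat them as zero counts
dt[is.na(list2count), list2count := 0L]
dt[is.na(list3count), list3count := 0L]
dt
```

The update join modifies `dt` by reference and does not reorder it, so the rows stay in list1's order.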

Answer 2

Using dplyr:

list1 <-  c("Jane","Michael","Zach","Zoey","Mary","Joe","Samantha","Eva","Chris","David","James","Kim","John")

list2 <- data.frame(name = c("Jane","Jane","Zoey","Joe","Joe","Samantha","Eva","David","Kim","Kim","Kim"))

list3 <-data.frame(name = c("Michael","Michael","Zach","Mary","Mary","Joe","Eva","Eva","Chris","Chris","James","John","John"))



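# Tag each row with the list it came from (nrow, not length: length() of a data frame is its number of columns)
list2$listNumber <- rep("list2", nrow(list2))
list3$listNumber <- rep("list3", nrow(list3))

# Stack both lists, count names per list, and keep only names that are in list1
combList <- rbind(list2, list3)
library(dplyr)
combList %>% group_by(listNumber) %>% count(name) %>% filter(name %in% list1)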

Output:

# A tibble: 15 x 3
   listNumber     name     n
        <chr>   <fctr> <int>
 1      list2    David     1
 2      list2      Eva     1
 3      list2     Jane     2
 4      list2      Joe     2
 5      list2      Kim     3
 6      list2 Samantha     1
 7      list2     Zoey     1
 8      list3      Eva     2
 9      list3      Joe     1
10      list3    Chris     2
11      list3    James     1
12      list3     John     2
13      list3     Mary     2
14      list3  Michael     2
15      list3     Zach     1
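This result is in long format and leaves out names whose count is zero. If the wide layout from the question is wanted, one option (a base R sketch, not part of the original answer) is to cross-tabulate the same stacked data with `table()`, using `factor(..., levels = list1)` so every name in list1 gets a row, in list1's order, including zeros. `combList` is rebuilt here so the block runs on its own; `tab` and `wide` are illustrative names.

```r
list1 <- c("Jane","Michael","Zach","Zoey","Mary","Joe","Samantha","Eva","Chris","David","James","Kim","John")
list2 <- c("Jane","Jane","Zoey","Joe","Joe","Samantha","Eva","David","Kim","Kim","Kim")
list3 <- c("Michael","Michael","Zach","Mary","Mary","Joe","Eva","Eva","Chris","Chris","James","John","John")

# Long format as in the answer: one row per occurrence, tagged with its source list
combList <- data.frame(
  name = c(list2, list3),
  listNumber = rep(c("list2", "list3"), c(length(list2), length(list3)))
)

# Cross-tabulate name by list; levels = list1 fixes the row order and keeps zero counts
tab <- table(factor(combList$name, levels = list1), combList$listNumber)

# Convert the 2-D table to a data frame with one row per name and one column per list
wide <- as.data.frame.matrix(tab)
wide
```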

Answer 3

In base R, you can use `factor` with its levels set to list1, then use `table` to get the counts, and `data.frame` to put them all together:

data.frame(list1,
           l2=c(table(factor(list2, levels=list1))),
           l3=c(table(factor(list3, levels=list1))))

This returns

            list1 l2 l3
Jane         Jane  2  0
Michael   Michael  0  2
Zach         Zach  0  1
Zoey         Zoey  1  0
Mary         Mary  0  2
Joe           Joe  2  1
Samantha Samantha  1  0
Eva           Eva  1  2
Chris       Chris  0  2
David       David  1  0
James       James  0  1
Kim           Kim  3  0
John         John  0  2
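The same idea extends to any number of lists. A base R sketch, offered as an alternative rather than the answer's own code: `match(x, list1)` turns each name into its position in list1, and `tabulate(..., nbins = length(list1))` counts those positions, giving zeros for names that never occur (`tabulate` silently ignores the `NA`s that `match` returns for names outside list1). The names `lists`, `counts` and `res` are illustrative.

```r
list1 <- c("Jane","Michael","Zach","Zoey","Mary","Joe","Samantha","Eva","Chris","David","James","Kim","John")
list2 <- c("Jane","Jane","Zoey","Joe","Joe","Samantha","Eva","David","Kim","Kim","Kim")
list3 <- c("Michael","Michael","Zach","Mary","Mary","Joe","Eva","Eva","Chris","Chris","James","John","John")

# Add more named vectors here to get more count columns (illustrative names)
lists <- list(l2 = list2, l3 = list3)

# For each list: position of every name in list1, then count how often each position occurs
counts <- sapply(lists, function(x) tabulate(match(x, list1), nbins = length(list1)))

# counts is a matrix with one row per name in list1 and one column per list
res <- data.frame(list1, counts)
res
```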

Answer 4

Here is a base R solution that scales to any number of lists:

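list0 <- list(list1, list2, list3)
# Build a name/frequency table per list, then outer-join them all on the first column (the name)
res <- Reduce(function(...) merge(..., by = 1, all = TRUE), 
        lapply(list0, function(x) as.data.frame(table(x))))
colnames(res) <- c("Name","L1","L2","L3")
res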
#        Name     L1     L2   L3
# 1     Chris      1     NA    2
# 2     David      1      1   NA
# 3       Eva      1      1    2
# 4     James      1     NA    1
# 5      Jane      1      2   NA
# 6       Joe      1      2    1
# 7      John      1     NA    2
# 8       Kim      1      3   NA
# 9      Mary      1     NA    2
# 10  Michael      1     NA    2
# 11 Samantha      1      1   NA
# 12     Zach      1     NA    1
# 13     Zoey      1      1   NA
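Because the outer join fills in NA wherever a name is missing from a list, and L1 only counts list1 itself (always 1 when its names are unique), a little post-processing brings this closer to the layout in the question. A minimal sketch, assuming list1 holds unique names: it sets the NAs to 0, drops L1, and restores list1's order with `match()` (merge sorts by the key). `stringsAsFactors = FALSE` is an addition so that the name column is plain character.

```r
list1 <- c("Jane","Michael","Zach","Zoey","Mary","Joe","Samantha","Eva","Chris","David","James","Kim","John")
list2 <- c("Jane","Jane","Zoey","Joe","Joe","Samantha","Eva","David","Kim","Kim","Kim")
list3 <- c("Michael","Michael","Zach","Mary","Mary","Joe","Eva","Eva","Chris","Chris","James","John","John")

list0 <- list(list1, list2, list3)
# Outer-join the per-list frequency tables on their first column (the name)
res <- Reduce(function(...) merge(..., by = 1, all = TRUE),
              lapply(list0, function(x) as.data.frame(table(x), stringsAsFactors = FALSE)))
colnames(res) <- c("Name", "L1", "L2", "L3")

# A name absent from a list comes out of the outer join as NA; treat it as a zero count
res[is.na(res)] <- 0

# Drop L1 and put the rows back into list1's order
res <- res[match(list1, res$Name), c("Name", "L2", "L3")]
rownames(res) <- NULL
res
```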

Count how many times each name appears in another vector
