I have two data.frames, dfA and dfB (called A and B in the code below). Both have a column named key.
Now I want to know how many times each value of A$key occurs in B$key.
A <- data.frame(key=c("A", "B", "C", "D"))
B <- data.frame(key=c("A", "A", "B", "B", "B", "D"))
The expected result is A = 2, B = 3, C = 0, and D = 1. What is the simplest way to do this?
Answer 0: (score: 4)
Use table:
table(factor(B$key, levels = sort(unique(A$key))))
#A B C D
#2 3 0 1
factor is needed here so that entries that do not appear in B$key (here, C) are also "counted" and show up with a count of 0.
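To see why the factor levels matter, here is a minimal sketch comparing the two calls on the data from the question. The variable names plain and with_levels are only illustrative and are not part of the answer above.

```r
A <- data.frame(key = c("A", "B", "C", "D"))
B <- data.frame(key = c("A", "A", "B", "B", "B", "D"))

# Plain table() only reports values that actually occur in B$key,
# so "C" is missing from the result entirely.
plain <- table(B$key)
names(plain)
# "A" "B" "D"

# Passing A's keys as factor levels reserves a slot for every key,
# including those with zero occurrences in B$key.
with_levels <- table(factor(B$key, levels = sort(unique(A$key))))
with_levels[["C"]]
# 0
```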
Answer 1: (score: 2)
You can use tidyverse:
library(dplyr)

A %>%
  left_join(B %>%                 # join A with the per-key counts computed from B
              group_by(key) %>%
              tally(),
            by = "key") %>%
  mutate(n = ifelse(is.na(n), 0, n))  # keys absent from B come back as NA; replace with 0
  key n
1   A 2
2   B 3
3   C 0
4   D 1
Answer 2: (score: 2)
A <- data.frame(key=c("A", "B", "C", "D"))
B <- data.frame(key=c("A", "A", "B", "B", "B", "D"))
library(dplyr)
library(tidyr)
B %>%
filter(key %in% A$key) %>% # keep values that appear in A
count(key) %>% # count values
complete(key = A$key, fill = list(n = 0)) # add any values from A that don't appear
# # A tibble: 4 x 2
# key n
# <chr> <dbl>
# 1 A 2
# 2 B 3
# 3 C 0
# 4 D 1
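Answer 3: (score: 1)
Actually, do you mean how many times each value of A$key occurs in B$key?
You can do this by encoding B$key as a factor whose levels are the unique values of A$key.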
o <- table(factor(B$key, levels=unique(A$key)))
Yields:
> o
A B C D
2 3 0 1
If you really want to count duplicates (occurrences beyond the first), do
dupes <- ifelse(o - 1 < 0, 0, o - 1)
Yields:
> dupes
A B C D
1 2 0 0