Count the number of duplicates in another data frame

Time: 2018-12-04 11:34:15

Tags: r dataframe count duplicates

I have two data.frames, A and B. Both have a column named key. I want to know how many times each value of A$key occurs in B$key.

A <- data.frame(key=c("A", "B", "C", "D"))
B <- data.frame(key=c("A", "A", "B", "B", "B", "D"))

The expected result is A = 2, B = 3, C = 0 and D = 1. What is the simplest way to do this?

4 answers:

Answer 0: (score: 4)

Use table:

table(factor(B$key, levels = sort(unique(A$key))))
#A B C D 
#2 3 0 1
factor is needed here so that we also "count" entries that do not appear in B$key, i.e. C.
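To illustrate why the explicit levels matter, here is a minimal sketch using the question's data. The variable names plain and with_levels are introduced only for this illustration.

```r
A <- data.frame(key = c("A", "B", "C", "D"))
B <- data.frame(key = c("A", "A", "B", "B", "B", "D"))

# Without explicit levels, only values that occur in B$key are tabulated,
# so "C" is missing from the result entirely
plain <- table(B$key)
names(plain)

# With levels taken from A$key, values absent from B$key get a zero count
with_levels <- table(factor(B$key, levels = sort(unique(A$key))))
names(with_levels)
```

Comparing the two sets of names shows that only the second table has an entry for C.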

Answer 1: (score: 2)

You can use the tidyverse:

library(dplyr)

A %>%
  left_join(B %>%                     # join A with the per-key counts computed from B
              group_by(key) %>%
              tally(), by = "key") %>%
  mutate(n = ifelse(is.na(n), 0, n))  # replace NA (key absent from B) with 0

  key n
1   A 2
2   B 3
3   C 0
4   D 1

Answer 2: (score: 2)

A <- data.frame(key=c("A", "B", "C", "D"))
B <- data.frame(key=c("A", "A", "B", "B", "B", "D"))

library(dplyr)
library(tidyr)

B %>%
  filter(key %in% A$key) %>%                 # keep values that appear in A
  count(key) %>%                             # count values
  complete(key = A$key, fill = list(n = 0))  # add any values from A that don't appear

# # A tibble: 4 x 2
#   key       n
#   <chr> <dbl>
# 1 A         2
# 2 B         3
# 3 C         0
# 4 D         1

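Answer 3: (score: 1)

So what you actually mean is: how many times does each value of A$key occur in B$key?

You can get this by encoding B$key as a factor with the unique values of A$key as its levels: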

o <- table(factor(B$key, levels=unique(A$key)))

This yields:

> o

A B C D 
2 3 0 1 

If you really want to count duplicates (occurrences beyond the first), use:

dupes <- ifelse(o - 1 < 0, 0, o - 1)

This yields:

> dupes

A B C D 
1 2 0 0
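For comparison, the same distinction between occurrences and duplicates can be sketched without table, using only base R. This is an alternative illustration, not part of the answer above; counts and dupes are names chosen for the sketch.

```r
A <- data.frame(key = c("A", "B", "C", "D"))
B <- data.frame(key = c("A", "A", "B", "B", "B", "D"))

# For every key in A, count how many rows of B carry that key
counts <- vapply(as.character(A$key),
                 function(k) sum(B$key == k),
                 integer(1))

# Duplicates are the occurrences beyond the first; clamp at zero so that
# keys absent from B do not become -1
dupes <- pmax(counts - 1L, 0L)

counts
dupes
```

The pmax call does the same job as the ifelse expression: it keeps o - 1 where that is non-negative and uses 0 otherwise.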