Question

Say I have a dataset like this:

 id <- c(1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)
 foo <- c('a', 'b', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'a', 'a')
 dat <- data.frame(id, foo)


For each id, how can I get the maximum number of times any single foo value repeats?

That is, the result should be:

   id  max_repeat
1   1   1
2   2   3
3   3   2

For example, id 2 has a max_repeat of 3 because one of its foo values (b) is repeated 3 times.
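To check the expected value for id 2 by hand, here is a quick base-R sketch (not from any of the answers): count each foo value among that id's rows with table() and take the largest count.

```r
id  <- c(1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)
foo <- c('a', 'b', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'a', 'a')
dat <- data.frame(id, foo)

# Frequency of each foo value among the rows where id == 2
counts_id2 <- table(dat$foo[dat$id == 2])
counts_id2        # a: 2, b: 3

# The largest of those frequencies is the max_repeat for id 2
max(counts_id2)   # 3
```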

Answer 1

Using the tidyverse:

library(dplyr)

dat %>%
 group_by(id, foo) %>%    # group by id and foo
 tally() %>%              # count the rows in each (id, foo) group
 group_by(id) %>%
 summarise(res = max(n))  # keep the max count per id

     id   res
  <dbl> <dbl>
1    1.    1.
2    2.    3.
3    3.    2.
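The same two steps (count each id/foo pair, then keep the maximum per id) can be sketched without packages using a two-way contingency table: rows are ids, columns are foo values, and each row's maximum is that id's answer. This is an illustrative alternative, not part of the answer above; pairs that never occur get a count of 0, which does not change the maximum.

```r
id  <- c(1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)
foo <- c('a', 'b', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'a', 'a')
dat <- data.frame(id, foo)

# Step 1: count every (id, foo) combination; rows = id, columns = foo
counts <- table(dat$id, dat$foo)

# Step 2: the largest count in each row is that id's max_repeat
res <- data.frame(
  id = as.numeric(rownames(counts)),
  max_repeat = unname(apply(counts, 1, max))
)
res
```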

Answer 2

dplyr

library(tidyverse)

dat %>% 
  group_by(id) %>% 
  # factor() lets tabulate() work whether foo is a character or a factor column
  summarise(max_repeat = max(tabulate(factor(foo))))

# # A tibble: 3 x 2
#      id max_repeat
#   <dbl>      <int>
# 1     1          1
# 2     2          3
# 3     3          2
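tabulate() counts positive integer codes, so it accepts a factor (through its level codes) but stops with an error on a plain character vector, which is what data.frame() produces by default since R 4.0.0. A small sketch of the difference, and of wrapping the values in factor() first:

```r
x <- c("b", "a", "b")

# tabulate() on a character vector fails: 'bin' must be numeric or a factor
res <- try(tabulate(x), silent = TRUE)
inherits(res, "try-error")   # TRUE

# Converting to a factor first gives one count per level (levels sorted: a, b)
tabulate(factor(x))          # 1 2
max(tabulate(factor(x)))     # 2
```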

data.table

library(data.table)
setDT(dat)

# factor() lets tabulate() work whether foo is a character or a factor column
dat[, .(max_repeat = max(tabulate(factor(foo)))), by = id]

#    id max_repeat
# 1:  1          1
# 2:  2          3
# 3:  3          2

base (use setNames to change the column names if needed)

aggregate(foo ~ id, dat, function(x) max(tabulate(factor(x))))
#   id foo
# 1  1   1
# 2  2   3
# 3  3   2

Answer 3

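Without any packages, you can combine two aggregate() calls: one that counts the rows of each id/foo pair with length, and one that takes the max of those counts per id.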

x1 <- with(dat, aggregate(list(count=id), list(id=id, foo=foo), FUN=length))
x2 <- with(x1, aggregate(list(max_repeat=count), list(id=id), FUN=max))

Output:

> x2
  id max_repeat
1  1          1
2  2          3
3  3          2
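Another package-free route, sketched here as an alternative rather than taken from the answer: tapply() splits foo by id and applies one function to each piece, so counting with table() and taking max() fits in a single call. The result is a named array (names are the ids) instead of a data frame.

```r
id  <- c(1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)
foo <- c('a', 'b', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'a', 'a')
dat <- data.frame(id, foo)

# For each id, count the foo values and keep the largest count
max_rep <- tapply(dat$foo, dat$id, function(x) max(table(x)))
max_rep   # 1 3 2 for ids 1, 2, 3
```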

Data:

dat <- structure(list(id = c(1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3), foo = structure(c(1L, 
2L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 1L, 1L), .Label = c("a", "b", 
"c"), class = "factor")), class = "data.frame", row.names = c(NA, 
-11L))

R - How do I get the maximum number of times a value repeats after grouping?
