Question

I am trying to find the most common value by group. In the following example data frame:

df<-data.frame(a=c(1,1,1,1,2,2,2,3,3),b=c(2,2,1,2,3,3,1,1,2))  
> df  
  a b  
1 1 2  
2 1 2  
3 1 1  
4 1 2  
5 2 3  
6 2 3  
7 2 1  
8 3 1  
9 3 2

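I want to add a column 'c' that holds the value occurring most often in 'b' when the values are grouped by 'a'. I would like the following output:

I tried using table and tapply but could not get it right. Is there a quick way to do this?
Thanks!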

Answer 1

Building on David's comment, your solution is as follows:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

library(dplyr)
df %>% group_by(a) %>% mutate(c=Mode(b))

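Note that for the tie where df$a is 3, the mode of b is 1, because Mode returns whichever of the tied values appears first in the group.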
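The tie behaviour of Mode can be checked directly on small vectors. The sketch below repeats the Mode function from this answer and adds ModeSmallest, a hypothetical variant (not part of the original answer) that sorts the unique values first, so that a tie goes to the smallest value instead of the first one seen.

```r
# Mode as defined in this answer: ties go to the value that appears first
Mode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]  # which.max returns the first maximum
}

# Hypothetical variant: sorting the unique values makes ties go to the smallest value
ModeSmallest <- function(x) {
  ux <- sort(unique(x))
  ux[which.max(tabulate(match(x, ux)))]
}

Mode(c(1, 2))          # tie, 1 appears first
Mode(c(2, 1))          # tie, 2 appears first
ModeSmallest(c(2, 1))  # tie, smallest value wins
```

Either function can be passed to mutate in the same way; which one is appropriate depends on how ties should be resolved in your data.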

Answer 2

We can get the 'Mode' of 'b' grouped by 'a' using ave

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

df$c <-  with(df, ave(b, a, FUN=Mode))
df$c
#[1] 2 2 2 2 3 3 3 1 1

Or using data.table

library(data.table)
setDT(df)[, c:= Mode(b), by=a][]
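Since the question mentions tapply, here is a minimal sketch of how the same result might be obtained with it. It assumes the Mode function above and the example data from the question; the intermediate name modes is only illustrative. tapply returns one mode per group, named by the group ID, and those names are then used to look up the mode for each row.

```r
df <- data.frame(a = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
                 b = c(2, 2, 1, 2, 3, 3, 1, 1, 2))

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# one mode per group; the result is named by the values of a ("1", "2", "3")
modes <- tapply(df$b, df$a, Mode)

# look up each row's group by name, then drop the names and dim attributes
df$c <- as.vector(modes[as.character(df$a)])
df$c
```

Because the lookup is by group name rather than by position, this does not require the rows to be sorted by a.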

Answer 3

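Here is a base R method that uses table to compute a cross tab, max.col to find the mode of each group, and rep together with rle to fill in the mode across groups.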

# calculate a cross tab, frequencies by group
myTab <- table(df$a, df$b)
# repeat the mode for each group, as calculated by colnames(myTab)[max.col(myTab)] 
# repeating by the number of times the group ID is observed
df$c <- rep(colnames(myTab)[max.col(myTab)], rle(df$a)$lengths)

df
  a b c
1 1 2 2
2 1 2 2
3 1 1 2
4 1 2 2
5 2 3 3
6 2 3 3
7 2 1 3
8 3 1 2
9 3 2 2

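Note that this assumes the data are sorted by group. Also, the default for max.col is to break ties (multiple modes) at random. If you want the first or last value to be the mode, you can set this with the ties.method argument.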
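Both caveats can be sidestepped with a small change. The following is a sketch rather than the original answer's code: it uses ties.method = "first" so that ties are resolved deterministically, and it replaces the rep/rle step with a match against the row names of the table, which should work even when the rows are not sorted by group. The shuffled data frame and the name modeByGroup are made up for illustration.

```r
# same rows as the example data, but in shuffled order
df <- data.frame(a = c(3, 1, 2, 1, 2, 1, 3, 2, 1),
                 b = c(1, 2, 3, 2, 3, 1, 2, 1, 2))

# cross tab: one row per group in a, one column per value of b
myTab <- table(df$a, df$b)

# mode per group; "first" picks the leftmost (smallest) b value on a tie
modeByGroup <- as.numeric(colnames(myTab)[max.col(myTab, ties.method = "first")])

# map each row to its group's mode by group ID instead of by position
df$c <- modeByGroup[match(as.character(df$a), rownames(myTab))]
df$c
```

The as.numeric call is there because colnames returns character strings; drop it if b is not numeric.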

Most common value (mode) by group