Question

I want to remove duplicates across columns x1 and x2 while keeping the higher value of x3.

DF:

Expected result:

x1  x2  x3 
 1   1   3
 2   2   5

I have gotten as far as df[!duplicated(df[, c(1, 2)]), ], but it shows the lowest value of x3. I want to get the highest x3 value.

Thanks in advance.

Answer 1

You can use aggregate(), grouping by the first two columns:

aggregate(x3 ~ x1 + x2, df, max)
#   x1 x2 x3
# 1  1  1  3
# 2  2  2  5

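If you want to find the maximum in more than one column, you can use cbind() to add variables to the left-hand side of the formula. For example,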

aggregate(cbind(x3, x4, x5) ~ x1 + x2, df, max)
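
As a minimal sketch of the cbind() form: the question's actual DF is not shown, so the data frame below is made up (its values, and the x4 column in particular, are assumptions chosen only so that x3 matches the expected result).

```r
# Hypothetical sample data; the question's real DF is not shown.
df <- data.frame(
  x1 = c(1, 1, 2, 2),
  x2 = c(1, 1, 2, 2),
  x3 = c(1, 3, 2, 5),
  x4 = c(10, 7, 4, 8)
)

# Maximum of x3 and of x4 within each (x1, x2) group.
res <- aggregate(cbind(x3, x4) ~ x1 + x2, data = df, FUN = max)
print(res)
```

Note that each column's maximum is taken independently, so a result row need not match any single row of the input (here the first group pairs x3 = 3 with x4 = 10, which come from different rows).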

Answer 2

Using the dplyr package:

library(dplyr)
df %>% group_by(x1,x2) %>% summarise(x3 = max(x3))

For clarity, you could name the max variable "maxOfx3" or something similar.

Edit: if there are other variables whose maximum you need, you can include them in the summarise() call:

df %>% group_by(x1, x2) %>% summarise(x3 = max(x3), x4 = max(x4), avg_of_x5 = mean(x5))  # etc.
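
A small self-contained sketch of the renaming suggestion, assuming dplyr 1.0.0 or later is installed. The data frame is made up (the question's DF is not shown), and `.groups = "drop"` is an addition here that returns an ungrouped result; it is not part of the original answer.

```r
library(dplyr)

# Hypothetical sample data; the question's real DF is not shown.
df <- data.frame(
  x1 = c(1, 1, 2, 2),
  x2 = c(1, 1, 2, 2),
  x3 = c(1, 3, 2, 5)
)

# One row per (x1, x2) pair, with the maximum stored under a descriptive name.
res <- df %>%
  group_by(x1, x2) %>%
  summarise(maxOfx3 = max(x3), .groups = "drop")

print(res)
```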

Answer 3

Another option, with data.table:

library(data.table)
dt <- data.table(DF)

dt[,.SD[which.max(x3)],by=list(x1, x2)]

   x1 x2 x3
1:  1  1  3
2:  2  2  5
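
A sketch of how this behaves when the table has further columns, assuming the data.table package is installed. The data, including the extra x4 column, is made up for illustration. Because .SD[which.max(x3)] selects a whole row, the other columns come from the row that holds the maximum x3 rather than being summarised on their own.

```r
library(data.table)

# Hypothetical sample data; the question's real DF is not shown.
DF <- data.frame(
  x1 = c(1, 1, 2, 2),
  x2 = c(1, 1, 2, 2),
  x3 = c(1, 3, 2, 5),
  x4 = c(10, 7, 4, 8)
)
dt <- as.data.table(DF)

# Keep, for each (x1, x2) group, the entire row where x3 is largest.
res <- dt[, .SD[which.max(x3)], by = list(x1, x2)]
print(res)
```

If several rows tie for the maximum, which.max() returns only the first of them, so exactly one row per group is kept.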