Question

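This should be very simple, but I haven't managed to solve it. I want to get the maximum value for each group, which I am currently doing as follows: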

library(plyr)
ddply(dd, ~group, summarise, max = max(value))

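However, instead of returning only the value and the group, I would like to return the value, the group and another column, date. My attempt at indexing it is below (it obviously doesn't work). How can I do this? Thanks.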

ddply(dd,~group,summarise,max=max(value))['date']

Answer 1

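If your date corresponds to the row with the maximum value, try using subset to get the row(s) with the maximum, together with select to get the columns you are after: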

library(plyr)

# reproducible example using `iris`

# your original
ddply(iris, ~Species, summarise, max=max(Sepal.Length))
#      Species max
# 1     setosa 5.8
# 2 versicolor 7.0
# 3  virginica 7.9


# now we also want the Sepal.Width that corresponds to the maximum Sepal.Length
ddply(iris, ~Species, subset, Sepal.Length==max(Sepal.Length),
      select=c('Species', 'Sepal.Length', 'Sepal.Width'))
#      Species Sepal.Length Sepal.Width
# 1     setosa          5.8         4.0
# 2 versicolor          7.0         3.2
# 3  virginica          7.9         3.8

(Or, instead of using select in the subset call, just use [, c('columns', 'I', 'want')] after the ddply.) If several rows of the same species attain the maximum, all of them are returned.
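To see both points on data shaped like the question's, here is a minimal sketch using a small made-up dd (the group, value and date columns mirror the question; the values are invented) in which group "a" has two rows tied at its maximum. It picks the columns with [ , ] after ddply instead of using select, and both tied rows come back:

```r
library(plyr)

# Hypothetical data shaped like the question's dd; group "a" has two rows tied at its max of 7
dd <- data.frame(
  group = c("a", "a", "a", "b", "b"),
  value = c(3, 7, 7, 2, 5),
  date  = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03",
                    "2020-01-01", "2020-01-02"))
)

# Keep every row that attains its group's maximum, then choose columns with [ , ]
res <- ddply(dd, ~group, subset, value == max(value))[, c("group", "value", "date")]
print(res)
```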

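You could also do this with summarise: just add your date definition to the call. It is slightly less efficient, though, since the maximum is computed twice (once by max and once by which.max):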

ddply(iris, ~Species, summarise,
      max=max(Sepal.Length),
      width=Sepal.Width[which.max(Sepal.Length)])

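This returns only one row per species: if several flowers share the maximum sepal length for their species, only the first one is returned (which.max returns the index of the first maximum).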
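Here is the summarise approach on similar made-up data (a hypothetical dd with a tie in group "a"), pulling the asker's date column instead of Sepal.Width. Because which.max picks the first maximum, only the date of the first tied row survives:

```r
library(plyr)

# Hypothetical data: group "a" has two rows tied at its maximum value of 7
dd <- data.frame(
  group = c("a", "a", "a", "b", "b"),
  value = c(3, 7, 7, 2, 5),
  date  = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03",
                    "2020-01-01", "2020-01-02"))
)

# which.max() returns the index of the FIRST maximum, so ties collapse to one row
first_idx <- which.max(c(3, 7, 7))

# One row per group: the max value plus the date of the first row reaching it
res <- ddply(dd, ~group, summarise,
             max  = max(value),
             date = date[which.max(value)])
print(res)
```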

Answer 2

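With data.table (again using the iris dataset), we convert the data.frame to a data.table, group by the grouping variable ('Species'), get the index of the maximum value of one variable ('Sepal.Length'), and use it to subset the columns listed in .SDcols.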

library(data.table)
dt <- as.data.table(iris)
# within each Species, take the row of .SD where Sepal.Length is largest
dt[, .SD[which.max(Sepal.Length)], by = Species,
   .SDcols = c('Sepal.Length', 'Sepal.Width')]
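Like the summarise version, which.max keeps only the first tied row per group. If you want every tied row instead (as with the subset approach in Answer 1), one option is to index .SD with a logical comparison. A minimal sketch on a small made-up table (the group, value and date columns are hypothetical, mirroring the question):

```r
library(data.table)

# Hypothetical data shaped like the question's; group "a" has a tie at its max of 7
dt <- data.table(
  group = c("a", "a", "a", "b", "b"),
  value = c(3, 7, 7, 2, 5),
  date  = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03",
                    "2020-01-01", "2020-01-02"))
)

# which.max() keeps only the first tied row in each group
first_max <- dt[, .SD[which.max(value)], by = group, .SDcols = c("value", "date")]

# A logical comparison keeps every row equal to its group's maximum
all_max <- dt[, .SD[value == max(value)], by = group, .SDcols = c("value", "date")]

print(first_max)
print(all_max)
```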
