Aggregating one column in a data.table using the max of another column - R

Time: 2018-12-05 03:28:42

Tags: r data.table max aggregate which

I have a data.table DT, and I want to aggregate by one column (year) using the maximum value of another column (month). Here is a sample of my data.table.

> DT <- data.table(month = c("2016-01", "2016-02", "2016-03", "2017-01", "2017-02", "2017-03")
                  , col1 = c(3,5,2,8,4,9)
                  , year = c(2016, 2016,2016, 2017,2017,2017))

> DT
     month col1 year
1: 2016-01    3 2016
2: 2016-02    5 2016
3: 2016-03    2 2016
4: 2017-01    8 2017
5: 2017-02    4 2017
6: 2017-03    9 2017

Desired output

> ## desired output
> DT
     month col1 year desired_output
1: 2016-01    3 2016              2
2: 2016-02    5 2016              2
3: 2016-03    2 2016              2
4: 2017-01    8 2017              9
5: 2017-02    4 2017              9
6: 2017-03    9 2017              9

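Grouping by year, the desired output should be the col1 value of the most recent month in each year. However, the following code does not work: it gives me warnings and returns NA. What am I doing wrong?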

> ## wrong output
> DT[, output := col1[which.max(month)], by = .(year)]
Warning messages:
1: In which.max(month) : NAs introduced by coercion
2: In which.max(month) : NAs introduced by coercion
> DT
     month col1 year output
1: 2016-01    3 2016     NA
2: 2016-02    5 2016     NA
3: 2016-03    2 2016     NA
4: 2017-01    8 2017     NA
5: 2017-02    4 2017     NA
6: 2017-03    9 2017     NA

1 Answer:

Answer 0 (score: 1)
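
The warning in the question appears to come from `which.max()` itself: it needs input it can coerce to a number, and "YYYY-MM" strings cannot be coerced, so no index is returned. A minimal sketch of that behaviour outside data.table, where `month_chr` and `col1` are hypothetical vectors mirroring the 2016 rows of the sample data:

```r
# Hypothetical vectors mirroring one group (year 2016) of the sample data
month_chr <- c("2016-01", "2016-02", "2016-03")
col1      <- c(3, 5, 2)

# which.max() tries to coerce the strings to double; every element becomes NA,
# so there is no maximum to locate and the result is a zero-length index
idx <- suppressWarnings(which.max(month_chr))
length(idx)      # 0

# Subsetting with a zero-length index yields a zero-length vector,
# which leaves nothing to assign to the group
picked <- col1[idx]
length(picked)   # 0
```

Converting `month` to a type with a numeric representation before calling `which.max()` avoids the coercion problem.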

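We get the index of the maximum value in 'month' by converting it to the yearmon class from zoo, and use that index to pick the corresponding value from 'col1' when creating the 'desired_output' column, grouped by 'year'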

library(zoo)
library(data.table)
DT[, desired_output := col1[which.max(as.yearmon(month))], .(year)]
DT
#     month col1 year desired_output
#1: 2016-01    3 2016              2
#2: 2016-02    5 2016              2
#3: 2016-03    2 2016              2
#4: 2017-01    8 2017              9
#5: 2017-02    4 2017              9
#6: 2017-03    9 2017              9
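
If loading zoo is not an option, one possible alternative is to rely on the fact that zero-padded "YYYY-MM" strings sort in chronological order. This is only a sketch under that assumption (it would not hold for formats such as "2016-1" or "Jan-2016"), and the `order()`-based approach is not part of the original answer:

```r
library(data.table)

# Sample data from the question
DT <- data.table(month = c("2016-01", "2016-02", "2016-03",
                           "2017-01", "2017-02", "2017-03"),
                 col1  = c(3, 5, 2, 8, 4, 9),
                 year  = c(2016, 2016, 2016, 2017, 2017, 2017))

# order() accepts character vectors; within each group, the last position of
# the ordering (.N is the group size) points at the latest month
DT[, desired_output := col1[order(month)[.N]], by = year]
DT$desired_output   # 2 2 2 9 9 9
```

Taking a single position from `order()` always returns exactly one value per group, even if a month were duplicated within a year.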

Or extract the month number and get the index of its max value

DT[, desired_output := col1[which.max(month(as.IDate(paste0(month,
                  "-01"))))], .(year)]