I have a data.table DT, and I want to summarize by one column (year) using the maximum of another column (month). Here is a sample of my data.table:
> library(data.table)
> DT <- data.table(month = c("2016-01", "2016-02", "2016-03", "2017-01", "2017-02", "2017-03"),
+                  col1  = c(3, 5, 2, 8, 4, 9),
+                  year  = c(2016, 2016, 2016, 2017, 2017, 2017))
> DT
month col1 year
1: 2016-01 3 2016
2: 2016-02 5 2016
3: 2016-03 2 2016
4: 2017-01 8 2017
5: 2017-02 4 2017
6: 2017-03 9 2017
Desired output:
> ## desired output
> DT
month col1 year desired_output
1: 2016-01 3 2016 2
2: 2016-02 5 2016 2
3: 2016-03 2 2016 2
4: 2017-01 8 2017 9
5: 2017-02 4 2017 9
6: 2017-03 9 2017 9
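For each year, the desired output is the col1 value from the most recent month. However, the following code does not work: it gives me warnings and returns NA. What am I doing wrong?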
> ## wrong output
> DT[, output := col1[which.max(month)], by = .(year)]
Warning messages:
1: In which.max(month) : NAs introduced by coercion
2: In which.max(month) : NAs introduced by coercion
> DT
month col1 year output
1: 2016-01 3 2016 NA
2: 2016-02 5 2016 NA
3: 2016-03 2 2016 NA
4: 2017-01 8 2017 NA
5: 2017-02 4 2017 NA
6: 2017-03 9 2017 NA
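The NAs come from which.max() itself: it works on numbers, so it coerces the character month column to double, and strings such as "2016-01" cannot be parsed as numbers. A minimal base-R check of that coercion (the vector m is just an illustrative sample):

```r
# which.max() coerces its argument to double before searching.
# "YYYY-MM" strings are not valid numbers, so every value becomes NA.
m <- c("2016-01", "2016-02", "2016-03")

num <- suppressWarnings(as.numeric(m))
num   # NA NA NA

# With no non-NA value to compare, which.max() returns integer(0)
idx <- suppressWarnings(which.max(m))
idx   # integer(0)
```

Indexing col1 with that empty index selects nothing for each group, which is why the output column ends up NA.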
Answer (score: 1)
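which.max() cannot compare the character strings in 'month' directly. We get the index of the maximum value in 'month' by converting it to the yearmon class from zoo, and use that index to pull the corresponding value from 'col1' while creating the 'desired_output' column grouped by 'year':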
library(zoo)
library(data.table)
DT[, desired_output := col1[which.max(as.yearmon(month))], .(year)]
DT
# month col1 year desired_output
#1: 2016-01 3 2016 2
#2: 2016-02 5 2016 2
#3: 2016-03 2 2016 2
#4: 2017-01 8 2017 9
#5: 2017-02 4 2017 9
#6: 2017-03 9 2017 9
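A variant that avoids loading zoo (not part of the original answer): because the months are zero-padded "YYYY-MM" strings, their alphabetical order is also their chronological order, so max() on the character column already finds the latest month. This sketch assumes every month value follows that exact format; match() is used so that exactly one value is returned per group even if a month were duplicated.

```r
library(data.table)

DT <- data.table(month = c("2016-01", "2016-02", "2016-03", "2017-01", "2017-02", "2017-03"),
                 col1  = c(3, 5, 2, 8, 4, 9),
                 year  = c(2016, 2016, 2016, 2017, 2017, 2017))

# Zero-padded "YYYY-MM" strings sort chronologically, so max() works on them directly.
# match() gives the position of the first row holding that latest month.
DT[, desired_output := col1[match(max(month), month)], by = year]
DT
```

If the strings were not zero-padded (e.g. "2016-1" vs "2016-10"), this string comparison would break, and a date conversion like the one above is the safer choice.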
Or extract the month number and get the index of its maximum value:
DT[, desired_output := col1[which.max(month(as.IDate(paste0(month, "-01"))))], .(year)]
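To see what the second approach does step by step: paste0() appends a day so each string becomes a full ISO date that as.IDate() can parse, and data.table's month() then returns the month number. The sample vector m below is illustrative only.

```r
library(data.table)

m <- c("2016-01", "2016-02", "2016-03")

# Append "-01" so each "YYYY-MM" string becomes a parseable "YYYY-MM-DD" date
d <- as.IDate(paste0(m, "-01"))

# month() extracts the month number (1 to 12) from each date
mon <- month(d)
mon              # 1 2 3

which.max(mon)   # 3
```

Month numbers alone are only comparable within a single calendar year. That is fine here because the computation runs inside by = year, but it would give wrong results if a group spanned several years.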