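Below is a reproducible example of what I am trying to achieve.
I have a data.table with months as the time ID. I want to run some calculations on the data for the last 5 years, the last 10 years, and so on, each window ending at the most recent month (i.e. the last 5*12 months, the last 10*12 months, etc.).
I have a way of doing this, but I suspect it goes through many unnecessary intermediate variables.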
library(lubridate) #For easy creation of time-series
library(data.table)
set.seed(5)
DT <- data.table(
Month = as.Date(sapply(0:329, function(i)(as.Date('1990-01-01')%m+%months(i))), origin = '1970-01-01'),
Value = round(runif(330, min = 20, max = 40), digits = 2)
)
> DT
Month Value
1: 1990-01-01 24.00
2: 1990-02-01 33.70
3: 1990-03-01 38.34
4: 1990-04-01 25.69
5: 1990-05-01 22.09
---
326: 2017-02-01 20.91
327: 2017-03-01 38.96
328: 2017-04-01 28.91
329: 2017-05-01 26.09
330: 2017-06-01 35.16
## Create a vector of the first months marking the start of the 60 or 120 month period
last.month <- max(DT[['Month']])
first.months <- as.Date(sapply(seq(5, 25, by = 5), function(i) (last.month %m-% months(i*12 - 1))), origin = '1970-01-01')
## Construction of table of interest
yrs <- paste0(seq(5, 25, by = 5), 'Yrs')
features <- data.table(
Period = factor(yrs, levels = yrs), Feature.1 = as.numeric(NA),
Feature.2 = as.numeric(NA)
)
for(i in 1:nrow(features)){
DT_n <- DT[Month>=first.months[i], ]
set(features, i, 'Feature.1', DT_n[, mean(Value)]) #mean used as an example operation
set(features, i, 'Feature.2', DT_n[, var(Value)]) #var used as an example operation
}
Finally, this is the table I am interested in:
> features
Period Feature.1 Feature.2
1: 5Yrs 29.68817 35.80375
2: 10Yrs 29.25542 39.50981
3: 15Yrs 29.64950 37.41900
4: 20Yrs 29.63454 34.51793
5: 25Yrs 29.84373 35.90916
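What is the best way to achieve this with data.table? Any improvement, whether fewer unnecessary variables or better efficiency, would be appreciated.
Thanks!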
Answer 0 (score: 1)
Here is another data.table approach you can try. After building the first.months and yrs vectors, you can put them into a separate data.table:
m <- data.table(firstmonths = first.months, yrs = yrs, key = "yrs")
Then use a non-equi join to compute the results:
rbindlist(lapply(yrs, function(y) {
DT[m[y], on = .(Month >= firstmonths), .(mean = mean(Value),
var = var(Value),
Period = y)]
}))
# mean var Period
#1: 29.68817 35.80375 5Yrs
#2: 29.25542 39.50981 10Yrs
#3: 29.64950 37.41900 15Yrs
#4: 29.63454 34.51793 20Yrs
#5: 29.84373 35.90916 25Yrs
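The lapply wrapper can probably be dropped by letting the join group by each row of the lookup table with by = .EACHI. The following is only a sketch on made-up toy data: the table name periods, its columns Period and first.month, and the 1Yrs/2Yrs labels are assumptions for illustration, not part of the question.

```r
library(data.table)

# Toy data standing in for the question's DT (values are made up).
DT <- data.table(
  Month = seq(as.Date("2015-01-01"), by = "month", length.out = 24),
  Value = 1:24
)

# Hypothetical lookup table: one row per window, with its first month.
periods <- data.table(
  Period      = c("1Yrs", "2Yrs"),
  first.month = as.Date(c("2016-01-01", "2015-01-01"))
)

# Non-equi join: for each row of `periods`, take the rows of DT with
# Month >= first.month, then evaluate j once per row (by = .EACHI).
res <- DT[periods, on = .(Month >= first.month),
          .(Period    = i.Period,      # label taken from the lookup table
            Feature.1 = mean(Value),   # mean as an example operation
            Feature.2 = var(Value)),   # var as an example operation
          by = .EACHI]
res
```

The same pattern should carry over to the question's data by building periods from yrs and first.months.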
Answer 1 (score: 1)
Another approach:
rbindlist(lapply(first.months,
function(m) data.table(val_mean = mean(DT[Month >= m]$Value),
val_var = var(DT[Month >= m]$Value)))
)[, Period := yrs][]
which gives:
   val_mean  val_var Period
1: 29.68817 35.80375   5Yrs
2: 29.25542 39.50981  10Yrs
3: 29.64950 37.41900  15Yrs
4: 29.63454 34.51793  20Yrs
5: 29.84373 35.90916  25Yrs
Or a variation of the above that uses setNames and the idcol parameter of rbindlist:
rbindlist(setNames(lapply(first.months,
function(m) data.table(val_mean = mean(DT$Value[DT$Month >= m]),
val_var = var(DT$Value[DT$Month >= m]))),
yrs),
idcol = 'Period')
which gives:
   Period val_mean  val_var
1:   5Yrs 29.68817 35.80375
2:  10Yrs 29.25542 39.50981
3:  15Yrs 29.64950 37.41900
4:  20Yrs 29.63454 34.51793
5:  25Yrs 29.84373 35.90916
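Both variants above filter DT twice per period, once for the mean and once for the variance. A possible refinement is to subset once and compute both statistics in j. This is a sketch on made-up toy data; the names cutoffs and labels are placeholders for first.months and yrs.

```r
library(data.table)

# Toy data standing in for the question's DT (values are made up).
DT <- data.table(
  Month = seq(as.Date("2015-01-01"), by = "month", length.out = 24),
  Value = 1:24
)
cutoffs <- as.Date(c("2016-01-01", "2015-01-01"))  # stands in for first.months
labels  <- c("1Yrs", "2Yrs")                       # stands in for yrs

res2 <- rbindlist(
  setNames(
    lapply(cutoffs, function(m) {
      # Filter once in i, compute both summaries in j.
      DT[Month >= m, .(val_mean = mean(Value), val_var = var(Value))]
    }),
    labels
  ),
  idcol = "Period"  # list names become the Period column
)
res2
```

This keeps the structure of the answer unchanged and only removes the repeated subsetting.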