Flattening a data.table (data.frame) with a custom index

Time: 2019-01-11 10:40:02

Tags: r data.table

Is there a simple way to flatten a data.table while keeping a custom index?

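Problem statement: I have a database to which data is uploaded every 5 minutes, with the minute-by-minute values stored in 5 different columns. I ultimately want to use na.approx to interpolate the missing values, but first I need a way to flatten the data.table so that every value gets the correct datetime.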

Example data:

library(data.table)
library(zoo)  # provides na.approx()

data <- data.table(datetime = c("2018-01-01 10:00:00",
                                "2018-01-01 10:05:00",
                                "2018-01-01 10:10:00",
                                "2018-01-01 18:00:00",
                                "2018-01-01 18:05:00"),
                   value_1 = c(0, 45, NA, NA, 170),
                   value_2 = c(10, 50, 70, 130, 175),
                   value_3 = c(20, 60, 85, 135, 180),
                   value_4 = c(30, NA, 95, 150, 190),
                   value_5 = c(30, 70, 110, 160, 200))
# Convert the character timestamps to POSIXct in place
data[, datetime := as.POSIXct(datetime)]

Interpolating value_1 directly now gives:

na.approx(data$value_1, x = data$datetime)
[1]   0.00000  45.00000  46.30208 168.69792 170.00000
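
These numbers follow from linear interpolation weighted by elapsed time: value_1 is known at 10:05 (45) and at 18:05 (170), 480 minutes apart, so the gap at 10:10 sits only 5/480 of the way between them. A minimal sketch that reproduces the arithmetic by hand in base R (the variable names and the UTC time zone are illustrative assumptions, not part of the question):

```r
# Known points on either side of the gap in value_1
t0 <- as.POSIXct("2018-01-01 10:05:00", tz = "UTC"); v0 <- 45
t1 <- as.POSIXct("2018-01-01 18:05:00", tz = "UTC"); v1 <- 170

# Timestamps of the two missing observations
gaps <- as.POSIXct(c("2018-01-01 10:10:00", "2018-01-01 18:00:00"), tz = "UTC")

# Fraction of the elapsed time between the two known points
frac <- as.numeric(difftime(gaps, t0, units = "mins")) /
  as.numeric(difftime(t1, t0, units = "mins"))

# Time-weighted linear interpolation
interp <- v0 + frac * (v1 - v0)
round(interp, 5)
```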

I would like to see: c(0, 45, 70, 120, 170)
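
One reading of these expected values (an assumption, since the question does not spell it out) is that they come from index-based interpolation, i.e. na.approx without an x argument, applied after the five value columns are laid out row by row as a single series. A small sketch under that assumption, with the sample values typed in as a matrix and the zoo package assumed to be installed:

```r
library(zoo)

# The five value columns, one matrix row per 5-minute upload
m <- rbind(c(0,   10,  20,  30,  30),
           c(45,  50,  60,  NA,  70),
           c(NA,  70,  85,  95,  110),
           c(NA,  130, 135, 150, 160),
           c(170, 175, 180, 190, 200))

# Flatten row by row so that consecutive minutes are adjacent
flat <- as.vector(t(m))

# Interpolate by position, then restore the 5-column layout
filled <- matrix(na.approx(flat), ncol = 5, byrow = TRUE)
filled[, 1]
```

Under this reading, the first column of `filled` plays the role of the interpolated value_1.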

Output:

[Image: required output of flattening]

I came up with a solution, but it is not very tidy:

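# One offset per value column (0 to 4 minutes), in the column-major order that unlist() produces
times <- c(data$datetime, data$datetime + 60, data$datetime + 120, data$datetime + 180, data$datetime + 240)
test <- unlist(data[, -c("datetime")], use.names = FALSE)
data.table(datetime = times, values = test)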

Does anyone know a better way to do this?

1 answer:

Answer 0 (score: 3)

Would this work?

library(data.table)
library(zoo)
dt <- as.data.table(data)
increment <- 60 # seconds
# wide to long
dt_long <- melt(dt, id.vars = "datetime")
# add increments to datetime, retaining values for each set of datetime
dt_frame <- dt_long[, .(new_datetime = seq(datetime[1], datetime[1]+(increment*(.N-1)), by=increment),
                        value = value), 
               by=datetime]
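# na.rm = FALSE keeps leading/trailing NAs so the result has the same length as the column
dt_frame[, value2 := na.approx(value, x = new_datetime, na.rm = FALSE)]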

# additions, keep original datetime, and cast back to wide format
dt_frame[, i := 1:.N, by = datetime]
out <- dcast(dt_frame, datetime~i, value.var="value2")
rename_these <- setdiff(names(out), "datetime")
setnames(out, rename_these, sprintf("value_%s", rename_these))
out[]
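
The same steps can also be wrapped in a small helper that returns the flattened, interpolated series in long format. This is only a sketch: `flatten_interpolate` is a hypothetical name, and it assumes the value columns are already ordered by minute so that each column's position determines its offset from the row's datetime.

```r
library(data.table)
library(zoo)

# Hypothetical helper: flatten wide per-minute columns and interpolate gaps
flatten_interpolate <- function(dt, id = "datetime", increment = 60) {
  # Wide to long; `variable` is a factor whose levels follow the column order
  long <- melt(as.data.table(dt), id.vars = id,
               variable.name = "variable", value.name = "value")
  # Shift each value by (column position - 1) increments, in seconds
  long[, new_datetime := get(id) + (as.integer(variable) - 1L) * increment]
  setorder(long, new_datetime)
  # Time-weighted interpolation; na.rm = FALSE preserves the vector length
  long[, value := na.approx(value, x = new_datetime, na.rm = FALSE)]
  long[, .(datetime = new_datetime, value)]
}

# Tiny illustrative input (not the question's data)
wide <- data.table(
  datetime = as.POSIXct(c("2018-01-01 10:00:00", "2018-01-01 10:05:00"), tz = "UTC"),
  value_1 = c(0, 45),
  value_2 = c(10, NA),
  value_3 = c(20, 60)
)
res <- flatten_interpolate(wide)
res
```

Because x is passed to na.approx, gaps are filled in proportion to elapsed time; dropping the x argument would interpolate by position instead, which gives different values wherever the timestamps are unevenly spaced.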