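I'm trying to do some feature engineering, and I need help counting occurrences of a categorical feature over time windows.
I tried the code below.
However, it only counts the categorical variable per hour (the `time_idx` bucket), whereas I also need counts over the past half hour and the past 4 hours.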
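```r
library(data.table)
# Toy data: one row per event (ID, categorical Signature, timestamp)
df <- data.frame(
  ID        = c("1", "2", "3", "4", "5", "6"),
  Signature = c("Attribute1", "Attribute1", "Attribute1",
                "Attribute2", "Attribute2", "Attribute1"),
  date      = c("2018-11-01 00:00:19", "2018-11-01 00:00:54", "2018-11-01 00:01:17",
                "2018-11-01 00:01:23", "2018-11-01 00:01:25", "2018-11-01 00:00:55"),
  stringsAsFactors = FALSE
)
df$date <- as.POSIXct(df$date)
dt <- setDT(df)
# Calendar-hour bucket per row, e.g. "2018-305-0" (year, day of year, hour)
dt[, time_idx := paste0(year(date), "-", yday(date), "-", hour(date))]
# 0-based running count (in row order) per Signature within each hour bucket;
# keyby also sorts dt by Signature and time_idx
dt[, Count_Signature := (1L:.N) - 1L, keyby = .(Signature, time_idx)]
dt
```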
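If "past hour" is meant as the fixed clock bucket that `time_idx` builds, the same idea generalizes to any bucket width by flooring the epoch seconds. This is a minimal sketch, not a definitive implementation: it assumes UTC timestamps (so bucket edges are predictable), and `bucket_count` plus the `Count_30min`/`Count_1h`/`Count_4h` column names are hypothetical. It uses data.table's `rowid()` to number rows within each (Signature, bucket) group.

```r
library(data.table)

# Toy data from the question; tz = "UTC" is an assumption for reproducibility
dt <- data.table(
  ID        = as.character(1:6),
  Signature = c("Attribute1", "Attribute1", "Attribute1",
                "Attribute2", "Attribute2", "Attribute1"),
  date      = as.POSIXct(c("2018-11-01 00:00:19", "2018-11-01 00:00:54",
                           "2018-11-01 00:01:17", "2018-11-01 00:01:23",
                           "2018-11-01 00:01:25", "2018-11-01 00:00:55"),
                         tz = "UTC")
)

# Hypothetical helper: 0-based running count of rows sharing the same Signature
# inside fixed, non-overlapping buckets of `bucket_secs` seconds. Buckets are
# aligned to the Unix epoch, so 30-min buckets start at :00/:30 UTC and
# 4-hour buckets at 00:00, 04:00, ... UTC.
bucket_count <- function(signature, date, bucket_secs) {
  bucket <- floor(as.numeric(date) / bucket_secs)
  # rowid() numbers rows 1, 2, ... within each (signature, bucket) group, in row order
  rowid(signature, bucket) - 1L
}

dt[, `:=`(Count_30min = bucket_count(Signature, date, 30 * 60),
          Count_1h    = bucket_count(Signature, date, 60 * 60),
          Count_4h    = bucket_count(Signature, date, 4 * 60 * 60))]
dt
```

Like the original code, this counts in row order, so it reproduces the question's numbers (ID 3 gets 2, ID 6 gets 3). Fixed buckets reset at each boundary, so an event at 00:29 and one at 00:31 land in different 30-minute buckets even though they are two minutes apart.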
The result I expect looks like this:
```
ID  Signature   date                 time_idx    Count_Signature
1   Attribute1  2018-11-01 00:00:19  2018-305-0  0
2   Attribute1  2018-11-01 00:00:54  2018-305-0  1
3   Attribute1  2018-11-01 00:01:17  2018-305-0  2
6   Attribute1  2018-11-01 00:00:55  2018-305-0  3
4   Attribute2  2018-11-01 00:01:23  2018-305-0  0
5   Attribute2  2018-11-01 00:01:25  2018-305-0  1
```
This is what I need for the past hour, but I also need the same counts for the past half hour and the past 4 hours.
Thanks.
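If "past half hour" and "past 4 hours" are meant as rolling windows looking back from each event (rather than fixed clock buckets), one common data.table technique is a non-equi self-join. This is a minimal sketch under stated assumptions: timestamps are in UTC, each window is the half-open interval [date - w, date) so an event never counts itself, and `count_prior` plus the `Count_30min`/`Count_1h`/`Count_4h` column names are hypothetical.

```r
library(data.table)

# Toy data from the question; tz = "UTC" is an assumption for reproducibility
dt <- data.table(
  ID        = as.character(1:6),
  Signature = c("Attribute1", "Attribute1", "Attribute1",
                "Attribute2", "Attribute2", "Attribute1"),
  date      = as.POSIXct(c("2018-11-01 00:00:19", "2018-11-01 00:00:54",
                           "2018-11-01 00:01:17", "2018-11-01 00:01:23",
                           "2018-11-01 00:01:25", "2018-11-01 00:00:55"),
                         tz = "UTC")
)

# Hypothetical helper: for each row, count the rows with the same Signature
# whose timestamp lies in the half-open window [date - window_secs, date).
count_prior <- function(dt, window_secs) {
  # One lookup row per original row, describing that row's window
  lookup <- dt[, .(Signature, start = date - window_secs, end = date)]
  # Non-equi self-join: by = .EACHI evaluates j once per lookup row, in order.
  # A lookup row with no match is paired with NA columns, so it yields 0.
  dt[lookup,
     on = .(Signature, date >= start, date < end),
     .(n = sum(!is.na(ID))),
     by = .EACHI]$n
}

# Window widths in seconds, keyed by the (hypothetical) output column name
windows <- c(Count_30min = 30 * 60, Count_1h = 60 * 60, Count_4h = 4 * 60 * 60)
for (col in names(windows)) {
  set(dt, j = col, value = count_prior(dt, windows[[col]]))
}
dt
```

Unlike the question's expected output, these counts follow the timestamps rather than row order: for Attribute1, ID 6 (00:00:55) gets 2 and ID 3 (00:01:17) gets 3. Events with identical timestamps do not count each other because of the strict `<`. With this toy data all events fall within about a minute, so the three windows give the same counts.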