Apologies if this is unclear. Suppose I have a data frame:
ID TIME AMOUNTSPENT
01 12:34 50
01 14:37 100
02 12:40 25
03 10:10 50
01 14:35 25
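I want to generate a lot of features from it, specifically features based on TIME, such as the mean spend per hour for each unique ID. This would normally produce 24 columns, one for each hour of the day. So the resulting data frame would look something like this: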
ID HOUR12MEANSPEND HOUR13MEANSPEND HOUR14MEANSPEND
01 37.5 0 100
I realise this is a complicated problem to explain; even a few tips on how to get started would be greatly appreciated!
Answer 0 (score: 2)
One way with dplyr and reshape2:
library(dplyr)
library(reshape2)
df %>%
#grouping - by ID and by the hour part of TIME (its first two characters)
group_by(ID, TIME = substr(TIME, 1, 2)) %>%
#summarise
summarise(averagespend = mean(AMOUNTSPENT)) %>%
#cast time in columns
dcast(ID ~ TIME, value.var = 'averagespend')
Output:
  ID 10 12   14
1  1 NA 50 62.5
2  2 NA 25   NA
3  3 50 NA   NA
Data:
df <- structure(list(ID = c(1L, 1L, 2L, 3L, 1L), TIME = structure(c(2L,
5L, 3L, 1L, 4L), .Label = c("10:10", "12:34", "12:40", "14:35",
"14:37"), class = "factor"), AMOUNTSPENT = c(50L, 100L, 25L,
50L, 25L)), .Names = c("ID", "TIME", "AMOUNTSPENT"), class = "data.frame", row.names = c(NA,
-5L))
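The output above only has columns for hours that occur in the data and leaves empty cells as NA, whereas the question asks for all 24 hours with 0 in the gaps. The following is a minimal base R sketch of one way to get there, not the answerer's method. It assumes TIME is always a zero-padded "HH:MM" string, stores TIME as character rather than factor for simplicity, and borrows the HOURxxMEANSPEND column names from the question; the variable names hour, means and result are made up for illustration.

```r
# Sample data from the question (TIME stored as character here, an assumption)
df <- data.frame(
  ID = c(1L, 1L, 2L, 3L, 1L),
  TIME = c("12:34", "14:37", "12:40", "10:10", "14:35"),
  AMOUNTSPENT = c(50, 100, 25, 50, 25),
  stringsAsFactors = FALSE
)

# Hour as a factor with all 24 levels, so hours without data still get a column
hour <- factor(substr(df$TIME, 1, 2), levels = sprintf("%02d", 0:23))

# Mean spend for every ID x hour combination; empty combinations come back as NA
means <- tapply(df$AMOUNTSPENT, list(df$ID, hour), mean)

# Replace NA with 0, as in the question's desired output
means[is.na(means)] <- 0

# Column naming scheme taken from the question's example
colnames(means) <- paste0("HOUR", colnames(means), "MEANSPEND")

# One row per ID, followed by the 24 hourly columns
result <- data.frame(ID = as.integer(rownames(means)), means, row.names = NULL)

print(result[, c("ID", "HOUR10MEANSPEND", "HOUR12MEANSPEND", "HOUR14MEANSPEND")])
```

Note that filling with 0 makes "no purchases in that hour" indistinguishable from "an average spend of zero"; keeping NA may be preferable if that distinction matters for the features. If only the NA replacement is needed, dcast in reshape2 also accepts a fill argument (for example fill = 0).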