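I fetch data from an RDBMS with SQL and want to use R to forecast daily purchases.
What I want is to store the data in a data frame like the one shown in the image below. Eventually I will try to write a function that forecasts each item title, row by row, using exponential smoothing.
So far I have managed to build the title column, but I cannot create the multiple date columns shown in the second image above. This is the code so far: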
df1 <- data.frame()
dailydate <- as.Date(as.POSIXct(data$date_placed))
newdate <- unique(dailydate)
itemtitle <- as.character(data$title)
newitemtitle <- unique(itemtitle)
df1 <- data.frame(newitemtitle,t(dailydate))
Error in data.frame(newitemtitle, t(dailydate))
I cannot add new columns to df1, and I cannot find a way to match the daily quantities to each title. I am open to any suggestions on this problem.
Answer 0 (score: 2)
This is a good place to use the reshape2 package.
df1 <- structure(list(title = structure(c(5L, 3L, 6L, 1L, 7L, 2L, 1L,
4L, 8L, 3L), .Label = c("d", "k", "m", "n", "q", "t", "u", "v"
), class = "factor"), quantity = c(4L, 3L, 5L, 10L, 6L, 13L,
4L, 6L, 12L, 1L), date_placed = structure(c(1L, 1L, 1L, 2L, 2L,
3L, 3L, 4L, 5L, 5L), .Label = c("8/24/2013", "8/25/2013", "8/26/2013",
"8/27/2013", "8/28/2013"), class = "factor")), .Names = c("title",
"quantity", "date_placed"), row.names = c(NA, -10L), class = "data.frame")
#install.packages("reshape2")
# dcast is exported by reshape2, so `::` is enough (`:::` is only needed for internal objects)
reshape2::dcast(df1, title ~ date_placed, value.var = "quantity", fill = 0)
Result:
# title 8/24/2013 8/25/2013 8/26/2013 8/27/2013 8/28/2013
#1 d 0 10 4 0 0
#2 k 0 0 13 0 0
#3 m 3 0 0 0 1
#4 n 0 0 0 6 0
#5 q 4 0 0 0 0
#6 t 5 0 0 0 0
#7 u 0 6 0 0 0
#8 v 0 0 0 0 12
The advantage of this over the other answer is that the output is a data.frame rather than a table, so it can be manipulated further as needed.
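Since the question's end goal is a row-wise exponential smoothing forecast, here is a minimal sketch of how the wide data.frame could feed into that. The helper `ses_forecast` and the smoothing factor `alpha = 0.3` are hypothetical choices made for illustration and are not part of the original question or answer. The `wide` table is copied by hand from the result printed above so that the snippet runs without reshape2.

```r
# Wide table copied from the dcast result above (one row per title).
wide <- data.frame(
  title = c("d", "k", "m", "n", "q", "t", "u", "v"),
  "8/24/2013" = c(0, 0, 3, 0, 4, 5, 0, 0),
  "8/25/2013" = c(10, 0, 0, 0, 0, 0, 6, 0),
  "8/26/2013" = c(4, 13, 0, 0, 0, 0, 0, 0),
  "8/27/2013" = c(0, 0, 0, 6, 0, 0, 0, 0),
  "8/28/2013" = c(0, 0, 1, 0, 0, 0, 0, 12),
  check.names = FALSE  # keep the date strings as column names
)

# Hypothetical helper: simple exponential smoothing with a fixed alpha.
# The level starts at the first observation; the final level is used as
# the one-step-ahead forecast.
ses_forecast <- function(x, alpha = 0.3) {
  level <- x[1]
  for (value in x[-1]) {
    level <- alpha * value + (1 - alpha) * level
  }
  level
}

# Drop the title column, keep titles as row names, then forecast each row.
daily <- as.matrix(wide[, -1])
rownames(daily) <- wide$title
forecasts <- apply(daily, 1, ses_forecast)
print(forecasts)
```

With real data you would probably want alpha estimated from the series rather than fixed; `stats::HoltWinters(x, beta = FALSE, gamma = FALSE)` is one way to do that, though five observations per title is very little to fit on.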
Answer 1 (score: 2)
Another option is spread from tidyr:
library(tidyr)
spread(df1, date_placed, quantity, fill = 0)
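Note that `spread` has been superseded since tidyr 1.0.0, and `pivot_wider` is the recommended replacement. A rough equivalent is sketched below, assuming tidyr 1.0.0 or later is installed. The data frame is rebuilt here with plain character columns only to keep the snippet self-contained.

```r
library(tidyr)

# Same ten rows as df1 in the other answers, written out as plain vectors.
df1 <- data.frame(
  title = c("q", "m", "t", "d", "u", "k", "d", "n", "v", "m"),
  quantity = c(4L, 3L, 5L, 10L, 6L, 13L, 4L, 6L, 12L, 1L),
  date_placed = c("8/24/2013", "8/24/2013", "8/24/2013", "8/25/2013",
                  "8/25/2013", "8/26/2013", "8/26/2013", "8/27/2013",
                  "8/28/2013", "8/28/2013")
)

# One column per date, one row per title; missing combinations become 0.
wide <- pivot_wider(
  df1,
  names_from = date_placed,
  values_from = quantity,
  values_fill = list(quantity = 0L)
)
print(wide)
```

Unlike the dcast output, the rows here come out in order of first appearance of each title rather than sorted alphabetically.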
Answer 2 (score: 1)
Use this to transform the data:
xtabs(data = df1,quantity~title+date_placed)
Data
df1 <- structure(list(title = structure(c(5L, 3L, 6L, 1L, 7L, 2L, 1L,
4L, 8L, 3L), .Label = c("d", "k", "m", "n", "q", "t", "u", "v"
), class = "factor"), quantity = c(4L, 3L, 5L, 10L, 6L, 13L,
4L, 6L, 12L, 1L), date_placed = structure(c(1L, 1L, 1L, 2L, 2L,
3L, 3L, 4L, 5L, 5L), .Label = c("8/24/2013", "8/25/2013", "8/26/2013",
"8/27/2013", "8/28/2013"), class = "factor")), .Names = c("title",
"quantity", "date_placed"), row.names = c(NA, -10L), class = "data.frame")
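The `xtabs` call returns a contingency table rather than a data.frame. If a data.frame is needed afterwards, one common base R idiom is `as.data.frame.matrix`, sketched below. The data is written out as plain vectors here just to keep the snippet self-contained, and the name `wide` is only illustrative.

```r
# Same ten rows as df1 above, written out as plain vectors.
df1 <- data.frame(
  title = c("q", "m", "t", "d", "u", "k", "d", "n", "v", "m"),
  quantity = c(4L, 3L, 5L, 10L, 6L, 13L, 4L, 6L, 12L, 1L),
  date_placed = c("8/24/2013", "8/24/2013", "8/24/2013", "8/25/2013",
                  "8/25/2013", "8/26/2013", "8/26/2013", "8/27/2013",
                  "8/28/2013", "8/28/2013")
)

# Cross-tabulate: sums quantity for every title/date combination.
tab <- xtabs(quantity ~ title + date_placed, data = df1)

# Keep the two-dimensional layout when converting to a data.frame
# (plain as.data.frame() on a table would return long format instead).
wide <- as.data.frame.matrix(tab)

# Move the titles from the row names into a regular column.
wide <- cbind(title = rownames(wide), wide)
rownames(wide) <- NULL
print(wide)
```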