How can I reshape my data frame into a new form more efficiently (R)?

Time: 2016-08-01 22:55:12

Tags: r dataframe dplyr reshape2 melt

I have a dataset like this (df1):

ID  2   4   6   8   10  12  14  16  18  20  22  24   Day
1   0   0   0   0   2   0   0   0   1   0   1   0    Sunday
1   0   0   0   0   0   4   0   0   0   0   0   0   Monday
1   0   0   0   0   0   0   0   0   2   0   0   0   Tuesday
1   0   0   0   0   0   0   2   0   0   0   0   0   Wednesday
1   0   0   0   0   0   0   0   2   0   0   0   0   Thursday
1   0   0   0   0   0   0   0   0   2   0   0   0   Friday
1   0   0   0   0   0   0   0   0   0   2   0   0   Saturday
2   0   0   0   0   0   0   0   0   0   0   0   0   Sunday
2   0   0   0   0   0   1   0   0   0   0   0   0   Monday
2   0   0   0   0   0   0   1   0   0   0   1   0   Tuesday
2   0   0   0   0   0   0   0   1   0   0   0   0   Wednesday
2   0   0   0   0   0   0   0   0   1   0   0   0   Thursday
2   0   0   0   0   0   2   0   0   0   1   0   0   Friday
2   0   0   0   0   0   0   0   0   0   0   0   0   Saturday
3   0   0   0   0   0   0   0   0   0   0   0   0   Sunday
3   0   0   0   0   0   0   2   0   0   0   0   0   Monday
3   0   0   0   0   0   1   0   0   2   0   0   0   Tuesday
3   0   0   0   0   0   0   0   0   0   0   0   0   Wednesday
3   0   0   0   0   0   0   0   2   0   0   0   0   Thursday
3   0   0   0   0   0   0   0   0   0   0   0   0   Friday
3   0   0   0   0   0   0   2   0   0   0   0   0   Saturday
3   0   0   0   0   0   0   0   2   0   0   0   0   Sunday

I also have a list of IDs (ID_checklist), like this:

ID
1
2
3

I want to convert df1 into output like this:

ID  Var1    Var2    Var3    Var4    Var5 ...... Var82   Var83 Var84
1   0         0      0         0     2             2      0     0
2
3

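Here Var1 stands for 'Sunday2' (the Sunday row and the 2 column of the first data frame) and Var84 stands for 'Saturday24'. I want to export the result as a .csv file.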

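I did this with a for loop (shown below) because there are so many IDs. The problem is that this code runs very slowly. Is there a faster way to get the same result?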

library(dplyr)
library(reshape2)
for (i in ID_checklist$ID) {

  x = filter(df1, ID %in% i)
  x$Day = NULL
  df.melted = melt(t(x[,-1]), id.vars = NULL)
  myNewDF = data.frame(i, t(df.melted[,3]))
  write.table(myNewDF,file="my12x7.csv", append=TRUE,sep=",",col.names=FALSE,row.names=FALSE)
}

1 Answer:

Answer 0: (score: 0)

I think this is what you want:

library(reshape2)

# this may be unnecessary depending on your data
# it will make sure the weekday columns come in the same order
# as the weekdays appear in your original data
df1$Day = factor(df1$Day, levels = unique(df1$Day))

# convert to a fully long format
df_long = melt(df1, id.vars = c("ID", "Day"))

# convert to the wide format you want
result = dcast(data = df_long, ID ~ Day + variable, fun.aggregate = sum)

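This joins the day name to the current variable name in each column name. If you would rather have them named Var1, Var2, Var3, ..., use paste() and rename the columns.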
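One way the renaming could look is sketched below. It uses a small toy stand-in called result_demo (a hypothetical name, with made-up values) instead of the real result, and it also keeps a lookup table, name_key (also hypothetical), so the descriptive day/hour names are not lost. paste0() is used here as shorthand for paste(..., sep = "").

```r
# Toy stand-in for `result`; values are made up for illustration.
# The real `result` would have 84 value columns (7 days x 12 time slots).
result_demo <- data.frame(
  ID = 1:3,
  Sunday_X2 = c(0, 0, 0),
  Sunday_X4 = c(0, 0, 0),
  Sunday_X6 = c(2, 0, 1)
)

# Lookup table from the new generic names back to the descriptive ones
name_key <- data.frame(
  new = paste0("Var", seq_len(ncol(result_demo) - 1)),
  old = names(result_demo)[-1],
  stringsAsFactors = FALSE
)

# Rename every column except the first one (ID)
names(result_demo)[-1] <- name_key$new

names(result_demo)
```

Because the columns are numbered in their existing order, Var1 only corresponds to 'Sunday2' if the day factor levels and the time columns are already in the intended order before reshaping.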

We can look at the first few columns to verify:

result[, 1:6]
#   ID Sunday_X2 Sunday_X4 Sunday_X6 Sunday_X8 Sunday_X10
# 1  1         0         0         0         0          2
# 2  2         0         0         0         0          0
# 3  3         0         0         0         0          0
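The question also asks for a .csv export, which the code above does not cover. A minimal sketch follows, again using a toy stand-in (result_demo, hypothetical values) and a temporary file path; in practice the path would be something like the my12x7.csv from the question. Since the reshaped data frame already holds one row per ID, a single write call should be enough, with no per-ID appending.

```r
# Toy stand-in for `result`; values are made up for illustration.
result_demo <- data.frame(
  ID = 1:3,
  Sunday_X2 = c(0, 0, 0),
  Sunday_X10 = c(2, 0, 0)
)

# Temporary path for the sketch; replace with the real output file name
out_file <- tempfile(fileext = ".csv")

# Write all rows in one call, with a header row and no row names
write.csv(result_demo, file = out_file, row.names = FALSE)

# Read the file back to check the round trip
check <- read.csv(out_file)
```

If the file should have no header row, as in the original loop, write.table(result_demo, file = out_file, sep = ",", col.names = FALSE, row.names = FALSE) is one option.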