I have a dataset like this (df1):
ID 2 4 6 8 10 12 14 16 18 20 22 24 Day
1 0 0 0 0 2 0 0 0 1 0 1 0 Sunday
1 0 0 0 0 0 4 0 0 0 0 0 0 Monday
1 0 0 0 0 0 0 0 0 2 0 0 0 Tuesday
1 0 0 0 0 0 0 2 0 0 0 0 0 Wednesday
1 0 0 0 0 0 0 0 2 0 0 0 0 Thursday
1 0 0 0 0 0 0 0 0 2 0 0 0 Friday
1 0 0 0 0 0 0 0 0 0 2 0 0 Saturday
2 0 0 0 0 0 0 0 0 0 0 0 0 Sunday
2 0 0 0 0 0 1 0 0 0 0 0 0 Monday
2 0 0 0 0 0 0 1 0 0 0 1 0 Tuesday
2 0 0 0 0 0 0 0 1 0 0 0 0 Wednesday
2 0 0 0 0 0 0 0 0 1 0 0 0 Thursday
2 0 0 0 0 0 2 0 0 0 1 0 0 Friday
2 0 0 0 0 0 0 0 0 0 0 0 0 Saturday
3 0 0 0 0 0 0 0 0 0 0 0 0 Sunday
3 0 0 0 0 0 0 2 0 0 0 0 0 Monday
3 0 0 0 0 0 1 0 0 2 0 0 0 Tuesday
3 0 0 0 0 0 0 0 0 0 0 0 0 Wednesday
3 0 0 0 0 0 0 0 2 0 0 0 0 Thursday
3 0 0 0 0 0 0 0 0 0 0 0 0 Friday
3 0 0 0 0 0 0 2 0 0 0 0 0 Saturday
3 0 0 0 0 0 0 0 2 0 0 0 0 Sunday
I have a list of IDs, like this:
ID
1
2
3
I want to convert df1 into this output:
ID Var1 Var2 Var3 Var4 Var5 ...... Var82 Var83 Var84
1 0 0 0 0 2 2 0 0
2
3
where Var1 represents 'Sunday2' (from the first data frame) and Var84 represents 'Saturday24'. I want to export the result as a .csv file.
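I did this with a for loop (shown below) because there are so many IDs. The problem is that this code runs very slowly. Is there a faster way to get the same result?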
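library(dplyr)
library(reshape2)
for (i in ID_checklist$ID) {
  # keep only the rows for the current ID
  x = filter(df1, ID %in% i)
  x$Day = NULL
  # transpose the hour columns and melt them into one long column of values
  df.melted = melt(t(x[,-1]), id.vars = NULL)
  # build a single row: the ID followed by all of its values
  myNewDF = data.frame(i, t(df.melted[,3]))
  # append that row to the CSV file
  write.table(myNewDF,file="my12x7.csv", append=TRUE,sep=",",col.names=FALSE,row.names=FALSE)
}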
Answer 0 (score: 0)
I think this is what you want:
library(reshape2)
# this may be unnecessary depending on your data
# it will make sure the weekday columns come in the same order
# as the weekdays appear in your original data
df1$Day = factor(df1$Day, levels = unique(df1$Day))
# convert to a fully long format
df_long = melt(df1, id.vars = c("ID", "Day"))
# convert to the wide format you want
result = dcast(data = df_long, ID ~ Day + variable, fun.aggregate = sum)
This names each column by joining the day name to the current variable name. If you want them to be Var1 Var2 Var3, use paste() and rename the columns.
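The renaming step could look roughly like the sketch below. The small `result` data frame here is a made-up stand-in for the dcast() output, the `renamed` name is only for illustration, and paste0() (the no-separator form of paste()) is used for brevity.

```r
# Toy stand-in (hypothetical values) for the `result` data frame from dcast()
result <- data.frame(
  ID        = c(1, 2),
  Sunday_X2 = c(0, 0),
  Sunday_X4 = c(2, 0),
  Monday_X2 = c(1, 0)
)

# Work on a copy so the descriptive names stay available in `result`
renamed <- result

# Rename every column except ID to Var1, Var2, ... in its current order
names(renamed)[-1] <- paste0("Var", seq_len(ncol(renamed) - 1))

names(renamed)
# [1] "ID"   "Var1" "Var2" "Var3"
```

Because the columns are numbered in their existing order, this only gives the intended Var1 ... Var84 mapping if the Day factor levels were set as in the code above.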
We can look at the first few columns to verify:
result[, 1:6]
# ID Sunday_X2 Sunday_X4 Sunday_X6 Sunday_X8 Sunday_X10
# 1 1 0 0 0 0 2
# 2 2 0 0 0 0 0
# 3 3 0 0 0 0 0
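To finish the task from the question, the reshaped table can be restricted to the IDs in the checklist and written out in a single call rather than appended row by row. This is a minimal sketch with made-up values standing in for `result` and `ID_checklist`; the `out` name is hypothetical, and the file name is taken from the question.

```r
# Toy stand-ins (hypothetical values) for the reshaped result and the ID list
result <- data.frame(
  ID        = c(1, 2, 3),
  Sunday_X2 = c(0, 0, 5),
  Sunday_X4 = c(2, 0, 0)
)
ID_checklist <- data.frame(ID = c(1, 3))

# Keep only the rows whose ID is in the checklist (vectorized, no loop)
out <- result[result$ID %in% ID_checklist$ID, ]

# Write the whole table at once; unlike the loop in the question,
# write.csv() also writes a header row
write.csv(out, file = "my12x7.csv", row.names = FALSE)
```

If the header row is not wanted, write.table() with col.names = FALSE and sep = "," can be used instead, as in the original loop.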