My dataset looks like this:
customerid product store Date Sales
1 A X1 1/2/2013 4
1 B x2 1/9/2013 4
1 A x2 1/9/2013 4
1 C x1 1/16/2013 2
1 B x1 1/16/2013 2
1 A x1 1/16/2013 4
2 A x1 1/23/2013 2
2 B x2 1/30/2013 2
2 C x1 2/6/2013 2
2 D x3 2/13/2013 4
I need to transpose it by product so that every product appears as its own column, like this:
customerid Date      Store A B C D
1          1/2/2013  x1    4
1          1/9/2013  x2    4 4
1          1/16/2013 x1    4 2 2
2          1/23/2013 x1    2
2          1/30/2013 x2      2
2          2/6/2013  x1        2
2          2/13/2013 x3          4
Please help! I have been trying to use the transpose function and have read through several threads here, but to no avail.
Thanks!
Answer 0 (score: 1)
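The code in this answer works on a data frame called mydf, which the question does not show being created. A minimal sketch for recreating it from the sample data, assuming the columns are whitespace-separated and that Date should stay as plain text for now:

```r
# Recreate the question's sample data under the name used in this answer.
# stringsAsFactors = FALSE keeps product, store and Date as character columns.
mydf <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
customerid product store Date Sales
1 A X1 1/2/2013 4
1 B x2 1/9/2013 4
1 A x2 1/9/2013 4
1 C x1 1/16/2013 2
1 B x1 1/16/2013 2
1 A x1 1/16/2013 4
2 A x1 1/23/2013 2
2 B x2 1/30/2013 2
2 C x1 2/6/2013 2
2 D x3 2/13/2013 4")

str(mydf)  # inspect column types before reshaping
```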
You can use dcast from "reshape2".
library(reshape2)
dcast(mydf, customerid + store + Date ~ product, value.var="Sales")
# customerid store Date A B C D
# 1 1 x1 1/16/2013 4 2 2 NA
# 2 1 X1 1/2/2013 4 NA NA NA
# 3 1 x2 1/9/2013 4 4 NA NA
# 4 2 x1 1/23/2013 2 NA NA NA
# 5 2 x1 2/6/2013 NA NA 2 NA
# 6 2 x2 1/30/2013 NA 2 NA NA
# 7 2 x3 2/13/2013 NA NA NA 4
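In the output above, "X1" and "x1" are treated as different stores, and the dates sort as text (1/16 comes before 1/2). If the uppercase "X1" is a typo, as the desired output in the question suggests, a small cleanup step before reshaping may help. This is only a sketch: lowercasing the store codes and the month/day/year date format are both assumptions about the data.

```r
# Sample data from the question, recreated so this block runs on its own.
mydf <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
customerid product store Date Sales
1 A X1 1/2/2013 4
1 B x2 1/9/2013 4
1 A x2 1/9/2013 4
1 C x1 1/16/2013 2
1 B x1 1/16/2013 2
1 A x1 1/16/2013 4
2 A x1 1/23/2013 2
2 B x2 1/30/2013 2
2 C x1 2/6/2013 2
2 D x3 2/13/2013 4")

# Assumption: store codes differ only by case, so "X1" should equal "x1".
mydf$store <- tolower(mydf$store)

# Assumption: dates are month/day/year. Real Date values sort chronologically.
mydf$Date <- as.Date(mydf$Date, format = "%m/%d/%Y")

# Order rows by customer and then by date.
mydf <- mydf[order(mydf$customerid, mydf$Date), ]
mydf
```

The same dcast call can then be run on the cleaned mydf.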
If you want "" instead of NA, you can do that too, but note that you are then coercing those columns to character.
dcast(mydf, customerid + store + Date ~ product, value.var="Sales", fill="")
# customerid store Date A B C D
# 1 1 x1 1/16/2013 4 2 2
# 2 1 X1 1/2/2013 4
# 3 1 x2 1/9/2013 4 4
# 4 2 x1 1/23/2013 2
# 5 2 x1 2/6/2013 2
# 6 2 x2 1/30/2013 2
# 7 2 x3 2/13/2013 4
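To see why the blank fill matters, here is a small base R illustration of the coercion on a made-up two-column data frame (the name wide and its values are hypothetical, not taken from the question). Whether 0 is a sensible stand-in for "no sale" depends on the analysis, so treat the numeric fill as an option rather than a recommendation.

```r
# Hypothetical wide-format fragment with missing cells.
wide <- data.frame(A = c(4, NA), B = c(NA, 2))
sapply(wide, class)            # both columns start out numeric

# Replacing NA with "" forces every affected column to character.
as_text <- wide
as_text[is.na(as_text)] <- ""
sapply(as_text, class)

# Replacing NA with 0 keeps the columns numeric, so sums still work.
as_zero <- wide
as_zero[is.na(as_zero)] <- 0
sapply(as_zero, class)
colSums(as_zero)
```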
For a base R solution, you can use reshape():
reshape(mydf, direction = "wide",
idvar = c("customerid", "store", "Date"),
timevar = "product")
# customerid store Date Sales.A Sales.B Sales.C Sales.D
# 1 1 X1 1/2/2013 4 NA NA NA
# 2 1 x2 1/9/2013 4 4 NA NA
# 4 1 x1 1/16/2013 4 2 2 NA
# 7 2 x1 1/23/2013 2 NA NA NA
# 8 2 x2 1/30/2013 NA 2 NA NA
# 9 2 x1 2/6/2013 NA NA 2 NA
# 10 2 x3 2/13/2013 NA NA NA 4
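reshape() prefixes the new columns with the value column's name (Sales.A, Sales.B, ...), while the question asks for plain A, B, C, D. A short sketch of stripping that prefix afterwards, assuming the only value column is Sales:

```r
# Sample data from the question, recreated so this block runs on its own.
mydf <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
customerid product store Date Sales
1 A X1 1/2/2013 4
1 B x2 1/9/2013 4
1 A x2 1/9/2013 4
1 C x1 1/16/2013 2
1 B x1 1/16/2013 2
1 A x1 1/16/2013 4
2 A x1 1/23/2013 2
2 B x2 1/30/2013 2
2 C x1 2/6/2013 2
2 D x3 2/13/2013 4")

wide <- reshape(mydf, direction = "wide",
                idvar = c("customerid", "store", "Date"),
                timevar = "product")

# Drop the "Sales." prefix so the columns are named after the products.
names(wide) <- sub("^Sales\\.", "", names(wide))

# Reset the row names, which otherwise keep the original row positions.
rownames(wide) <- NULL
wide
```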
Another possibility is to use model.matrix (thanks to @Thomas for explaining the model.matrix approach in a recent Q&A):
cbind(mydf, model.matrix(~ 0 + product, data = mydf) * mydf$Sales)
# customerid product store Date Sales productA productB productC productD
# 1 1 A X1 1/2/2013 4 4 0 0 0
# 2 1 B x2 1/9/2013 4 0 4 0 0
# 3 1 A x2 1/9/2013 4 4 0 0 0
# 4 1 C x1 1/16/2013 2 0 0 2 0
# 5 1 B x1 1/16/2013 2 0 2 0 0
# 6 1 A x1 1/16/2013 4 4 0 0 0
# 7 2 A x1 1/23/2013 2 2 0 0 0
# 8 2 B x2 1/30/2013 2 0 2 0 0
# 9 2 C x1 2/6/2013 2 0 0 2 0
# 10 2 D x3 2/13/2013 4 0 0 0 4
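Note that the model.matrix result still has one row per original record, not one row per customer, store and date. If the collapsed layout from the question is wanted, one option is to sum the indicator columns within each group with aggregate(). This is a sketch rather than part of the original approach, and it leaves 0 (not NA) where a product was not sold.

```r
# Sample data from the question, recreated so this block runs on its own.
mydf <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
customerid product store Date Sales
1 A X1 1/2/2013 4
1 B x2 1/9/2013 4
1 A x2 1/9/2013 4
1 C x1 1/16/2013 2
1 B x1 1/16/2013 2
1 A x1 1/16/2013 4
2 A x1 1/23/2013 2
2 B x2 1/30/2013 2
2 C x1 2/6/2013 2
2 D x3 2/13/2013 4")

# One indicator column per product, scaled by that row's Sales value.
dummies <- model.matrix(~ 0 + product, data = mydf) * mydf$Sales

# Sum the product columns within each customerid/store/Date group.
collapsed <- aggregate(dummies,
                       by = mydf[c("customerid", "store", "Date")],
                       FUN = sum)
collapsed
```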