Transposing specific columns in a data frame

Asked: 2013-09-02 10:09:42

Tags: r reshape transpose

My dataset looks like this:

customerid  product store   Date    Sales
1   A   X1  1/2/2013    4
1   B   x2  1/9/2013    4
1   A   x2  1/9/2013    4
1   C   x1  1/16/2013   2
1   B   x1  1/16/2013   2
1   A   x1  1/16/2013   4
2   A   x1  1/23/2013   2
2   B   x2  1/30/2013   2
2   C   x1  2/6/2013    2
2   D   x3  2/13/2013   4

I need to transpose by product so that every product appears as its own column, like this:

customerid  Date    Store   A   B   C   D
1   1/2/2013    x1  4           
1   1/9/2013    x2  4   4       
1   1/16/2013   x1  4   2   2   
2   1/23/2013   x1  2           
2   1/30/2013   x2      2       
2   2/6/2013    x1          2   
2   2/13/2013   x3              4

Please help! I have been trying to use the transpose function and have read through several threads here, but to no avail.

Thanks!

1 Answer:

Answer 0 (score: 1)

You can use dcast from the "reshape2" package:

library(reshape2)
dcast(mydf, customerid + store + Date ~ product, value.var="Sales")
#   customerid store      Date  A  B  C  D
# 1          1    x1 1/16/2013  4  2  2 NA
# 2          1    X1  1/2/2013  4 NA NA NA
# 3          1    x2  1/9/2013  4  4 NA NA
# 4          2    x1 1/23/2013  2 NA NA NA
# 5          2    x1  2/6/2013 NA NA  2 NA
# 6          2    x2 1/30/2013 NA  2 NA NA
# 7          2    x3 2/13/2013 NA NA NA  4
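The call above assumes a data frame named mydf that holds the question's data. As a minimal reproducible sketch, the following rebuilds that data, folds the mixed-case store value "X1" to lower case, and parses Date so that rows sort chronologically rather than as text. The read.table() construction, the tolower() cleanup, and the as.Date() conversion are my additions, and treating "X1" and "x1" as the same store is an assumption.

```r
library(reshape2)

# Rebuild the question's data (assumed to match the table in the question).
mydf <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
customerid product store Date Sales
1 A X1 1/2/2013 4
1 B x2 1/9/2013 4
1 A x2 1/9/2013 4
1 C x1 1/16/2013 2
1 B x1 1/16/2013 2
1 A x1 1/16/2013 4
2 A x1 1/23/2013 2
2 B x2 1/30/2013 2
2 C x1 2/6/2013 2
2 D x3 2/13/2013 4
")

# Assumption: "X1" and "x1" refer to the same store, so fold the case.
mydf$store <- tolower(mydf$store)

# Parse month/day/year strings so that Date sorts chronologically.
mydf$Date <- as.Date(mydf$Date, format = "%m/%d/%Y")

# One row per customerid/Date/store, one column per product.
wide <- dcast(mydf, customerid + Date + store ~ product, value.var = "Sales")
wide
```

Putting Date before store in the formula matches the column order the question asked for and sorts each customer's rows by date.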

If you want "" instead of NA, you can do that too, but note that this coerces those columns to character:

dcast(mydf, customerid + store + Date ~ product, value.var="Sales", fill="")
#   customerid store      Date A B C D
# 1          1    x1 1/16/2013 4 2 2  
# 2          1    X1  1/2/2013 4      
# 3          1    x2  1/9/2013 4 4    
# 4          2    x1 1/23/2013 2      
# 5          2    x1  2/6/2013     2  
# 6          2    x2 1/30/2013   2    
# 7          2    x3 2/13/2013       4
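Once the product columns are character, sums and other arithmetic on them no longer work directly. One alternative is to keep the numeric result and build a separate copy only for display. This is a sketch on a small hypothetical data frame named wide, not something from the original answer; the names wide, display, and product_cols are made up for illustration.

```r
# Hypothetical wide result with NA where a product was not sold.
wide <- data.frame(customerid = c(1, 2),
                   A = c(4, NA),
                   B = c(NA, 2))

product_cols <- c("A", "B")

# Display copy: NA becomes "", everything else becomes its text form.
display <- wide
display[product_cols] <- lapply(wide[product_cols], function(x) {
  ifelse(is.na(x), "", as.character(x))
})

# The original stays numeric, so arithmetic still works on it.
totals <- colSums(wide[product_cols], na.rm = TRUE)

display
totals
```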

For a base R solution, you can use reshape():

reshape(mydf, direction = "wide", 
        idvar = c("customerid", "store", "Date"), 
        timevar = "product")
#    customerid store      Date Sales.A Sales.B Sales.C Sales.D
# 1           1    X1  1/2/2013       4      NA      NA      NA
# 2           1    x2  1/9/2013       4       4      NA      NA
# 4           1    x1 1/16/2013       4       2       2      NA
# 7           2    x1 1/23/2013       2      NA      NA      NA
# 8           2    x2 1/30/2013      NA       2      NA      NA
# 9           2    x1  2/6/2013      NA      NA       2      NA
# 10          2    x3 2/13/2013      NA      NA      NA       4
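reshape() prefixes the new columns with the name of the value column ("Sales.") and keeps the row names of the first row in each group. If you would rather have plain product names and consecutive row names, a small cleanup step along these lines should do it. This is a sketch; the data construction and the sub() pattern are my additions.

```r
# Rebuild the question's data (assumed to match the table in the question).
mydf <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
customerid product store Date Sales
1 A X1 1/2/2013 4
1 B x2 1/9/2013 4
1 A x2 1/9/2013 4
1 C x1 1/16/2013 2
1 B x1 1/16/2013 2
1 A x1 1/16/2013 4
2 A x1 1/23/2013 2
2 B x2 1/30/2013 2
2 C x1 2/6/2013 2
2 D x3 2/13/2013 4
")

wide <- reshape(mydf, direction = "wide",
                idvar = c("customerid", "store", "Date"),
                timevar = "product")

# Drop the "Sales." prefix so the columns are named A, B, C, D.
names(wide) <- sub("^Sales\\.", "", names(wide))

# Replace the leftover row names (1, 2, 4, 7, ...) with 1..n.
rownames(wide) <- NULL
wide
```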

Another possibility is to use model.matrix (thanks to @Thomas for explaining the model.matrix approach in a recent Q&A):

cbind(mydf, model.matrix(~ 0 + product, data = mydf) * mydf$Sales)
#    customerid product store      Date Sales productA productB productC productD
# 1           1       A    X1  1/2/2013     4        4        0        0        0
# 2           1       B    x2  1/9/2013     4        0        4        0        0
# 3           1       A    x2  1/9/2013     4        4        0        0        0
# 4           1       C    x1 1/16/2013     2        0        0        2        0
# 5           1       B    x1 1/16/2013     2        0        2        0        0
# 6           1       A    x1 1/16/2013     4        4        0        0        0
# 7           2       A    x1 1/23/2013     2        2        0        0        0
# 8           2       B    x2 1/30/2013     2        0        2        0        0
# 9           2       C    x1  2/6/2013     2        0        0        2        0
# 10          2       D    x3 2/13/2013     4        0        0        0        4
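Note that the model.matrix result still has one row per original row and uses 0 rather than NA for missing products. To collapse it to one row per customerid/store/Date, you could sum the indicator columns within each group using aggregate(). This is a sketch and not part of the original answer; the aggregate() step and the column renaming are my additions, and the resulting zeros cannot be told apart from genuine zero sales.

```r
# Rebuild the question's data (assumed to match the table in the question).
mydf <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
customerid product store Date Sales
1 A X1 1/2/2013 4
1 B x2 1/9/2013 4
1 A x2 1/9/2013 4
1 C x1 1/16/2013 2
1 B x1 1/16/2013 2
1 A x1 1/16/2013 4
2 A x1 1/23/2013 2
2 B x2 1/30/2013 2
2 C x1 2/6/2013 2
2 D x3 2/13/2013 4
")

# Indicator matrix (one column per product), scaled row-wise by Sales.
ind <- model.matrix(~ 0 + product, data = mydf) * mydf$Sales

# Strip the "product" prefix so the columns are named A, B, C, D.
colnames(ind) <- sub("^product", "", colnames(ind))

# Sum within each customerid/store/Date group to get one row per group.
wide <- aggregate(ind,
                  by = mydf[c("customerid", "store", "Date")],
                  FUN = sum)
wide
```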