R: Reshaping a data frame and creating proportions

Asked: 2015-03-19 22:40:38

Tags: r reshape

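I'm new to programming and would really appreciate your help. I have a data frame containing product names and the day of the week on which each sale was made. For each product, I need to see the fraction sold on Monday, Tuesday, Wednesday, and so on.

Run the following to reproduce my data frame: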

Product=c("A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","C","C","C")
Day=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Monday","Tuesday","Wednesday","Saturday","Sunday" ,"Monday")
df=data.frame(cbind(Product,Day))

I tried the following:

data.frame(prop.table(with(df,table(Product,Day))))

df.wide=reshape(data.frame(prop.table(with(df,table(Product,Day)))),
                  timevar="Day",
                  idvar="Product",
                  direction="wide")

which gave me

Product Freq.Friday Freq.Monday Freq.Saturday Freq.Sunday Freq.Thursday Freq.Tuesday Freq.Wednesday
       A   0.1111111  0.11111111    0.11111111  0.00000000     0.1111111   0.11111111     0.11111111
       B   0.0000000  0.05555556    0.00000000  0.00000000     0.0000000   0.05555556     0.05555556
       C   0.0000000  0.05555556    0.05555556  0.05555556     0.0000000   0.00000000     0.00000000

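I can sum columns 2-8 to get the overall proportion for each of A, B, and C, but how do I get the proportion of A, B, and C sold on each day of the week?

Thanks!

2 Answers:

Answer 0 (score: 2)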

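This is a pretty straightforward prop.table(..., margin=) operation when used together with table. The margin= argument lets you calculate proportions over rows, over columns, or over the whole table (the default). prop.table(..., 1) works by row, 2 works by column, 3 works on the third dimension of a higher-dimensional table, and so on.

Also, use as.data.frame.matrix instead of data.frame to avoid the need to reshape: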

as.data.frame.matrix(prop.table(with(df,table(Product,Day)),1))
#     Friday    Monday  Saturday    Sunday  Thursday   Tuesday Wednesday
#A 0.1666667 0.1666667 0.1666667 0.0000000 0.1666667 0.1666667 0.1666667
#B 0.0000000 0.3333333 0.0000000 0.0000000 0.0000000 0.3333333 0.3333333
#C 0.0000000 0.3333333 0.3333333 0.3333333 0.0000000 0.0000000 0.0000000

as.data.frame.matrix(prop.table(with(df,table(Product,Day)),2))
#  Friday Monday  Saturday Sunday Thursday   Tuesday Wednesday
#A      1   0.50 0.6666667      0        1 0.6666667 0.6666667
#B      0   0.25 0.0000000      0        0 0.3333333 0.3333333
#C      0   0.25 0.3333333      1        0 0.0000000 0.0000000
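As a quick sanity check on the margin= behaviour described above, the sketch below rebuilds the question's data (written with rep() for brevity; the `tab` name is just an illustration) and confirms which totals each margin normalises to 1. It also suggests why the question's attempt showed 0.1111 for product A on Monday: with the default margin, the 2 Monday sales of A are divided by all 18 sales.

```r
# Rebuild the question's data frame (same values as the original df)
Product <- c(rep("A", 12), rep("B", 3), rep("C", 3))
Day <- c(rep(c("Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday"), 2),
         "Monday", "Tuesday", "Wednesday",
         "Saturday", "Sunday", "Monday")
df <- data.frame(Product, Day)

tab <- with(df, table(Product, Day))

sum(prop.table(tab))         # default margin: the whole table sums to 1
rowSums(prop.table(tab, 1))  # margin = 1: each product's row sums to 1
colSums(prop.table(tab, 2))  # margin = 2: each day's column sums to 1
```

For the question as asked (the share of each product's sales by weekday), the row version, margin = 1, is the relevant one.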

You might also want to consider making your Day variable a factor with its levels in the appropriate Sunday-to-Saturday order.
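A minimal sketch of that suggestion, assuming the same data as in the question (the names `day_levels` and `res` are only illustrative). Setting the factor levels explicitly makes table() lay out the columns in calendar order rather than alphabetically:

```r
# Rebuild the question's data frame (same values as the original df)
Product <- c(rep("A", 12), rep("B", 3), rep("C", 3))
Day <- c(rep(c("Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday"), 2),
         "Monday", "Tuesday", "Wednesday",
         "Saturday", "Sunday", "Monday")
df <- data.frame(Product, Day)

# Hypothetical helper vector: weekdays in Sunday-to-Saturday order
day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
df$Day <- factor(df$Day, levels = day_levels)

# Row proportions; columns now follow the factor level order
res <- as.data.frame.matrix(prop.table(with(df, table(Product, Day)), 1))
colnames(res)
```

A Monday-first week works the same way; only the order of `day_levels` changes.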

Answer 1 (score: 1)

Try this:

library(reshape2)
library(plyr)

ddply(dcast(df, Product ~ Day),1,function(u) data.frame(u[1], u[-1]/sum(u[-1])))

#  Product    Friday    Monday  Saturday    Sunday  Thursday   Tuesday Wednesday
#1       A 0.1666667 0.1666667 0.1666667 0.0000000 0.1666667 0.1666667 0.1666667
#2       B 0.0000000 0.3333333 0.0000000 0.0000000 0.0000000 0.3333333 0.3333333
#3       C 0.0000000 0.3333333 0.3333333 0.3333333 0.0000000 0.0000000 0.0000000