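I would like to summarize the pass/fail status of my data, as shown below. In other words, I want to report the number of pass and fail cases for each product/type combination.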
library(ggplot2)
library(plyr)
product=c("p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2")
type=c("t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2","t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2")
skew=c("s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2")
color=c("c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3")
result=c("pass","pass","fail","pass","pass","pass","fail","pass","fail","pass","fail","pass","fail","pass","fail","pass","pass","pass","pass","fail","fail","pass","pass","fail")
df = data.frame(product, type, skew, color, result)
The following command returns the total number of pass + fail cases, but I want separate columns for pass and fail:
dfSummary <- ddply(df, c("product", "type"), summarise, N=length(result))
The result is:
product type N
1 p1 t1 6
2 p1 t2 6
3 p2 t1 6
4 p2 t2 6
The ideal result would be:
product type Pass Fail
1 p1 t1 5 1
2 p1 t2 3 3
3 p2 t1 4 2
4 p2 t2 3 3
I tried something like this:
dfSummary <- ddply(df, c("product", "type"), summarise, Pass=length(df$product[df$result=="pass"]), Fail=length(df$product[df$result=="fail"]) )
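But this is obviously wrong, because it returns the grand totals of pass and fail rather than per-group counts.
Thanks in advance for any suggestions! Regards, Riad.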
Answer 0 (score: 11)
Try:
dfSummary <- ddply(df, c("product", "type"), summarise,
Pass=sum(result=="pass"), Fail=sum(result=="fail") )
This gives me the result:
product type Pass Fail
1 p1 t1 5 1
2 p1 t2 3 3
3 p2 t1 4 2
4 p2 t2 3 3
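To see why this version counts per group while the attempt in the question does not, here is a minimal base R sketch. It rebuilds only the product, type and result columns from the question (skew and color are left out as a simplification), and the name piece is a hypothetical stand-in for the subset that ddply passes to summarise.

```r
# Same 24 rows as in the question, rebuilt compactly (skew and color omitted)
df <- data.frame(
  product = rep(c("p1", "p2"), each = 12),
  type    = rep(rep(c("t1", "t2"), each = 6), times = 2),
  result  = c("pass","pass","fail","pass","pass","pass",
              "fail","pass","fail","pass","fail","pass",
              "fail","pass","fail","pass","pass","pass",
              "pass","fail","fail","pass","pass","fail")
)

# One piece, roughly what summarise sees for product p1 / type t1
piece <- df[df$product == "p1" & df$type == "t1", ]

# result == "pass" is a logical vector; sum() counts its TRUE values
pass_in_piece <- sum(piece$result == "pass")   # 5
fail_in_piece <- sum(piece$result == "fail")   # 1

# The attempt in the question indexes the full df, so the piece is ignored
# and the same grand total comes back for every group
pass_overall <- length(df$product[df$result == "pass"])   # 15
```

Inside summarise, a bare column name such as result refers to the current piece, whereas df$result always refers to the whole data frame.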
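Explanation:
You give the dataset df to the ddply function.
ddply splits on the variables "product" and "type". This yields length(unique(product)) * length(unique(type)) pieces (i.e. subsets of the data df), one for each combination of the two variables.
For each piece, ddply applies the function you provide. In this case, you count the rows where result=="pass" and the rows where result=="fail".
ddply is then left with some results for each piece, namely the variables you split on (product and type) and the results you requested (Pass and Fail), which it combines into the data frame it returns.
Answer 1 (score: 4)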
You can also use reshape2::dcast.
library(reshape2)
dcast(product + type~result,data=df, fun.aggregate= length,value.var = 'result')
## product type fail pass
## 1 p1 t1 1 5
## 2 p1 t2 3 3
## 3 p2 t1 2 4
## 4 p2 t2 3 3
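As a cross-check that needs neither plyr nor reshape2, the same counts can be reproduced in base R. This is only a sketch under the assumption that the product, type and result columns are enough (skew and color are omitted), and the names pieces and counts are illustrative. It also makes the split, apply and combine steps explicit.

```r
# Same 24 rows as in the question, rebuilt compactly (skew and color omitted)
df <- data.frame(
  product = rep(c("p1", "p2"), each = 12),
  type    = rep(rep(c("t1", "t2"), each = 6), times = 2),
  result  = c("pass","pass","fail","pass","pass","pass",
              "fail","pass","fail","pass","fail","pass",
              "fail","pass","fail","pass","pass","pass",
              "pass","fail","fail","pass","pass","fail")
)

# Split: one data frame per product/type combination present in the data
pieces <- split(df, list(df$product, df$type), drop = TRUE)

# Apply: count passes and fails within each piece
counts <- lapply(pieces, function(piece) {
  data.frame(
    product = piece$product[1],
    type    = piece$type[1],
    Pass    = sum(piece$result == "pass"),
    Fail    = sum(piece$result == "fail")
  )
})

# Combine: stack the one-row summaries into a single data frame
dfSummary <- do.call(rbind, counts)
print(dfSummary)
```

The row order may differ from the ddply and dcast output, since split orders the pieces by its own factor levels, but the per-group counts should agree.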