I want to run ddply on a subset of my data, but the example below only returns TRUE or FALSE:
ddply(demoData, .(name, id, gender == "Male"), summarize, tot = sum(count))
and
ddply(demoData[demoData$gender == 'Male'], .(name, id, gender), summarize, tot = sum(count))
doesn't seem to work either. Ultimately I need to sum count by name and id for all rows where gender == "Male".
Sample data, as requested:
id name gender age count
1 apple Male 13-20 25
1 apple Male 21-40 30
1 apple Female 13-20 60
1 apple Female 21-40 42
2 banana Male 13-20 45
2 banana Male 21-40 12
2 banana Female 13-20 22
2 banana Female 21-40 74
What I would like returned is:
1 apple Male 55
2 banana Male 57
Answer 0 (score: 3)
Base R aggregate can do this very simply:
aggregate(
count ~ id + name + gender,
FUN=sum,
subset=gender=="Male",
data=demoData
)
Result:
id name gender count
1 1 apple Male 55
2 2 banana Male 57
If you absolutely must use plyr, because your life depends on it or for some other reason, then:
ddply(
demoData[demoData$gender=="Male",],
.(id, name, gender),
summarise,
sumcount=sum(count)
)
which gives:
id name gender sumcount
1 1 apple Male 55
2 2 banana Male 57
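For anyone who wants to reproduce this, here is a minimal base R sketch. The demoData built here is an assumption, reconstructed from the sample in the question (the real column types may differ). It also illustrates the likely reason the second attempt in the question fails: without the trailing comma, the logical vector is used to index columns rather than rows.

```r
# demoData rebuilt from the sample in the question
# (an assumption about its exact structure and column types)
demoData <- data.frame(
  id     = rep(1:2, each = 4),
  name   = rep(c("apple", "banana"), each = 4),
  gender = rep(c("Male", "Male", "Female", "Female"), times = 2),
  age    = rep(c("13-20", "21-40"), times = 4),
  count  = c(25, 30, 60, 42, 45, 12, 22, 74)
)

# With the comma: keep the rows where gender is "Male", and all columns
males <- demoData[demoData$gender == "Male", ]

# Without the comma the logical vector indexes columns, which fails here
# because it is longer than the number of columns
no_comma <- try(demoData[demoData$gender == "Male"], silent = TRUE)
failed <- inherits(no_comma, "try-error")

# Sum count per id/name/gender on the filtered rows
res <- aggregate(count ~ id + name + gender, data = males, FUN = sum)
print(res)
```

The filtered data frame can be passed to ddply in exactly the same way as in the call above.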
Answer 1 (score: 1)
Even though ddply has no built-in subset argument,
ddply(subset(demoData, gender=="Male"),
.(name, id), summarize, tot = sum(count))
seems to work just fine...
name id tot
1 apple 1 55
2 banana 2 57
...although Male does not appear in the result. For that you need
ddply(subset(demoData, gender=="Male"),
.(name, id, gender), summarize, tot = sum(count))
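As a rough illustration of why the first attempt in the question returns TRUE/FALSE, the sketch below contrasts grouping on the condition with filtering on it. It uses base R aggregate rather than ddply so that it runs without plyr. The demoData and the helper column is_male are assumptions made for this example, not part of the original data.

```r
# demoData rebuilt from the sample in the question (an assumption)
demoData <- data.frame(
  id     = rep(1:2, each = 4),
  name   = rep(c("apple", "banana"), each = 4),
  gender = rep(c("Male", "Male", "Female", "Female"), times = 2),
  age    = rep(c("13-20", "21-40"), times = 4),
  count  = c(25, 30, 60, 42, 45, 12, 22, 74)
)

# Grouping on the condition: the logical value becomes a grouping variable,
# so both the TRUE and the FALSE groups are summed and returned
demoData$is_male <- demoData$gender == "Male"  # hypothetical helper column
grouped <- aggregate(count ~ id + name + is_male, data = demoData, FUN = sum)

# Filtering on the condition first: only the Male rows are summed
filtered <- aggregate(count ~ id + name,
                      data = subset(demoData, gender == "Male"),
                      FUN = sum)

print(grouped)
print(filtered)
```

Here grouped has one row per id, name and TRUE/FALSE combination, while filtered has only the two Male totals.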