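I have a data frame `samp` with a column (let's call it "rating") that takes one of several values (say one of "good", "medium", "bad").
I want to group by a few other columns, compute the frequencies of "good", "medium", and "bad" within each group, and report those frequencies in new columns. (So maybe col1 is the movie year and col2 is the genre, and there should be three more columns telling you how many of each rating there are for each year and genre.)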
ddply(samp, c("col1", "col2"), summarize,
      good = table(samp$rating)["good"],
      medium = table(samp$rating)["medium"],
      bad = table(samp$rating)["bad"])
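The problem (I think) is that the functions I defined are not functions of the groups that ddply produces; they are just constant functions of samp. How do I define the functions here so that they are functions of the groups?
I tried using anonymous functions: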
ddply(samp, c("col1", "col2"), summarize,
      good = function(df) table(df$rating)["good"],
      medium = function(df) table(df$rating)["medium"],
      bad = function(df) table(df$rating)["bad"])
I could never get it to work. I think the main error I got out of it was:
Error in output[[var]][rng] <- df[[var]] :
incompatible types (from closure to logical) in subassignment type fix
So lay it on me. What is the ridiculously simple solution that didn't occur to me while I was making mistakes trying 948506 combinations of ddply and table? Thanks.
Answer 0 (score: 2)
Just remove all instances of samp$ inside the ddply call:
ddply(samp, c("col1", "col2"), summarize,
      good = table(rating)["good"],
      medium = table(rating)["medium"],
      bad = table(rating)["bad"])
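Why this works: `summarize` evaluates each expression against the rows of the current group, so a bare `rating` is that group's column, while `samp$rating` always refers to the full column of the original data frame. A minimal sketch, assuming the plyr package is installed; the data frame `toy` and its columns are made up for illustration and are not from the question:

```r
library(plyr)

# Hypothetical toy data (not from the question): two groups, "a" and "b"
toy <- data.frame(g = c("a", "a", "b", "b"),
                  rating = c("good", "bad", "good", "good"))

# What a samp$-style reference computes: one count over ALL rows,
# which is the same number no matter which group is being summarized
whole_count <- sum(toy$rating == "good")   # 3

# A bare column name is looked up in the current group's rows
per_group <- ddply(toy, "g", summarize, good = sum(rating == "good"))
print(per_group)   # good is 1 for group "a" and 2 for group "b"
```

The sketch counts with `sum(...)` only to keep the output a plain integer column; the same scoping applies to `table(rating)`.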
Answer 1 (score: 1)
Sample data:
samp <- data.frame(rating = c("bad", "medium", "good", "bad", "medium", "good"),
                   col1 = c(2007, 2010, 2007, 2009, 2010, 2010),
                   col2 = c("fiction", "fiction", "fiction", "drama", "drama", "drama"))
Code (you should not put samp$ in front of the column names):
ddply(samp, c("col1", "col2"), summarize,
      good = sum(rating == "good"),
      medium = sum(rating == "medium"),
      bad = sum(rating == "bad"))
Output:
  col1    col2 good medium bad
1 2007 fiction    1      0   1
2 2009   drama    0      0   1
3 2010   drama    1      1   0
4 2010 fiction    0      1   0
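One possible reason to count with `sum(rating == "good")` rather than `table(rating)["good"]`: when a group contains no "good" rows, the two can disagree depending on the column type. Since R 4.0, `data.frame()` keeps strings as character vectors by default, and indexing a table of a character vector by a value that is absent returns NA instead of 0. The base R sketch below illustrates this on a single hypothetical group; the vector names are made up for illustration:

```r
# Hypothetical single group that contains no "good" rating
grp_chr <- c("bad", "medium")                                    # character column
grp_fct <- factor(grp_chr, levels = c("good", "medium", "bad"))  # factor column

# table() on a character vector only knows the values that are present
chr_count <- as.vector(table(grp_chr)["good"])   # NA

# table() on a factor keeps every level, including empty ones
fct_count <- as.vector(table(grp_fct)["good"])   # 0

# sum() of a logical comparison does not depend on the column type
sum_count <- sum(grp_chr == "good")              # 0
```

If the `table()` form is preferred, converting the column to a factor with all expected levels first should give zeros for the missing ratings.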