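Counting values in a column of a data frame in R

Time: 2014-01-10 20:18:25

Tags: r dataframe plyr

I have a data frame "samp" with a column (call it "rating") that takes one of several values (say "good", "medium", or "bad").

I want to group by a couple of other columns, count the frequency of "good", "medium", and "bad" within each group, and report those frequencies in new columns. (So maybe col1 is the movie year and col2 is the genre, and there should then be three more columns telling you how many of each rating there are for each year and genre.)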

 ddply(samp,c("col1","col2"), summarize, 
       good=table(samp$rating)["good"],
       medium=table(samp$rating)["medium"],
       bad=table(samp$rating)["bad"])

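The problem (I think) is that the functions I defined are not functions of the groups that ddply produces; they are just constant functions of samp. How do I define the functions here so that they are functions of the groups?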
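That suspicion can be checked without plyr. The following is a minimal base R sketch, assuming a small toy data frame (the values are made up for illustration): it splits the data by col1 and col2 and then counts "good" ratings twice, once through samp$rating and once through the group's own column.

```r
# Toy data, assumed for illustration only
samp <- data.frame(
  rating = c("bad", "medium", "good", "bad", "medium", "good"),
  col1   = c(2007, 2010, 2007, 2009, 2010, 2010),
  col2   = c("fiction", "fiction", "fiction", "drama", "drama", "drama"),
  stringsAsFactors = FALSE
)

# One data frame per (col1, col2) combination that actually occurs
groups <- split(samp, list(samp$col1, samp$col2), drop = TRUE)

# Ignores the group g and always looks at the full data frame
whole_frame <- sapply(groups, function(g) sum(samp$rating == "good"))

# Uses the rating column of the group itself
per_group <- sapply(groups, function(g) sum(g$rating == "good"))

print(whole_frame)
print(per_group)
```

The first vector holds the same number for every group, because samp$rating always refers to the full column. Only the second varies from group to group.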

I tried using anonymous functions:

 ddply(samp,c("col1","col2"), summarize, 
       good=function(df)table(df$rating)["good"],
       medium=function(df)table(df$rating)["medium"],
       bad=function(df)table(df$rating)["bad"])

I could never get that to work. I think the biggest error I got out of it was

 Error in output[[var]][rng] <- df[[var]] : 
 incompatible types (from closure to logical) in subassignment type fix

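So lay it on me. What is the ridiculously simple solution that did not occur to me while I was making mistakes in my 948506 attempted combinations of ddply and table? Thanks.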

2 Answers:

Answer 0: (score: 2)

Just remove all instances of samp$ inside the ddply call:

ddply(samp,c("col1","col2"), summarize, 
  good=table(rating)["good"],
  medium=table(rating)["medium"],
  bad=table(rating)["bad"])
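One caveat with the table(...)["good"] lookup, shown as a hedged sketch on a hypothetical group that contains no "good" rating: when rating is a character vector (the default for data.frame() since R 4.0.0), the missing value is not a name in the table and the lookup yields NA rather than 0. Declaring the levels with factor() is one way around that.

```r
# A hypothetical group with no "good" rating
one_group <- c("bad", "medium")

# Character vector: "good" is not a name in the table, so the lookup gives NA
as_character <- table(one_group)["good"]

# Factor with all three levels declared: the absent level is counted as 0
as_factor <- table(factor(one_group, levels = c("good", "medium", "bad")))["good"]

print(is.na(as_character))
print(unname(as_factor))
```

If rating is already a factor in samp, the levels carry over into each group and the original lookup returns 0 for absent ratings.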

Answer 1: (score: 1)

Sample data:

library(plyr)  # provides ddply() and summarize()

samp <- data.frame(rating=c("bad","medium","good","bad","medium","good"),
                   col1=c(2007,2010,2007,2009,2010,2010),
                   col2=c("fiction","fiction","fiction","drama","drama","drama"))

Code (you should not use samp$ before the column names):

ddply(samp,c("col1","col2"), summarize, 
      good=sum(rating == "good"),
      medium=sum(rating == "medium"),
      bad=sum(rating == "bad"))

Output:

  col1    col2 good medium bad
1 2007 fiction    1      0   1
2 2009   drama    0      0   1
3 2010   drama    1      1   0
4 2010 fiction    0      1   0
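For comparison, the same per-group counts can be sketched in base R without plyr. This is only one possible alternative, not something proposed in the thread. The helper data frame `indicators` is a name introduced here for illustration, and the sample data is repeated so the block runs on its own.

```r
samp <- data.frame(
  rating = c("bad", "medium", "good", "bad", "medium", "good"),
  col1   = c(2007, 2010, 2007, 2009, 2010, 2010),
  col2   = c("fiction", "fiction", "fiction", "drama", "drama", "drama")
)

# One 0/1 indicator column per rating value (hypothetical helper)
indicators <- data.frame(
  good   = as.integer(samp$rating == "good"),
  medium = as.integer(samp$rating == "medium"),
  bad    = as.integer(samp$rating == "bad")
)

# Sum the indicators within each (col1, col2) combination present in the data
counts <- aggregate(indicators, by = samp[c("col1", "col2")], FUN = sum)

# Sort by year, then genre, to match the row order ddply uses
counts <- counts[order(counts$col1, counts$col2), ]
print(counts)
```

Summing 0/1 indicators is the same idea as sum(rating == "good") in the ddply call, so a rating that does not occur in a group comes out as 0.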