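Getting counts of data in R

Time: 2016-04-19 21:40:40

Tags: r

This is my first time using R. I have the following dataset (a mock-up of the much larger dataset I am actually using):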

Type       Date         Size       Color
L shape    2008-04-14   161    blue    
L shape    2010-10-16   654    yellow
L shape    2005-07-03   149    blue
L shape    2006-08-16   657    yellow
L shape    2007-04-08   229    yellow
L shape    2004-03-17   784    green
Y shape    2014-02-22   917    pink
Y shape    2012-05-04   186    green
Y shape    2006-11-25   641    yellow
Y shape    2015-09-07   493    blue
Y shape    2011-07-06   953    green

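For each type, I want to retrieve the number of colors that occur, the dates, and the minimum, maximum, and mean of the size. The output should look like this: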

Type       Colors   Dates        Mean Size   Min Size   Max Size
L shape    3        2008-04-14   439         149        784
                    2010-10-16
                    2005-07-03
                    2006-08-16
                    2007-04-08
                    2004-03-17

Y shape    4        2014-02-22   638         186        953
                    2012-05-04
                    2006-11-25
                    2015-09-07
                    2011-07-06

This is what I have done so far:

cat <- big_catalog

na.rm = TRUE

library(plyr)

mydata <-ddply(cat, c("Type", "Date", "Size", "Color"), summarize,
               Colors = length(Color),
               Dates = (Date),
               Mean_Size = mean(Size),
               Minimum_Size = min(Size),
               Maximum_Size = max(Size)
)

But I ended up with this:

Type      Date         Size   Color    Colors   Dates        Mean Size   Min Size   Max Size
L shape   2008-04-14   161    blue     2        2008-04-14   161         161        161
L shape   2010-10-16   654    yellow   3        2010-10-16   654         654        654
L shape   2005-07-03   149    blue     2        2005-07-03   149         149        149
L shape   2006-08-16   657    yellow   3        2006-08-16   657         657        657
L shape   2007-04-08   229    yellow   2        2007-04-08   229         229        229
L shape   2004-03-17   784    green    1        2004-03-17   784         784        784
Y shape   2014-02-22   917    pink     1        2014-02-22   917         917        917
Y shape   2012-05-04   186    green    2        2012-05-04   186         186        186
Y shape   2006-11-25   641    yellow   1        2006-11-25   641         641        641
Y shape   2015-09-07   493    blue     1        2015-09-07   493         493        493
Y shape   2011-07-06   953    green    2        2011-07-06   953         953        953

I obviously need to loop over this somehow, but I am very new to R and do not know how to do it.

1 answer:

Answer 0: (score: 0)

Something like this...

df <- read.table(text=
"Type       Date         Size       Color
Lshape    2008-04-14   161    blue    
Lshape    2010-10-16   654    yellow
Lshape    2005-07-03   149    blue
Lshape    2006-08-16   657    yellow
Lshape    2007-04-08   229    yellow
Lshape    2004-03-17   784    green
Yshape    2014-02-22   917    pink
Yshape    2012-05-04   186    green
Yshape    2006-11-25   641    yellow
Yshape    2015-09-07   493    blue
Yshape    2011-07-06   953    green", header=TRUE)

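# Split df by Type and build one summary row for each group
by(df, df$Type, function(x){
  data.frame(Colors = length(unique(x$Color)),     # number of distinct colors
             Dates = paste(x$Date, collapse=";"),  # all dates joined into one string
             Mean.size = mean(x$Size),
             Min.size = min(x$Size),
             Max.size = max(x$Size))
})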
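
The by() call above returns a list-like object with one small data frame per Type. If a single data frame is wanted, one possible sketch in base R is to split the data by Type, summarise each piece, and stack the rows. The helper name summarise_type is made up for illustration and is not part of the original answer. The sketch also passes na.rm = TRUE to each function, on the assumption that the real data may contain missing sizes; the standalone line na.rm = TRUE in the question only creates a variable and does not affect mean(), min(), or max(). As in the answer, the Type values are written without a space so that read.table can parse the columns.

```r
# Same sample data as in the answer (Type values written without spaces).
df <- read.table(text = "Type Date Size Color
Lshape 2008-04-14 161 blue
Lshape 2010-10-16 654 yellow
Lshape 2005-07-03 149 blue
Lshape 2006-08-16 657 yellow
Lshape 2007-04-08 229 yellow
Lshape 2004-03-17 784 green
Yshape 2014-02-22 917 pink
Yshape 2012-05-04 186 green
Yshape 2006-11-25 641 yellow
Yshape 2015-09-07 493 blue
Yshape 2011-07-06 953 green", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: builds one summary row for the rows of a single Type.
summarise_type <- function(x) {
  data.frame(Type      = x$Type[1],
             Colors    = length(unique(x$Color)),        # distinct colors
             Dates     = paste(x$Date, collapse = ";"),  # dates as one string
             Mean.size = mean(x$Size, na.rm = TRUE),
             Min.size  = min(x$Size, na.rm = TRUE),
             Max.size  = max(x$Size, na.rm = TRUE),
             stringsAsFactors = FALSE)
}

# split() gives one data frame per Type; rbind stacks the summary rows.
result <- do.call(rbind, lapply(split(df, df$Type), summarise_type))
rownames(result) <- NULL
print(result)
```

With the sample data this gives two rows, one per Type, whose color counts and size statistics match the values in the desired output from the question. Keeping the dates in one delimited string is only one choice; a list column would be an alternative if the individual dates are needed later.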