Problems with grouping factors, data frames, and tapply

Time: 2013-04-22 10:56:52

Tags: r statistics dataframe tapply

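I'm new to R and to statistics in general, and I'm having trouble getting tapply() to work. I have a data frame with 15 columns and several thousand rows. I built a set of logical vectors with expressions like y1 <- ((x > 0) & (x <= 5)), where x is a column name in the data frame. These logical vectors were then combined and converted into a grouping factor with factor(). All of that seems to work fine.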
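The construction described above might look roughly like the following. This is only a sketch: the original code is not shown, so the data frame `df`, the column `x`, the bin edges, and the way the logical vectors are combined are all assumptions made for illustration.

```r
# Hypothetical data: a data frame with one numeric column x
set.seed(42)
df <- data.frame(x = runif(100, min = 0, max = 15))

# Logical vectors marking non-overlapping ranges of x
y1 <- (df$x > 0) & (df$x <= 5)
y2 <- (df$x > 5) & (df$x <= 10)
y3 <- (df$x > 10) & (df$x <= 15)

# Combine them into one grouping factor: each row gets the index
# of the range it falls in (assumed combination rule)
group <- factor(1 * y1 + 2 * y2 + 3 * y3, levels = 1:3)

# cut() builds an equivalent factor directly from the breakpoints
group_cut <- cut(df$x, breaks = c(0, 5, 10, 15), labels = 1:3)

# The factor has one entry per row of the data frame
length(group) == nrow(df)
```

Built either way, the factor has one entry per row, so its length is compared against row counts rather than column counts.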

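The problem is that when I call tapply(dataframe, group, sample, size=20), where group is the grouping factor, I get the error 'arguments must have same length'. When I try length(dataframe) I get the number of columns in the data frame (only 15), while length(group) returns the number of rows (several thousand). Is there a mistake in the way I'm creating the logical vectors and the grouping factor?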
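The mismatch can be reproduced on a small made-up example. A data frame is a list of columns, so length() counts columns while the grouping factor has one entry per row. The names `df`, `a`, `b`, `c`, and `group` below are hypothetical.

```r
# Hypothetical data frame: 3 columns, 12 rows
df <- data.frame(a = 1:12, b = rnorm(12), c = runif(12))
group <- factor(rep(1:3, each = 4))

length(df)     # number of columns
nrow(df)       # number of rows
length(group)  # one entry per row, so it matches nrow(df), not length(df)

# Passing a single column gives tapply() a vector whose length
# matches the grouping factor
means <- tapply(df$a, group, mean)
```

This suggests the factor itself is probably fine and that the first argument to tapply() is what needs to change.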

Here is the output from dput(), as Maxim.K suggested (sorry, it's not very tidy):

 structure(list(Lat = c(-90L, -90L, -90L, -90L, -90L, -90L, -90L, 
-90L, -90L, -90L, -90L, -90L, -90L, -90L, -90L), Lon = -180:-166, 
    Jan = c(2.79, 2.79, 2.79, 2.79, 2.79, 2.79, 2.79, 2.79, 2.79, 
    2.79, 2.79, 2.79, 2.79, 2.79, 2.79), Feb = c(2.35, 2.35, 
    2.35, 2.35, 2.35, 2.35, 2.35, 2.35, 2.35, 2.35, 2.35, 2.35, 
    2.35, 2.35, 2.35), Mar = c(0.49, 0.49, 0.49, 0.49, 0.49, 
    0.49, 0.49, 0.49, 0.49, 0.49, 0.49, 0.49, 0.49, 0.49, 0.49
    ), Apr = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    May = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Jun = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Jul = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Aug = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Sep = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Oct = c(1.75, 1.75, 1.75, 
    1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 
    1.75, 1.75), Nov = c(2.77, 2.77, 2.77, 2.77, 2.77, 2.77, 
    2.77, 2.77, 2.77, 2.77, 2.77, 2.77, 2.77, 2.77, 2.77), Dec = c(2.65, 
    2.65, 2.65, 2.65, 2.65, 2.65, 2.65, 2.65, 2.65, 2.65, 2.65, 
    2.65, 2.65, 2.65, 2.65), Ann = c(1.07, 1.07, 1.07, 1.07, 
    1.07, 1.07, 1.07, 1.07, 1.07, 1.07, 1.07, 1.07, 1.07, 1.07, 
    1.07)), .Names = c("Lat", "Lon", "Jan", "Feb", "Mar", "Apr", 
"May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Ann"
), row.names = c(NA, 15L), class = "data.frame")

For the group:

The first 15 values (from dput()):

  structure(c(8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
    8L, 8L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8"), class = "factor")

...and from the tail:

structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8"), class = "factor")

I'm trying to use tapply() to draw a random sample (of size 20) from each of the 8 categories.

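[Edit] Not surprisingly, the problem lay not with the assignment and its requirements but with my understanding of it. I had misread the task: I was only supposed to sample from a single column, not from the whole data frame.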
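Sampling from a single column per group could be sketched as follows. The data here are made up (400 rows in 8 equal groups), and using the `Ann` column is an assumption, since the edit does not say which column was meant.

```r
set.seed(1)
# Hypothetical data: 400 rows, 8 groups of 50 rows each
df <- data.frame(Ann = round(runif(400, min = 0, max = 5), 2))
group <- factor(rep(1:8, each = 50))

# Draw 20 values of Ann from each group, without replacement
samples <- tapply(df$Ann, group, sample, size = 20)

# One element per group, each holding 20 sampled values
sapply(samples, length)
```

If any group has fewer than 20 rows, sample() needs replace = TRUE to draw 20 values from it.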

1 answer:

Answer 0: (score: 4)

You can use tapply here: just add the group vector to the data.frame and then call tapply, as follows:

# Generating a 'group' vector with variability in its values 
# and merging it to the existing data.frame (FOO)
set.seed(1)
FOO$group <- as.factor(sample( 1:8, nrow(FOO), replace=TRUE)) 

# Using tapply
tapply(FOO[,-16], FOO[,16], sample, size=20, replace=TRUE)

This may be the answer to your homework.