strata() from the "sampling" package returns an error: arguments imply differing number of rows

Time: 2013-02-06 17:41:48

Tags: r

I have a data frame that looks like this:

'data.frame':   1090 obs. of  8 variables:
 $ id            : chr  "INC000000209241" "INC000000218488" "INC000000218982" "INC000000225646" ...
 $ service.type  : chr  "Incident" "Incident" "Incident" "Incident" ...
 $ priority      : chr  "Critical" "Critical" "Critical" "Critical" ...

I sort the data as follows:

data <- data[order(data$priority),]

I have tried converting priority to a factor and similar changes, but whatever I try, when I run the following:

s = strata(data,c("priority"),size=c(0,0,1,5))

I always get the following error:

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 0, 1

I tried debugging the function to see why this error occurs, but I could not make sense of the code. The error is raised at this step of strata():

debug: r = cbind(r, i)

Thank you very much for your help!

1 Answer:

Answer 0: (score: 5)

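The problem is that you are trying to set the sample size of some groups to zero. Instead, subset the original data before sampling so that only the groups you want to sample from remain.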
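Applied to the data in the question, the idea might look like the sketch below. It uses a toy data frame, and the priority levels other than "Critical" are assumptions, since the question shows only that value. It also assumes that size is paired with the strata in the order they first appear in the data, and it skips the strata() call when the sampling package is not installed.

```r
# Toy stand-in for the asker's data; the priority levels other than
# "Critical" are assumptions made for illustration.
data <- data.frame(
  id = sprintf("INC%03d", 1:40),
  priority = rep(c("Critical", "High", "Low", "Medium"), each = 10),
  stringsAsFactors = FALSE
)
data <- data[order(data$priority), ]

# Requested sizes, one per stratum, assumed to follow the order in
# which the strata first appear in the data.
wanted <- c(0, 0, 1, 5)
names(wanted) <- unique(data$priority)

# Keep only the strata with a non-zero size, then subset the data to match.
keep <- wanted[wanted > 0]
data_sub <- data[data$priority %in% names(keep), ]

# Only call strata() if the sampling package is available.
if (requireNamespace("sampling", quietly = TRUE)) {
  s <- sampling::strata(data_sub, stratanames = "priority",
                        size = unname(keep), method = "srswor")
  print(s)
}
```

Naming the size vector by stratum keeps the sizes and the remaining groups in step after the zero-size groups are dropped.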

Here we reproduce your problem.

library(sampling)
data(swissmunicipalities)
length(table(swissmunicipalities$REG)) # We have seven strata
# [1] 7

# Let's take two from each group
strata(swissmunicipalities, 
       stratanames = c("REG"), 
       size = rep(2, 7), 
       method="srswor")
#      REG ID_unit        Prob Stratum
# 93     4      93 0.011695906       1
# 145    4     145 0.011695906       1
# 2574   1    2574 0.003395586       2
# 2631   1    2631 0.003395586       2
# 826    3     826 0.006230530       3
# 1614   3    1614 0.006230530       3
# 583    2     583 0.002190581       4
# 1017   2    1017 0.002190581       4
# 1297   5    1297 0.004246285       5
# 2535   5    2535 0.004246285       5
# 342    6     342 0.010752688       6
# 347    6     347 0.010752688       6
# 651    7     651 0.008163265       7
# 2471   7    2471 0.008163265       7

# Let's try to drop the first two groups. Oops...
strata(swissmunicipalities, 
       stratanames = c("REG"), 
       size = c(0, 0, 2, 2, 2, 2, 2), 
       method="srswor")
# Error in data.frame(..., check.names = FALSE) : 
#   arguments imply differing number of rows: 0, 1

Let's subset the data and try again.

swiss2 <- swissmunicipalities[!swissmunicipalities$REG %in% c(1, 2), ]
table(swiss2$REG)
strata(swiss2, 
       stratanames = c("REG"), 
       size = c(2, 2, 2, 2, 2), 
       method="srswor")
#      REG ID_unit        Prob Stratum
# 58     4      58 0.011695906       1
# 115    4     115 0.011695906       1
# 432    3     432 0.006230530       2
# 986    3     986 0.006230530       2
# 1007   5    1007 0.004246285       3
# 1150   5    1150 0.004246285       3
# 190    6     190 0.010752688       4
# 497    6     497 0.010752688       4
# 1049   7    1049 0.008163265       5
# 1327   7    1327 0.008163265       5
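One detail is worth checking before choosing sizes. In the outputs above, Stratum 1 is REG 4 rather than REG 1, which suggests that size is paired with the strata in the order they first appear in the data, not in the sorted order that table() prints. The sketch below shows the difference on a hypothetical toy vector (reg and sizes are illustrative names, not part of the package):

```r
# Hypothetical toy column: strata first appear in the order 4, 1, 3.
reg <- c(4, 4, 1, 3, 1, 3, 4)

# table() reports the strata in sorted order ...
names(table(reg))   # "1" "3" "4"

# ... whereas the order of first appearance is what size is assumed to follow.
unique(reg)         # 4 1 3

# Naming the sizes by stratum makes the intended pairing explicit.
sizes <- setNames(c(2, 1, 1), unique(reg))
sizes
```

Sorting the data by the stratification column first, as the question does, makes the two orders coincide.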