Error when using by on a data.frame/data.table in R

Time: 2016-05-30 10:53:49

Tags: r

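Apologies in advance that I cannot provide a fully reproducible example, but please bear with me.

I have a data.table/data.frame (according to class()) that looks like this (called variable.nuts1_MALE.counts):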

   country region   N freq.1 result level delete_to.few.observations
1:      DE    DE2 187     15   8.41     1                          1
2:      DE    DE1 142      9   7.30     1                          1
3:      DE    DEA 231     19   8.75     1                          1
4:      DE    DED 136      5   5.32     1                          1
5:      DE    DE9 114     13  11.40     1                          1
6:      UK    UKJ 147     14   6.35     1                          1
7:      UK    UKD 108     12   7.36     1                          1

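I then want to run the following lines of code to add an extra column based on how many regions each country has: 1 for countries with 5 or more regions (i.e. DE) and 0 for countries with fewer than 5 (i.e. UK):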

setDT(variable.nuts1_MALE.counts)[, delete_too.few.regions:= if(.N < 5) "0" else "1", by = unlist(country)]
variable.nuts1_MALE.regions <- subset(variable.nuts1_MALE.counts, delete_too.few.regions == 1)

This has worked on all the other data I have run it on, but this time I get the error message:

Error in `[.data.table`(setDT(variable.nuts1_MALE.counts), , `:=`(delete_too.few.regions,  : 

  'by' appears to evaluate to column names but isn't c() or key(). Use by=list(...) if you can. Otherwise, by=eval(unlist(country)) should work. This is for efficiency so data.table can detect which columns are needed.

Can anyone tell me what is going wrong?

When I try the suggestions from the message (possibly badly), I get these errors:

setDT(variable.nuts1_MALE.counts)[, delete_too.few.regions:= if(.N < 5) "0" else "1", by=eval(unlist(country))]
Error in unlist(country) : object 'country' not found

setDT(variable.nuts1_MALE.counts)[, delete_too.few.regions:= if(.N < 5) "0" else "1", by = list(country)]

Error in `[.data.table`(setDT(variable.nuts1_MALE.counts), , `:=`(delete_too.few.regions,  : 
  column or expression 1 of 'by' or 'keyby' is type list. Do not quote column names. Usage: DT[,sum(colC),by=list(colA,month(colB))]
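
The second message says that the grouping column is of type list. One way to check this is to look at the storage type of each column. The sketch below does that on a made-up two-column table; dt and its values are hypothetical, and the data.table package is assumed to be installed:

```r
library(data.table)

# Hypothetical toy table: country is a list column, region a character vector
dt <- data.table(
  country = list("DE", "DE", "UK"),
  region  = c("DE2", "DE1", "UKJ")
)

# Storage type of every column; a list column reports "list"
col_types <- sapply(dt, typeof)
print(col_types)
```

A column that prints like a character vector can still be stored as a list, so the printed table alone does not reveal the difference.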

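When I type the table in by hand I cannot seem to reproduce the error, but here is the data in case anyone has an alternative suggestion.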

variable.nuts1_MALE.counts <- structure(list(
  country = list("DE", "DE", "DE", "DE", "DE", "UK", "UK"),
  region = c("DE2", "DE1", "DEA", "DED", "DE9", "UKJ", "UKD"),
  N = c(187L, 142L, 231L, 136L, 114L, 147L, 108L),
  freq.1 = c(15L, 9L, 19L, 5L, 13L, 14L, 12L),
  result = c(8.41, 7.3, 8.75, 5.32, 11.4, 6.35, 7.36),
  level = c(1, 1, 1, 1, 1, 1, 1),
  delete_to.few.observations = c(1L, 1L, 1L, 1L, 1L, 1L, 1L)),
  .Names = c("country", "region", "N", "freq.1", "result", "level",
             "delete_to.few.observations"),
  class = c("data.table", "data.frame"),
  row.names = c(NA, -7L))
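
The dput output above stores country with list(...) rather than c(...), which would match the "type list" message. A possible workaround is to flatten that column to a character vector before grouping. The sketch below tries this on a cut-down copy of the data; dt is a hypothetical name, and it assumes every list element holds exactly one string:

```r
library(data.table)

# Cut-down copy of the data with country stored as a list column
dt <- data.table(
  country = list("DE", "DE", "DE", "DE", "DE", "UK", "UK"),
  region  = c("DE2", "DE1", "DEA", "DED", "DE9", "UKJ", "UKD")
)

# Guard the assumption that each list element has length 1
stopifnot(all(lengths(dt$country) == 1L))

# Replace the whole list column with a plain character vector
dt[, country := unlist(country)]

# Grouping by the bare column name now works; .N is the row count per group
dt[, delete_too.few.regions := if (.N < 5) "0" else "1", by = country]
print(dt)
```

If some elements turned out to have length 0 or more than 1, unlist() would change the number of values, which is why the length check comes first.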

0 answers:

No answers