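I have a dataset of bird observation records with about 300,000 rows and 7 columns. I want to create a new column based on the unique combinations of three other columns, all of which are factors: "gridref", the 1 km grid square in which the record was made; "observer", the person who made the observation; and "date", the date of the observation. The new column, "visit_ID", should identify each unique visit to a 1 km grid square, that is, each unique combination of gridref, observer, and date.
I tried the following code: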
birds_raw$vid <- as.integer(interaction(birds_raw$gridref, birds_raw$observer, birds_raw$date))
This returns the following error message:
Error: cannot allocate vector of size 636.1 Gb
In addition: Warning message:
In ans * length(l) : NAs produced by integer overflow
I'm sure there must be a simple way to do this. Can anyone help?
Answer 0 (score: 0)
You can do this efficiently with data.table:
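library(data.table)

# Small example table with the same three factor columns as the question
birds_raw <- data.table(
  other_var = factor(c("other 1", "other 2", "other 3", "other 4")),
  gridref   = factor(c("grid 1", "grid 2", "grid 1", "grid 1")),
  observer  = factor(c("person 1", "person 2", "person 2", "person 1")),
  date      = factor(c("date 1", "date 2", "date 1", "date 1"))
)

# .GRP is the group counter, so every distinct gridref/observer/date
# combination gets its own integer; the trailing [] prints the result
birds_raw[, visit_id := .GRP, by = c("gridref", "observer", "date")][]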
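If you would rather not add a dependency, the same idea can be sketched in base R. This is only an illustration on made-up data: the `birds` data frame and the "|" separator are assumptions, and the separator must not occur in any of the real values. It also hints at why the original attempt ran out of memory: by default `interaction()` defines a level for every possible combination of the input levels, not only the combinations that actually occur, so the number of levels is the product of the three `nlevels()` values.

```r
# Hypothetical toy data with the same three factor columns as birds_raw
birds <- data.frame(
  gridref  = factor(c("grid 1", "grid 2", "grid 1", "grid 1")),
  observer = factor(c("person 1", "person 2", "person 2", "person 1")),
  date     = factor(c("date 1", "date 2", "date 1", "date 1"))
)

# One string key per row; assumes "|" never appears in the values themselves
key <- paste(birds$gridref, birds$observer, birds$date, sep = "|")

# Position of each key among the distinct keys, in order of first appearance
birds$visit_id <- match(key, unique(key))

# Combinations interaction() would create levels for vs. combinations present
n_possible <- nlevels(birds$gridref) * nlevels(birds$observer) * nlevels(birds$date)
n_observed <- length(unique(key))
```

On real data `n_possible` can exceed `n_observed` by many orders of magnitude, which is consistent with the allocation error in the question. Grouping only over the combinations that occur, as `.GRP` and `match()` both do, avoids that.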