Merging two datasets of different lengths throws an error in R

Time: 2014-12-01 12:22:04

Tags: r data.table

I am trying to merge two datasets/data.tables of different lengths, but I keep getting the following error:

Error in `[.data.table`(y, xkey, nomatch = ifelse(all.x, NA, 0), allow.cartesian = allow.cartesian) :
  retFirst must be integer vector the same length as nrow(i)

I can't work out what the error message means. Can anyone help? I am using the following code for the merge:

merge(x=Red,y=Error,by=c("loopN","TYPE"),all.x=TRUE)

Data table contents:



DATA TABLE RED

TIME	TYPE	loopN	diff
11/26/2014 0:45	28808	141126	0
11/26/2014 1:00	28808	141126	0
11/26/2014 1:15	28808	141126	0
11/26/2014 1:30	28808	141126	0
11/26/2014 1:15	189379	141126	0
11/26/2014 1:30	189379	141126	0
11/26/2014 2:15	189379	141126	0
11/26/2014 1:00	239188	141126	0
11/26/2014 1:15	239188	141126	0
11/26/2014 1:30	239188	141126	0
11/26/2014 13:30	239188	141126	0


DATA TABLE ERROR

loopN	TYPE	V1
141126	28808	-2.932
141126	28808	-2.932
141126	28808	-2.932
141126	28808	-2.932
141126	189379	1.061
141126	189379	-1.182
141126	189379	4.771
141126	239188	-0.163
141126	239188	-1.573
141126	239188	-1.981
141126	239188	-1.981




1 Answer:

Answer 0 (score: 0)

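What you are proposing seems odd. Your data.table RED has 4 records with TYPE=28808 and loopN=141126. Your data.table ERROR also has 4 records with that same combination of TYPE and loopN. The merge (join) will therefore produce 16 records for that combination alone. If your whole 1MM-record data.table looks like this, the result is going to be huge. If that is really what you want, then this should work: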

RED   <- structure(list(TIME = c("11/26/2014 0:45", "11/26/2014 1:00", "11/26/2014 1:15", "11/26/2014 1:30", "11/26/2014 1:15", "11/26/2014 1:30", "11/26/2014 2:15", "11/26/2014 1:00", "11/26/2014 1:15", "11/26/2014 1:30", "11/26/2014 13:30"), TYPE = c(28808L, 28808L, 28808L, 28808L, 189379L, 189379L, 189379L, 239188L, 239188L, 239188L, 239188L), loopN = c(141126L, 141126L, 141126L, 141126L, 141126L, 141126L, 141126L, 141126L, 141126L, 141126L, 141126L), diff = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("TIME", "TYPE", "loopN", "diff"), class = "data.frame", row.names = c(NA, -11L))
ERROR <- structure(list(loopN = c(141126L, 141126L, 141126L, 141126L, 141126L, 141126L, 141126L, 141126L, 141126L, 141126L, 141126L), TYPE = c(28808L, 28808L, 28808L, 28808L, 189379L, 189379L, 189379L, 239188L, 239188L, 239188L, 239188L), V1 = c(-2.932, -2.932, -2.932, -2.932, 1.061, -1.182, 4.771, -0.163, -1.573, -1.981, -1.981)), .Names = c("loopN", "TYPE", "V1"), class = "data.frame", row.names = c(NA, 11L))

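# you start here...
library(data.table)
# setDT() converts each data.frame to a data.table by reference;
# setkey() then sorts it by the join columns
setkey(setDT(ERROR),loopN,TYPE)
setkey(setDT(RED),loopN,TYPE)
# for every row of RED, pull all matching rows of ERROR;
# allow.cartesian=TRUE permits the many-to-many result
result <- ERROR[RED, allow.cartesian=TRUE]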
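
To make the row-count argument concrete, here is a minimal sketch that works out how many rows the join should return before running it. It rebuilds the key columns from the question under the hypothetical names red and error, and the helper names red_counts, error_counts, counts and expected_rows are likewise illustrative only. Each key combination contributes (rows in red) x (rows in error) rows to the result.

```r
library(data.table)

# Stand-ins for the question's tables (names are illustrative, values copied from the post)
red   <- data.table(loopN = 141126L,
                    TYPE  = c(rep(28808L, 4), rep(189379L, 3), rep(239188L, 4)))
error <- data.table(loopN = 141126L,
                    TYPE  = c(rep(28808L, 4), rep(189379L, 3), rep(239188L, 4)),
                    V1    = c(-2.932, -2.932, -2.932, -2.932, 1.061, -1.182,
                              4.771, -0.163, -1.573, -1.981, -1.981))

# Number of rows per key combination in each table
red_counts   <- red[, .(n_red = .N), by = .(loopN, TYPE)]
error_counts <- error[, .(n_error = .N), by = .(loopN, TYPE)]

# Each key contributes n_red * n_error rows to a many-to-many join
counts <- merge(red_counts, error_counts, by = c("loopN", "TYPE"))
counts[, n_joined := n_red * n_error]
expected_rows <- counts[, sum(n_joined)]   # 4*4 + 3*3 + 4*4 = 41

# The actual join; allow.cartesian = TRUE permits a result larger than the inputs
joined <- merge(red, error, by = c("loopN", "TYPE"),
                all.x = TRUE, allow.cartesian = TRUE)
stopifnot(nrow(joined) == expected_rows)
```

Running the same count on the full tables first gives a cheap estimate of how large the joined result will be before committing memory to it.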