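I have two data.tables (call them dt1 and dt2). dt1 contains an id variable that can be duplicated across records. dt2 contains every possible unique value of the id from dt1, along with a unique id of its own, called id2, assigned to each. dt1 contains only a subset of all possible id values, whereas dt2 contains the entire set.
I would like to update dt1 with the matching id2 values from dt2. This led me to the following code, which works some of the time; at other times it gives me a warning that values are recycled when id_new is assigned.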
dt1[ dt2, id_new := id2, nomatch = 0 ]
Below is a reproducible set of code showing when it works and when it does not.
First, the full dt_big, which has more rows than dt_small and contains duplicate ids:
library(data.table)
set.seed(1)
# dt_big can contain duplicate id values
dt_big <- data.table(id = letters[c(1,1,2,2,3,4,5,5)],
                     value = sample(8),
                     key = "id")
# dt_small contains unique big_id values as well as its own unique id
dt_small <- data.table(id = 1:5,
                       big_id = letters[1:5],
                       key = "big_id")
# This works fine
dt_big[dt_small, id_new := i.id, nomatch = 0]
dt_big
Now we subset dt_big so that it has fewer rows than dt_small while still containing duplicate ids:
dt_big <- data.table(id = letters[c(1,1,2,2,3,4,5,5)],
                     value = sample(8),
                     key = "id")
dt_big_sub_dups <- dt_big[c(1,1,5)]
# Again this works fine
dt_big_sub_dups[dt_small, id_new := i.id, nomatch = 0]
dt_big_sub_dups
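Now we subset dt_big so that it has fewer rows than dt_small and contains only unique ids. This case gives a warning and an incorrect result: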
dt_big <- data.table(id = letters[c(1,1,2,2,3,4,5,5)],
                     value = sample(8),
                     key = "id")
dt_big_sub_no_dups <- dt_big[c(1,3,6)]
# Gives warning ... Supplied 3 items to be assigned to 5 items of column 'id_new' ...
dt_big_sub_no_dups[dt_small, id_new := i.id, nomatch = 0]
dt_big_sub_no_dups
When id is "d", id_new should be 4.
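To make the expected result concrete without relying on join behaviour, here is a minimal sketch that uses a plain match() lookup instead of the join. The tables are rebuilt by hand with the same shape as the failing case, and the value column holds placeholder numbers (an assumption, since the sampled values depend on the seed).

```r
library(data.table)

# Hand-built stand-ins for the failing case: three unique ids on one side,
# the full lookup table on the other. The `value` numbers are placeholders.
dt_big_sub_no_dups <- data.table(id = c("a", "b", "d"), value = c(8L, 3L, 7L))
dt_small <- data.table(id = 1:5, big_id = letters[1:5])

# match() gives, for each id, its position in dt_small$big_id (NA if absent).
# Indexing dt_small$id by those positions returns exactly one value per row
# of dt_big_sub_no_dups, so nothing is recycled.
dt_big_sub_no_dups[, id_new := dt_small$id[match(id, dt_small$big_id)]]
dt_big_sub_no_dups
```

With this lookup the row with id "d" gets id_new 4, which is the result the join is expected to produce.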
Answer 0 (score: 2)
With data.table version 1.9.5 (and set.seed(42)):
dt_big_sub_no_dups
#    id value id_new
# 1:  a     8      1
# 2:  b     3      2
# 3:  d     7      4
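For readers on a later release, the same update can be sketched with the on= argument, which names the join columns explicitly instead of relying on keys. This assumes data.table 1.9.6 or later, where on= is available; the tables below are rebuilt by hand with placeholder value numbers rather than taken from the sampled data.

```r
library(data.table)

# Hand-built stand-ins (no keys needed when `on=` is used).
dt_big_sub_no_dups <- data.table(id = c("a", "b", "d"), value = c(8L, 3L, 7L))
dt_small <- data.table(id = 1:5, big_id = letters[1:5])

# Update join: match dt_big_sub_no_dups$id against dt_small$big_id and,
# for every matching row, copy dt_small's id (referenced as i.id) into id_new.
# Rows of dt_small with no match (big_id "c" and "e") update nothing.
dt_big_sub_no_dups[dt_small, on = c(id = "big_id"), id_new := i.id]
dt_big_sub_no_dups
```

Any row of dt_big_sub_no_dups whose id is missing from dt_small would be left as NA in id_new, so the nomatch argument is not needed for this kind of update.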