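I have two data files: a master file and a smaller file. I need to update the master file with data from the small file by matching on a unique ID number and on specific columns.
For example: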
Master file (X)
ID a1 a2 b1 b2 c1 c2
1 a d 4 5 6 8
2 d f NA 1 3 12
3 e r 1 1 89 0
4 f we 10 NA 3 9
5 dd w NA 21 56 7
Small file (iy1)
ID b1 b2
1 4 5
2 27 1
3 1 1
4 10 9
5 56 21
I tried:
X$b1[na.omit(match(iy1$ID, X$ID))] <- iy1$b1[which(iy1$ID %in% X$ID)]
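But this is tedious if I have to update 1000 records across 1000 columns, because it has to be repeated for every column.
Thanks
Answer 0: (score: 0)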
# Master file (X)
X = read.table(text =
"ID a1 a2 b1 b2 c1 c2
1 a d 4 5 6 8
2 d f NA 1 3 12
3 e r 1 1 89 0
4 f we 10 NA 3 9
42 f we 10 NA 3 9
5 dd w NA 21 56 7",
header = TRUE)
# Small file (iy1)
iy1 = read.table(text =
"ID b1 b2
5 4 5
4 27 1
3 1 1
2 10 9
21 10 9
1 56 21",
header = TRUE)
### Solution
# we assume that IDs are unique in both datasets
to_update = c("b1", "b2") # columns that we want to update
updatable_id = intersect(X$ID, iy1$ID) # IDs present in both files, i.e. the rows we can update
# overwrite the matched rows of X with the corresponding rows of iy1
X[match(updatable_id, X$ID), to_update] = iy1[match(updatable_id, iy1$ID), to_update]
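With many columns, typing out `to_update` by hand becomes the tedious part. A minimal sketch of the same approach wrapped in a function that infers the shared columns from the column names is given below. The name `update_from` and its `key` argument are hypothetical, not part of the answer above, and the sketch still assumes that IDs are unique in both data frames. The example data is a shortened version of the tables in this answer.

```r
# Hypothetical helper (not from the original answer): overwrite every column
# that both data frames share, apart from the key column.
# Assumes the key values are unique in both data frames.
update_from <- function(master, small, key = "ID") {
  # columns present in both data frames, excluding the key
  cols <- setdiff(intersect(names(master), names(small)), key)
  if (length(cols) == 0) return(master)
  # key values present in both data frames
  ids <- intersect(master[[key]], small[[key]])
  master[match(ids, master[[key]]), cols] <- small[match(ids, small[[key]]), cols]
  master
}

X <- data.frame(ID = c(1, 2, 3, 4, 42, 5),
                b1 = c(4, NA, 1, 10, 10, NA),
                b2 = c(5, 1, 1, NA, NA, 21),
                c1 = c(6, 3, 89, 3, 3, 56))
iy1 <- data.frame(ID = c(5, 4, 3, 2, 21, 1),
                  b1 = c(4, 27, 1, 10, 10, 56),
                  b2 = c(5, 1, 1, 9, 9, 21))

X2 <- update_from(X, iy1)
```

Rows whose ID appears only in the master (42 here) are left unchanged, and rows whose ID appears only in the small file (21 here) are ignored.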
Answer 1: (score: 0)
Here is an approach using data.table:
library(data.table)
nm1 <- setdiff(intersect(names(iy1), names(X)), "ID")
setDT(X)[iy1, (nm1) := mget(paste0("i.", nm1)), on = "ID"]
X
# ID a1 a2 b1 b2 c1 c2
#1: 1 a d 4 5 6 8
#2: 2 d f 27 1 3 12
#3: 3 e r 1 1 89 0
#4: 4 f we 10 9 3 9
#5: 5 dd w 56 21 56 7
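The join above updates `X` by reference: for each row of `iy1`, the row of `X` with the same `ID` is located, and the `i.` prefix refers to the column coming from `iy1`. A self-contained sketch of how this behaves when the ID sets do not fully overlap is given below. The data is a shortened version of the tables from Answer 0 (IDs 42 and 21 have no counterpart), not the data that produced the output above.

```r
library(data.table)

# Master table with an extra ID (42) that is absent from the small table
X <- data.table(ID = c(1, 2, 3, 4, 42, 5),
                b1 = c(4, NA, 1, 10, 10, NA),
                b2 = c(5, 1, 1, NA, NA, 21),
                c1 = c(6, 3, 89, 3, 3, 56))
# Small table with an extra ID (21) that is absent from the master table
iy1 <- data.table(ID = c(5, 4, 3, 2, 21, 1),
                  b1 = c(4, 27, 1, 10, 10, 56),
                  b2 = c(5, 1, 1, 9, 9, 21))

# Shared columns other than the key
nm1 <- setdiff(intersect(names(iy1), names(X)), "ID")

# Update join: matched rows of X take the values of the i.-prefixed columns
X[iy1, (nm1) := mget(paste0("i.", nm1)), on = "ID"]
```

After the join, the row with ID 42 keeps its original values and no row is added for ID 21, so `X` still has six rows.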
Answer 2: (score: 0)