So, I have two datasets representing old and current addresses.
What I need is to merge the new data (move) with the old data (main), and update main after the merge. I was wondering whether this can be done in a single operation.
The update is based on id, which is the personal identifier. idspace, x and y are location IDs.
So the output I need is
> main
  idspace   id  x  y move
      198 1238 33  4 stay
        4 1236  4  1 move  # this one is updated
     1515 1237 30 28 move
I have no idea how to do this. Something like
merge(main, move, by = c('id'), all = T, suffixes = c('old', 'new'))
However, this is wrong, because I would then have to do a lot of manipulation by hand.
Any solutions?
Data
> dput(main)
structure(list(idspace = structure(c(2L, 3L, 1L), .Label = c("1515",
"198", "641"), class = "factor"), id = structure(c(3L, 1L, 2L
), .Label = c("1236", "1237", "1238"), class = "factor"), x = structure(c(2L,
3L, 1L), .Label = c("30", "33", "36"), class = "factor"), y = structure(c(3L,
1L, 2L), .Label = c("12", "28", "4"), class = "factor"), move = structure(c(2L,
1L, 1L), .Label = c("move", "stay"), class = "factor")), .Names = c("idspace",
"id", "x", "y", "move"), row.names = c(NA, -3L), class = "data.frame")
> dput(move)
structure(list(idspace = structure(1L, .Label = "4", class = "factor"),
id = structure(1L, .Label = "1236", class = "factor"), x = structure(1L, .Label = "4", class = "factor"),
y = structure(1L, .Label = "1", class = "factor"), move = structure(1L, .Label = "move", class = "factor")), .Names = c("idspace",
"id", "x", "y", "move"), row.names = c(NA, -1L), class = "data.frame")
Answer 0 (score: 9)
Using data.table's join + update feature:
library(data.table) # v1.9.6+ (needed for the on= argument)
setDT(main) # convert the data.frames to data.tables by reference
setDT(move)
main[move, on = c("id", "move"),                        # join: rows of 'main' whose id and move match a row of 'move'
     c("idspace", "x", "y") := .(i.idspace, i.x, i.y)]  # overwrite those columns of 'main' with the
                                                         # values from 'i' (= 'move') for the
                                                         # matching rows
main
# idspace id x y move
# 1: 198 1238 33 4 stay
# 2: 4 1236 4 1 move
# 3: 1515 1237 30 28 move
This updates main in place.
Answer 1 (score: 1)
Here is a dplyr solution:
# If you want both old and new
dplyr::full_join(main, move)
# If you want both old and new with a suffix column
main$suffix <- "old"
move$suffix <- "new"
dplyr::full_join(main, move)
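# If you want new only (overwrite the matching rows of main)
new <- dplyr::left_join(main, move, by = "id")  # could also use %>%
upd <- !is.na(new$move.y)                       # rows of main that have a match in move
# Refer to columns by name so the code does not break if extra columns (e.g. suffix) exist.
# Assumes character columns: assigning an unseen level to a factor column yields NA.
main[upd, c("idspace", "x", "y")] <- new[upd, c("idspace.y", "x.y", "y.y")]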
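If your dplyr is version 1.0.0 or newer, rows_update() expresses "overwrite the rows of x whose key appears in y" as one call. The sketch below is an assumption-laden illustration, not part of the answer above: it rebuilds the example data from the dput() output as character columns, because with the original factor columns dplyr may refuse to combine factors whose levels differ.

```r
library(dplyr)  # rows_update() requires dplyr >= 1.0.0

# Example data rebuilt from the dput() output, as character columns
# (assumption: with the original factors, differing levels may make rows_update() fail)
main <- data.frame(
  idspace = c("198", "641", "1515"),
  id      = c("1238", "1236", "1237"),
  x       = c("33", "36", "30"),
  y       = c("4", "12", "28"),
  move    = c("stay", "move", "move"),
  stringsAsFactors = FALSE
)
move <- data.frame(idspace = "4", id = "1236", x = "4", y = "1", move = "move",
                   stringsAsFactors = FALSE)

# Replace the non-key columns of the rows of 'main' whose id appears in 'move';
# row order of 'main' is preserved, and every id in 'move' must exist in 'main'
updated <- rows_update(main, move, by = "id")
updated
```

Unlike the left_join approach, this needs no column positions or suffixed names, and it returns a new data frame instead of modifying main.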
Answer 2 (score: 1)
I think I found a simple way to solve this:
main = as.matrix(main)
move = as.matrix(move)
main[main[,'id'] %in% move[,'id'], ] <- move
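This matches rows on id, keeps the row order of main, and changes only the matching rows. Note that as.matrix() turns both data frames into character matrices, and the assignment relies on move's rows appearing in the same order as their matches in main (and on every id in move existing in main). It seems to work on the whole dataset.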
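The matrix assignment above places move's rows into the matched positions in main's order, so it depends on both tables listing those ids in the same order. A match()-based lookup removes that dependency and skips ids that are missing from main. This is a minimal base R sketch. It assumes character columns rebuilt from the dput() output, and update_by_id is a hypothetical helper, not a base R function.

```r
# Example data rebuilt from the dput() output, as character columns (assumption)
main <- data.frame(
  idspace = c("198", "641", "1515"),
  id      = c("1238", "1236", "1237"),
  x       = c("33", "36", "30"),
  y       = c("4", "12", "28"),
  move    = c("stay", "move", "move"),
  stringsAsFactors = FALSE
)
move <- data.frame(idspace = "4", id = "1236", x = "4", y = "1", move = "move",
                   stringsAsFactors = FALSE)

# update_by_id: hypothetical helper. Overwrites the rows of 'old' whose key
# appears in 'new', locating each row by key value instead of by position.
update_by_id <- function(old, new, key = "id") {
  pos  <- match(new[[key]], old[[key]])      # row of 'old' for each row of 'new' (NA if absent)
  hit  <- !is.na(pos)                        # ignore keys in 'new' that are not in 'old'
  cols <- intersect(names(new), names(old))  # only update columns both tables share
  old[pos[hit], cols] <- new[hit, cols]
  old
}

main <- update_by_id(main, move)
main
```

If move contains the same id more than once, the last occurrence wins, because later assignments overwrite earlier ones.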