I am trying to solve the following problem. I have two data frames. The first one looks like this:
DF1
faID uID
1 20909
1 6661
1 1591
1 28065
1 42783
1 3113
1 21647
1 3825
2 134766
2 271168
2 16710
2 4071608
2 2046526
2 5081272
and the second one looks like this:
DF2
uID user_cent_w
1591 15844
42783 466
21647 1514
29695 13958
94120 3615
83098 128
138776 709
90352 991
115384 8039
74483 128
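I want to add a new column user_cent to DF1 whose values come from the rows of DF2 with a matching uID. Alternatively, I want to replace the uID values in DF1 with the user_cent_w values from DF2: wherever a uID in DF1 matches a uID in DF2, it should be replaced by the corresponding user_cent_w value.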
I have tried the solution from "replace value in dataframe based on another data frame", but that replaces the values of user_cent_w in uID.
My expected output would look like this:
faID
Answer 0 (score: 1)
Try:
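library(dplyr)
# keep every row of DF1 and attach user_cent_w where uID matches
res <- left_join(DF1, DF2, by = "uID")
# overwrite uID with user_cent_w for the matched rows only
matched <- !is.na(res$user_cent_w)
res$uID[matched] <- res$user_cent_w[matched]
res[, 1:2]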
# faID uID
#1 1 20909
#2 1 6661
#3 1 15844
#4 1 28065
#5 1 466
#6 1 3113
#7 1 1514
#8 1 3825
#9 2 134766
#10 2 271168
#11 2 16710
#12 2 4071608
#13 2 2046526
#14 2 5081272
Or:
left_join(DF1, DF2, by = "uID") %>%
  mutate(uID = ifelse(is.na(user_cent_w), uID, user_cent_w)) %>%
  select(-user_cent_w)
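For comparison, the same replacement can be sketched in base R with match(), without loading any package. This is not part of the original answer; the names idx, hit and result are only illustrative, and the data below is a shortened copy of the question's DF1 and DF2 so that the block runs on its own.

```r
# Shortened copy of the question's data (first group of DF1, first rows of DF2)
DF1 <- data.frame(
  faID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
  uID  = c(20909L, 6661L, 1591L, 28065L, 42783L, 3113L, 21647L, 3825L)
)
DF2 <- data.frame(
  uID         = c(1591L, 42783L, 21647L, 29695L),
  user_cent_w = c(15844L, 466L, 1514L, 13958L)
)

# For each DF1$uID, match() gives its row position in DF2$uID (NA if absent)
idx <- match(DF1$uID, DF2$uID)

# Work on a copy so the original DF1 stays untouched
result <- DF1
hit <- !is.na(idx)

# Replace only the matched uIDs with the corresponding user_cent_w
result$uID[hit] <- DF2$user_cent_w[idx[hit]]
result
```

Because match() returns only the first matching position, this sketch assumes that uID is unique in DF2, as it is in the sample data.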
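Answer 1 (score: 0)
Although this old question already has an accepted answer, for the sake of completeness I would like to add two data.table solutions.
The first one creates a new object: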
library(data.table)
# coerce to data.table and right join on uID
result <- setDT(DF2)[setDT(DF1), on = "uID"][
# replace uID by user_cent_w where available, remove column
!is.na(user_cent_w), uID := user_cent_w][, -"user_cent_w"]
result
        uID faID
 1:   20909    1
 2:    6661    1
 3:   15844    1
 4:   28065    1
 5:     466    1
 6:    3113    1
 7:    1514    1
 8:    3825    1
 9:  134766    2
10:  271168    2
11:   16710    2
12: 4071608    2
13: 2046526    2
14: 5081272    2
The second one updates DF1 in place during the join, which avoids copying the object and so saves memory and time:
setDT(DF1)[setDT(DF2), on = "uID", uID := ifelse(is.na(user_cent_w), uID, user_cent_w)]
DF1
    faID     uID
 1:    1   20909
 2:    1    6661
 3:    1   15844
 4:    1   28065
 5:    1     466
 6:    1    3113
 7:    1    1514
 8:    1    3825
 9:    2  134766
10:    2  271168
11:    2   16710
12:    2 4071608
13:    2 2046526
14:    2 5081272
Data
DF1 <- structure(list(faID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L), uID = c(20909L, 6661L, 1591L, 28065L, 42783L,
3113L, 21647L, 3825L, 134766L, 271168L, 16710L, 4071608L, 2046526L,
5081272L)), .Names = c("faID", "uID"), row.names = c(NA, -14L
), class = "data.frame")
DF2 <- structure(list(uID = c(1591L, 42783L, 21647L, 29695L, 94120L,
83098L, 138776L, 90352L, 115384L, 74483L), user_cent_w = c(15844L,
466L, 1514L, 13958L, 3615L, 128L, 709L, 991L, 8039L, 128L)), .Names = c("uID",
"user_cent_w"), row.names = c(NA, -10L), class = "data.frame")