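Replace values in a column of one data frame with values from another data frame

Time: 2014-08-27 06:59:38

Tags: r dataframe

I am trying to find a solution to this problem. I have two data frames. One looks like this: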

DF1

faID    uID
1     20909
1     6661
1     1591
1     28065
1     42783
1     3113
1     21647
1     3825
2     134766
2     271168
2     16710
2     4071608
2     2046526
2     5081272

and the other data frame looks like this:

DF2

uID   user_cent_w
1591    15844
42783   466
21647   1514
29695   13958
94120   3615
83098   128
138776  709
90352   991
115384  8039
74483   128

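I want either to add a new column user_cent to DF1 whose values are looked up by matching uID against DF2, or to replace the uID values in DF1 with the user_cent_w values from DF2. That is, wherever a uID in DF1 matches a uID in DF2, it should be replaced with the corresponding user_cent_w value.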

I have tried the solution from

replace value in dataframe based on another data frame

but that replaces the values of user_cent_w in uID.

My expected output would look like this:

faID

2 answers:

Answer 0: (score: 1)

Try:

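 library(dplyr)
 # keep every row of DF1; user_cent_w is NA where DF2 has no matching uID
 res <- left_join(DF1, DF2, by = "uID")
 # overwrite uID only in the rows where a match was found
 res$uID[!is.na(res$user_cent_w)] <- res$user_cent_w[!is.na(res$user_cent_w)]
 res[,1:2]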
  #  faID     uID
  #1     1   20909
  #2     1    6661
  #3     1   15844
  #4     1   28065
  #5     1     466
  #6     1    3113
  #7     1    1514
  #8     1    3825
  #9     2  134766
  #10    2  271168
  #11    2   16710
  #12    2 4071608
  #13    2 2046526
  #14    2 5081272

Or:

  left_join(DF1, DF2, by = "uID") %>%
  mutate(uID = ifelse(is.na(user_cent_w), uID, user_cent_w)) %>%
  select(-user_cent_w)
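
The question also asks about adding a new column instead of overwriting uID. A minimal base R sketch of both variants, using `match()` rather than a join, might look like the following. The data is a small subset of the DF1/DF2 values from this thread, the names `idx`, `with_new_col` and `replaced` are made up for illustration, and the sketch assumes each uID appears at most once in DF2 (`match()` only returns the first hit).

```r
# Subset of the DF1/DF2 data from this thread, so the block runs on its own
DF1 <- data.frame(faID = c(1L, 1L, 1L, 2L),
                  uID  = c(20909L, 1591L, 42783L, 134766L))
DF2 <- data.frame(uID = c(1591L, 42783L, 29695L),
                  user_cent_w = c(15844L, 466L, 13958L))

# For each DF1$uID, the row position of the same uID in DF2 (NA if absent)
idx <- match(DF1$uID, DF2$uID)

# Variant 1: add a new column; rows without a match get NA
with_new_col <- DF1
with_new_col$user_cent <- DF2$user_cent_w[idx]

# Variant 2: overwrite uID only where a match exists
replaced <- DF1
replaced$uID <- ifelse(is.na(idx), DF1$uID, DF2$user_cent_w[idx])

with_new_col
replaced
```

Variant 1 leaves uID untouched, so the original IDs stay available for later lookups.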

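Answer 1: (score: 0)

Although this old question already has an accepted answer, for the sake of completeness I would like to add two data.table solutions.

The first creates a new object: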

library(data.table)
# coerce to data.table and right join on uID
result <- setDT(DF2)[setDT(DF1), on = "uID"][
  # replace uID by user_cent_w where available, remove column
  !is.na(user_cent_w), uID := user_cent_w][, -"user_cent_w"]
result
        uID faID
 1:   20909    1
 2:    6661    1
 3:   15844    1
 4:   28065    1
 5:     466    1
 6:    3113    1
 7:    1514    1
 8:    3825    1
 9:  134766    2
10:  271168    2
11:   16710    2
12: 4071608    2
13: 2046526    2
14: 5081272    2

The second updates DF1 in place while joining, which avoids copying the object and so saves memory and time:

setDT(DF1)[setDT(DF2), on = "uID", uID := ifelse(is.na(user_cent_w), uID, user_cent_w)]
DF1 
    faID     uID
 1:    1   20909
 2:    1    6661
 3:    1   15844
 4:    1   28065
 5:    1     466
 6:    1    3113
 7:    1    1514
 8:    1    3825
 9:    2  134766
10:    2  271168
11:    2   16710
12:    2 4071608
13:    2 2046526
14:    2 5081272
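
As far as I can tell, an update join only assigns to the rows of DF1 that have a match in DF2, so the `ifelse()` guard should not be strictly needed. A shorter sketch of the same idea, referring to the DF2 column through data.table's `i.` prefix, could look like this. It uses a small subset of the data from this thread and assumes each uID occurs only once in DF2.

```r
library(data.table)

# Subset of the DF1/DF2 data from this thread, built directly as data.tables
DF1 <- data.table(faID = c(1L, 1L, 1L, 2L),
                  uID  = c(20909L, 1591L, 42783L, 134766L))
DF2 <- data.table(uID = c(1591L, 42783L, 29695L),
                  user_cent_w = c(15844L, 466L, 13958L))

# Update join: for DF1 rows whose uID is found in DF2, overwrite uID
# by reference with DF2's user_cent_w (the i. prefix means "column of DF2")
DF1[DF2, on = "uID", uID := i.user_cent_w]

DF1[]
```

Rows of DF1 without a match (here 20909 and 134766) are left as they are, and the DF2 row with uID 29695 is simply ignored because nothing in DF1 matches it.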

Data

DF1 <- structure(list(faID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L), uID = c(20909L, 6661L, 1591L, 28065L, 42783L, 
3113L, 21647L, 3825L, 134766L, 271168L, 16710L, 4071608L, 2046526L, 
5081272L)), .Names = c("faID", "uID"), row.names = c(NA, -14L
), class = "data.frame")

DF2 <- structure(list(uID = c(1591L, 42783L, 21647L, 29695L, 94120L, 
83098L, 138776L, 90352L, 115384L, 74483L), user_cent_w = c(15844L, 
466L, 1514L, 13958L, 3615L, 128L, 709L, 991L, 8039L, 128L)), .Names = c("uID", 
"user_cent_w"), row.names = c(NA, -10L), class = "data.frame")