How do I find rows with the same value in two columns?

Time: 2021-05-12 02:18:26

Tags: r dataframe

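This is a little hard to explain, but I am trying to compare the "cpf" columns of two different data frames. I want to identify when the values in the "cpf" columns of df1 and df2 are equal (the matching values may be in different rows). After that, wherever df1 has NA values and the other data frame has the corresponding values, I want to fill them in.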

df1
  cpf  x  y
1  21 NA NA
2  32 NA NA
3  43 NA NA
4  54 NA NA
5  65 NA NA

df2
  cpf  x  y
1  54  5 10
2   0 NA NA
3  65  3  2
4   0 NA NA
5   0 NA NA

I would like the following result:

df3
  cpf  x  y
1  21 NA NA
2  32 NA NA
3  43 NA NA
4  54  5 10
5  65  3  2

3 answers:

Answer 0 (score: 6)

We can do a join on 'cpf' and use fcoalesce

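library(data.table)
# Update join: for df1 rows whose cpf matches df2, keep df1's value unless it
# is NA, in which case take df2's value (the i. prefix refers to df2's columns)
setDT(df1)[df2, c('x', 'y') := .(fcoalesce(x, i.x), 
        fcoalesce(y, i.y)), on = .(cpf)]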

-output

df1
#   cpf  x  y
#1:  21 NA NA
#2:  32 NA NA
#3:  43 NA NA
#4:  54  5 10
#5:  65  3  2

Or use left_join from dplyr, followed by coalesce

library(dplyr)
left_join(df1, df2, by = 'cpf') %>%
     transmute(cpf, x = coalesce(x.x, x.y), y = coalesce(y.x, y.y))
#  cpf  x  y
#1  21 NA NA
#2  32 NA NA
#3  43 NA NA
#4  54  5 10
#5  65  3  2

In base R, we can use match

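# Position in df2 of each df1 key (NA where the key is absent from df2)
i1 <- match(df1$cpf, df2$cpf)
ok <- !is.na(i1)
# Indexing both sides from df1's side keeps the rows paired even when the
# matching keys appear in a different order in the two data frames
df1[ok, -1] <- df2[i1[ok], -1]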
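
Note that the match-based assignment above replaces x and y for every matched row, whether or not df1 already holds a value there. If only the NA cells should be filled, a minimal base R sketch could look like the following. It assumes the column names from the question; idx and fill are illustrative names, not part of the original answer.

```r
# Example data copied from the question
df1 <- data.frame(cpf = c(21L, 32L, 43L, 54L, 65L),
                  x = NA_integer_, y = NA_integer_)
df2 <- data.frame(cpf = c(54L, 0L, 65L, 0L, 0L),
                  x = c(5L, NA, 3L, NA, NA),
                  y = c(10L, NA, 2L, NA, NA))

# Position in df2 of each df1 key (NA when the key is absent from df2)
idx <- match(df1$cpf, df2$cpf)

# Fill only the cells that are still NA in df1 and have a match in df2
for (col in c("x", "y")) {
  fill <- is.na(df1[[col]]) & !is.na(idx)
  df1[[col]][fill] <- df2[[col]][idx[fill]]
}
df1
```

Because match returns the first matching position, repeated keys in df2 (such as the 0 rows here) would be resolved to their first occurrence.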

data

df1 <- structure(list(cpf = c(21L, 32L, 43L, 54L, 65L), x = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_), y = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_)), row.names = c("1", 
"2", "3", "4", "5"), class = "data.frame")

df2 <- structure(list(cpf = c(54L, 0L, 65L, 0L, 0L), x = c(5L, NA, 3L, 
NA, NA), y = c(10L, NA, 2L, NA, NA)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))

Answer 1 (score: 5)

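library(dplyr)
# Takes x and y from df2 (the .y columns); any non-NA values
# already present in df1 are discarded
df1 %>% 
  left_join(df2, by = "cpf") %>% 
  select(cpf, x = x.y, y = y.y)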

Output:

  cpf  x  y
1  21 NA NA
2  32 NA NA
3  43 NA NA
4  54  5 10
5  65  3  2

Answer 2 (score: 3)

Another base R option, using merge

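# Left join on cpf: df1's x and y get the ".x" suffix while df2's keep their
# plain names, so selecting names(df1) returns cpf plus df2's x and y
merge(df1,
  df2,
  by = "cpf",
  all.x = TRUE,
  suffixes = c(".x", "")
)[names(df1)]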

gives

  cpf  x  y
1  21 NA NA
2  32 NA NA
3  43 NA NA
4  54  5 10
5  65  3  2
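
If the goal is only to see which cpf values occur in both data frames (the first part of the question), a small sketch with intersect and %in% may help. The names common, rows_df1 and rows_df2 are made up for illustration.

```r
# Example data copied from the question
df1 <- data.frame(cpf = c(21L, 32L, 43L, 54L, 65L),
                  x = NA_integer_, y = NA_integer_)
df2 <- data.frame(cpf = c(54L, 0L, 65L, 0L, 0L),
                  x = c(5L, NA, 3L, NA, NA),
                  y = c(10L, NA, 2L, NA, NA))

# Key values present in both data frames
common <- intersect(df1$cpf, df2$cpf)

# Row positions in each data frame whose cpf also appears in the other
rows_df1 <- which(df1$cpf %in% df2$cpf)
rows_df2 <- which(df2$cpf %in% df1$cpf)
```

The matching rows sit at different positions in the two data frames, which is why the answers join on cpf rather than comparing the columns row by row.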