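This is a bit hard to explain, but I am trying to compare the "cpf" columns from two different data frames. I want to identify when values in the "cpf" columns of df1 and df2 are equal (the matching values may be in different rows). Then, where df1 has NA values and the other data frame has the corresponding values, I want to update them.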
df1
cpf x y
1 21 NA NA
2 32 NA NA
3 43 NA NA
4 54 NA NA
5 65 NA NA
df2
cpf x y
1 54 5 10
2 0 NA NA
3 65 3 2
4 0 NA NA
5 0 NA NA
I would like the following result:
df3
cpf x y
1 21 NA NA
2 32 NA NA
3 43 NA NA
4 54 5 10
5 65 3 2
Answer 0 (score: 6)
We can perform a join on 'cpf' and use fcoalesce, which returns the first non-NA value at each position
library(data.table)
setDT(df1)[df2, c('x', 'y') := .(fcoalesce(x, i.x),
fcoalesce(y, i.y)), on = .(cpf)]
-output
df1
# cpf x y
#1: 21 NA NA
#2: 32 NA NA
#3: 43 NA NA
#4: 54 5 10
#5: 65 3 2
Or use left_join from dplyr, followed by coalesce
library(dplyr)
left_join(df1, df2, by = 'cpf') %>%
transmute(cpf, x = coalesce(x.x, x.y), y = coalesce(y.x, y.y))
# cpf x y
#1 21 NA NA
#2 32 NA NA
#3 43 NA NA
#4 54 5 10
#5 65 3 2
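In base R, we can use match
# positions of df1's cpf values in df2 (0 where there is no match)
i1 <- match(df1$cpf, df2$cpf, nomatch = 0)
# positions of df2's cpf values in df1 (0 where there is no match)
i2 <- match(df2$cpf, df1$cpf, nomatch = 0)
# zero indices are dropped; assumes matches appear in the same order in both frames
df1[i2, -1] <- df2[i1, -1]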
data
df1 <- structure(list(cpf = c(21L, 32L, 43L, 54L, 65L), x = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_), y = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_)), row.names = c("1",
"2", "3", "4", "5"), class = "data.frame")
df2 <- structure(list(cpf = c(54L, 0L, 65L, 0L, 0L), x = c(5L, NA, 3L,
NA, NA), y = c(10L, NA, 2L, NA, NA)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5"))
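Note that the match version above overwrites the matched rows of df1 with df2's values rather than filling only the NA cells, and it pairs rows correctly only when the matching cpf values appear in the same relative order in both frames. A minimal base R sketch that fills NA cells only is shown here; the names df1_demo, df2_demo and df3_demo and the pre-existing value 9 are made up for illustration, and it assumes each cpf of interest occurs at most once in the second frame (match returns the first hit).

```r
# Hypothetical inputs: df1_demo already holds x = 9 for cpf 65,
# which must not be overwritten by df2_demo's x = 3.
df1_demo <- data.frame(cpf = c(21L, 32L, 43L, 54L, 65L),
                       x = c(NA, NA, NA, NA, 9L),
                       y = c(NA_integer_, NA, NA, NA, NA))
df2_demo <- data.frame(cpf = c(54L, 0L, 65L, 0L, 0L),
                       x = c(5L, NA, 3L, NA, NA),
                       y = c(10L, NA, 2L, NA, NA))

# For each row of df1_demo, the row of df2_demo with the same cpf (NA if none).
idx <- match(df1_demo$cpf, df2_demo$cpf)

df3_demo <- df1_demo
for (col in c("x", "y")) {
  candidate <- df2_demo[[col]][idx]   # df2 values aligned to df1's rows
  fill <- is.na(df3_demo[[col]]) & !is.na(candidate)
  df3_demo[[col]][fill] <- candidate[fill]   # touch only the NA cells
}
df3_demo
```

Because the lookup goes through a single index vector aligned to df1_demo's rows, the result does not depend on the row order of either frame.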
Answer 1 (score: 5)
library(dplyr)
# join on cpf, then keep only the columns that came from df2
df1 %>%
  left_join(df2, by = "cpf") %>%
  select(cpf, x = x.y, y = y.y)
Output:
cpf x y
1 21 NA NA
2 32 NA NA
3 43 NA NA
4 54 5 10
5 65 3 2
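This works for the question's data because every x and y in df1 is NA. If df1 could already hold some values, selecting x.y and y.y would discard them, whereas coalesce keeps them. A small sketch of the difference follows; the frames df1b and df2b and their values are hypothetical and only serve to show the contrast.

```r
library(dplyr)

# Hypothetical frames: df1b already has values for cpf 21.
df1b <- data.frame(cpf = c(21L, 54L), x = c(7L, NA), y = c(8L, NA))
df2b <- data.frame(cpf = c(54L, 21L), x = c(5L, NA), y = c(10L, NA))

joined <- left_join(df1b, df2b, by = "cpf")

# Taking only df2b's columns loses df1b's existing values for cpf 21.
taken <- select(joined, cpf, x = x.y, y = y.y)

# coalesce prefers df1b's value and falls back to df2b's when it is NA.
patched <- transmute(joined, cpf,
                     x = coalesce(x.x, x.y),
                     y = coalesce(y.x, y.y))

taken
patched
```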
Answer 2 (score: 3)
Another base R option using merge
merge(df1,
df2,
by = "cpf",
all.x = TRUE,
suffixes = c(".x", "")
)[names(df1)]
gives
cpf x y
1 21 NA NA
2 32 NA NA
3 43 NA NA
4 54 5 10
5 65 3 2