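Fill NA values in multiple columns of one df based on values from another df

Time: 2020-10-10 20:34:22

Tags: r dplyr

I have 2 dfs. I want to replace the NA values of 2 variables in one data frame with the values from the other df. Here is my sample data: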

df1
id    Sex    Race     Income
1     M      White      1
2     NA     Hispanic   2
3     NA     NA         3
df2
id    Sex    Race
1     M      White
2     F      Hispanic
3     M      White
4     F      Black

I want the data to look like this, where the NA values of Sex and Race in df1 are filled in with the values from df2.

df2
id    Sex    Race      Income
1     M      White       1
2     F      Hispanic    2
3     M      White       3
4     F      Black       NA

Can anyone help?

3 answers:

Answer 0: (score: 1)

We can use a join here

library(data.table)
setDT(df2)[df1, Income := Income, on = .(id)]

-output

df2
#   id Sex     Race Income
#1:  1   M    White      1
#2:  2   F Hispanic      2
#3:  3   M    White      3
#4:  4   F    Black     NA
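
One thing worth noting about the join above: setDT() converts df2 by reference and := adds the column in place, so df2 itself ends up modified. A minimal sketch, assuming you would rather leave the original data frames untouched, is to run the same update join on data.table copies. The names dt1 and dt2 are introduced here purely for illustration, and the i. prefix is spelled out to make it explicit that Income comes from df1.

```r
library(data.table)

df1 <- data.frame(id = 1:3, Sex = c("M", NA, NA),
                  Race = c("White", "Hispanic", NA), Income = 1:3)
df2 <- data.frame(id = 1:4, Sex = c("M", "F", "M", "F"),
                  Race = c("White", "Hispanic", "White", "Black"))

# as.data.table() returns new objects, so df1 and df2 are not converted in place.
# dt1 and dt2 are illustrative names, not part of the original answer.
dt1 <- as.data.table(df1)
dt2 <- as.data.table(df2)

# Update join: for each row of dt2 whose id matches a row of dt1,
# copy Income from dt1 (the i. prefix refers to the table inside [ ]).
# Rows of dt2 without a match (id 4) are left as NA.
dt2[dt1, Income := i.Income, on = "id"]
dt2
```

After this runs, dt2 carries the Income column while df2 still has only id, Sex and Race.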

If we also need to pick the non-NA element for 'Sex' and 'Race'

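# Columns to coalesce; excluding Income keeps this correct even if df2 already has it
nm1 <- setdiff(names(df2), c('id', 'Income'))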
setDT(df2)[df1, c(nm1, 'Income') := c(Map(fcoalesce, 
  .SD[, nm1, with = FALSE], mget(paste0('i.', nm1))), list(Income)), on = .(id)]

-output

df2
#   id Sex     Race Income
#1:  1   M    White      1
#2:  2   F Hispanic      2
#3:  3   M    White      3
#4:  4   F    Black     NA

Or use dplyr, relying only on tidyverse functions

library(dplyr)
left_join(df2, df1, by = 'id') %>% 
  transmute(id,  Sex = coalesce(Sex.x, Sex.y),
                Race = coalesce(Race.x, Race.y),
           Income)

-output

#  id Sex     Race Income
#1  1   M    White      1
#2  2   F Hispanic      2
#3  3   M    White      3
#4  4   F    Black     NA
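
The coalesce() calls above list each column by hand and give priority to the df2 value. If there are many columns to fill, or if the non-NA values already in df1 should win (which is how the question is phrased), the same idea can be looped over a vector of column names. This is only a sketch: fill_cols, joined and result are hypothetical names, and the .df1/.df2 suffixes are chosen here for readability.

```r
library(dplyr)

df1 <- data.frame(id = 1:3, Sex = c("M", NA, NA),
                  Race = c("White", "Hispanic", NA), Income = 1:3)
df2 <- data.frame(id = 1:4, Sex = c("M", "F", "M", "F"),
                  Race = c("White", "Hispanic", "White", "Black"))

# Columns whose NA values should be filled (illustrative helper vector)
fill_cols <- c("Sex", "Race")

# Keep every id from df2; label the clashing columns by their source
joined <- left_join(df2, df1, by = "id", suffix = c(".df2", ".df1"))

# For each column take the df1 value first, falling back to df2 where it is NA
for (nm in fill_cols) {
  joined[[nm]] <- coalesce(joined[[paste0(nm, ".df1")]],
                           joined[[paste0(nm, ".df2")]])
}

result <- select(joined, id, all_of(fill_cols), Income)
result
```

With the sample data both priorities give the same table, because df1 and df2 agree wherever df1 is not NA. The order of arguments to coalesce() only matters when the two sources disagree.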

data

df1 <- structure(list(id = 1:3, Sex = c("M", NA, NA), Race = c("White", 
"Hispanic", NA), Income = 1:3), class = "data.frame", row.names = c(NA, 
-3L))

df2 <- structure(list(id = 1:4, Sex = c("M", "F", "M", "F"), Race = c("White", 
"Hispanic", "White", "Black")), class = "data.frame", row.names = c(NA, 
-4L))

Answer 1: (score: 1)

A base R option using merge

subset(
  merge(df1, df2, by = "id", all.y = TRUE),
  select = c("id", "Sex.y", "Race.y", "Income")
)

gives

  id Sex.y   Race.y Income
1  1     M    White      1
2  2     F Hispanic      2
3  3     M    White      3
4  4     F    Black     NA
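
Note that the subset() call above takes Sex and Race entirely from df2 and keeps the .y suffix in the column names. That is enough for this data, since df2 is complete and agrees with df1. A hedged base R variant, assuming you want to keep the non-NA values from df1 and only fall back to df2 where df1 is NA, could look like the following. The suffixes and the names fill_cols, m and result are illustrative choices, not part of the original answer.

```r
df1 <- data.frame(id = 1:3, Sex = c("M", NA, NA),
                  Race = c("White", "Hispanic", NA), Income = 1:3)
df2 <- data.frame(id = 1:4, Sex = c("M", "F", "M", "F"),
                  Race = c("White", "Hispanic", "White", "Black"))

fill_cols <- c("Sex", "Race")  # columns to fill (illustrative)

# all.y = TRUE keeps every id from df2, including id 4 with no match in df1
m <- merge(df1, df2, by = "id", all.y = TRUE, suffixes = c(".df1", ".df2"))

# Use the df1 value when present, otherwise the df2 value
for (nm in fill_cols) {
  from_df1 <- m[[paste0(nm, ".df1")]]
  from_df2 <- m[[paste0(nm, ".df2")]]
  m[[nm]] <- ifelse(is.na(from_df1), from_df2, from_df1)
}

result <- m[, c("id", fill_cols, "Income")]
result
```

This also leaves the result with plain Sex and Race column names rather than Sex.y and Race.y.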

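Answer 2: (score: 0)

A tidyverse approach is to reshape both data frames to long format (with pivot_longer()), join them, fill the missing values, and then reshape back to wide format (with pivot_wider()) to get the expected result. Here is the code: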

library(tidyverse)
#Code
newdf <- df2 %>% 
  mutate(across(-id,~as.character(.))) %>%
  pivot_longer(-id) %>%
  full_join(df1 %>% 
              mutate(across(-id,~as.character(.))) %>%
              pivot_longer(-id) %>% rename(value2=value)) %>%
  mutate(value=ifelse(is.na(value),value2,value)) %>% select(-value2) %>%
  pivot_wider(names_from = name,values_from=value) %>%
  mutate(Income=as.numeric(Income))

Output:

# A tibble: 4 x 4
     id Sex   Race     Income
  <int> <chr> <chr>     <dbl>
1     1 M     White         1
2     2 F     Hispanic      2
3     3 M     White         3
4     4 F     Black        NA

Some data used:

#Data 1
df1 <- structure(list(id = 1:3, Sex = c("M", NA, NA), Race = c("White", 
"Hispanic", NA), Income = 1:3), class = "data.frame", row.names = c(NA, 
-3L))

#Data 2
df2 <- structure(list(id = 1:4, Sex = c("M", "F", "M", "F"), Race = c("White", 
"Hispanic", "White", "Black")), class = "data.frame", row.names = c(NA, 
-4L))