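I have two data frames. I want to replace the NA values of two variables in one data frame with the values from the other data frame. Here is my sample data: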
df1
id Sex Race Income
1 M White 1
2 NA Hispanic 2
3 NA NA 3
df2
id Sex Race
1 M White
2 F Hispanic
3 M White
4 F Black
I would like the data to look like this, with the NA values of Sex and Race in df1 filled in with the values from df2.
df2
id Sex Race Income
1 M White 1
2 F Hispanic 2
3 M White 3
4 F Black NA
Can anyone help?
Answer 0 (score: 1)
We can use a join here
library(data.table)
setDT(df2)[df1, Income := Income, on = .(id)]
-output
df2
# id Sex Race Income
#1: 1 M White 1
#2: 2 F Hispanic 2
#3: 3 M White 3
#4: 4 F Black NA
If we need to pick the non-NA elements of 'Sex' and 'Race' across the two data frames
nm1 <- names(df2)[-1]
setDT(df2)[df1, c(nm1, 'Income') := c(Map(fcoalesce,
.SD[, nm1, with = FALSE], mget(paste0('i.', nm1))), list(Income)), on = .(id)]
-output
df2
# id Sex Race Income
#1: 1 M White 1
#2: 2 F Hispanic 2
#3: 3 M White 3
#4: 4 F Black NA
Or using dplyr from the tidyverse
library(dplyr)
left_join(df2, df1, by = 'id') %>%
transmute(id, Sex = coalesce(Sex.x, Sex.y),
Race = coalesce(Race.x, Race.y),
Income)
-output
# id Sex Race Income
#1 1 M White 1
#2 2 F Hispanic 2
#3 3 M White 3
#4 4 F Black NA
df1 <- structure(list(id = 1:3, Sex = c("M", NA, NA), Race = c("White",
"Hispanic", NA), Income = 1:3), class = "data.frame", row.names = c(NA,
-3L))
df2 <- structure(list(id = 1:4, Sex = c("M", "F", "M", "F"), Race = c("White",
"Hispanic", "White", "Black")), class = "data.frame", row.names = c(NA,
-4L))
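When more than two columns need filling, writing one coalesce() per column gets tedious. A minimal sketch of the same join-then-coalesce idea with a loop over column names follows; the names cols and joined and the ".df1" suffix are my own choices for illustration, and here the df1 value is preferred when it is present.

```r
library(dplyr)

df1 <- data.frame(id = 1:3, Sex = c("M", NA, NA),
                  Race = c("White", "Hispanic", NA), Income = 1:3,
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = 1:4, Sex = c("M", "F", "M", "F"),
                  Race = c("White", "Hispanic", "White", "Black"),
                  stringsAsFactors = FALSE)

# Columns to fill (hypothetical helper name)
cols <- c("Sex", "Race")

# Keep every row of df2; the df1 copies of shared columns get the ".df1" suffix
joined <- left_join(df2, df1, by = "id", suffix = c("", ".df1"))

# For each column, take the df1 value when present, otherwise the df2 value
for (col in cols) {
  joined[[col]] <- coalesce(joined[[paste0(col, ".df1")]], joined[[col]])
}

# Drop the helper columns and keep the requested column order
joined <- joined[, c("id", cols, "Income")]
joined
```

To let df2 win instead, swap the two arguments to coalesce().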
Answer 1 (score: 1)
Using merge
subset(
merge(df1, df2, by = "id", all.y = TRUE),
select = c("id", "Sex.y", "Race.y", "Income")
)
which gives
id Sex.y Race.y Income
1 1 M White 1
2 2 F Hispanic 2
3 3 M White 3
4 4 F Black NA
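Note that the merge above takes Sex and Race from df2 only and leaves the .y suffix in the column names. If the df1 values should be kept whenever they are present, a minimal base R sketch could look like this; the suffixes ".df1"/".df2" and the names merged and result are my own choices, not part of the answer.

```r
df1 <- data.frame(id = 1:3, Sex = c("M", NA, NA),
                  Race = c("White", "Hispanic", NA), Income = 1:3,
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = 1:4, Sex = c("M", "F", "M", "F"),
                  Race = c("White", "Hispanic", "White", "Black"),
                  stringsAsFactors = FALSE)

# Keep all rows of df2; label the clashing columns by their origin
merged <- merge(df1, df2, by = "id", all.y = TRUE,
                suffixes = c(".df1", ".df2"))

# Use the df1 value unless it is NA, then fall back to df2
result <- data.frame(
  id     = merged$id,
  Sex    = ifelse(is.na(merged$Sex.df1), merged$Sex.df2, merged$Sex.df1),
  Race   = ifelse(is.na(merged$Race.df1), merged$Race.df2, merged$Race.df1),
  Income = merged$Income,
  stringsAsFactors = FALSE
)
result
```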
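Answer 2 (score: 0)
A tidyverse approach can get the expected result by reshaping both data frames to long format (with pivot_longer()), joining them, and then reshaping back to wide format (with pivot_wider()). Here is the code: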
library(tidyverse)
#Code
newdf <- df2 %>%
mutate(across(-id,~as.character(.))) %>%
pivot_longer(-id) %>%
full_join(df1 %>%
mutate(across(-id,~as.character(.))) %>%
pivot_longer(-id) %>% rename(value2=value)) %>%
mutate(value=ifelse(is.na(value),value2,value)) %>% select(-value2) %>%
pivot_wider(names_from = name,values_from=value) %>%
mutate(Income=as.numeric(Income))
Output:
# A tibble: 4 x 4
id Sex Race Income
<int> <chr> <chr> <dbl>
1 1 M White 1
2 2 F Hispanic 2
3 3 M White 3
4 4 F Black NA
Data used:
#Data 1
df1 <- structure(list(id = 1:3, Sex = c("M", NA, NA), Race = c("White",
"Hispanic", NA), Income = 1:3), class = "data.frame", row.names = c(NA,
-3L))
#Data 2
df2 <- structure(list(id = 1:4, Sex = c("M", "F", "M", "F"), Race = c("White",
"Hispanic", "White", "Black")), class = "data.frame", row.names = c(NA,
-4L))