I have two data frames, df1 and df2, which share the columns "ID" and "State":
df1 <- data.frame(ID=c(1:20), State=c("NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","CA","IL","SD","NC","SC","WA","CO","AL","AK","HI"))
df2 <- data.frame(ID=c(1,2,3,4,5,"NA","NA","NA","NA","NA"), Year=c("2020","2021","2020","2020","2021","2020","2020","2021","2020","2019"),State=c("NA","NA","NA","NA","NA","CA","SC","NY","NJ","OR"))
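How can I add Year from df2 to df1 for rows where df1 has the same ID or the same State as df2?
The reason for this change: I only need to bring the "Year" information from df2 into df1.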
Answer 0 (score: 0)
You can do this:
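library(dplyr)
# Turn the "NA" strings into real NA values and fix the column types
df1 <- type.convert(df1, as.is = TRUE)
df2 <- type.convert(df2, as.is = TRUE)
df1 %>%
  # First pass: match on ID
  left_join(select(df2, -State), by = 'ID') %>%
  # Second pass: match on State, using only the df2 rows that have no ID
  left_join(select(filter(df2, is.na(ID)), -ID), by = 'State') %>%
  # Keep the ID-based Year where present, otherwise the State-based one
  mutate(Year = coalesce(Year.x, Year.y), Year.x = NULL, Year.y = NULL)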
   ID State Year
1   1  <NA> 2020
2   2  <NA> 2021
3   3  <NA> 2020
4   4  <NA> 2020
5   5  <NA> 2021
6   6  <NA>   NA
7   7  <NA>   NA
8   8  <NA>   NA
9   9  <NA>   NA
10 10  <NA>   NA
11 11    CA 2020
12 12    IL   NA
13 13    SD   NA
14 14    NC   NA
15 15    SC 2020
16 16    WA   NA
17 17    CO   NA
18 18    AL   NA
19 19    AK   NA
20 20    HI   NA
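For comparison, the same two-pass lookup (ID first, then State among the df2 rows that have no ID) can be sketched in base R with match(). This is only an illustration under the assumption that each ID and each State appears at most once in df2, as in the question's data; the helper names by_id, by_state and state_rows are made up here and are not part of the answer.

```r
# Data from the question; stringsAsFactors = FALSE keeps the text columns as character
df1 <- data.frame(ID = 1:20,
                  State = c(rep("NA", 10), "CA", "IL", "SD", "NC", "SC",
                            "WA", "CO", "AL", "AK", "HI"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(1, 2, 3, 4, 5, "NA", "NA", "NA", "NA", "NA"),
                  Year = c("2020", "2021", "2020", "2020", "2021",
                           "2020", "2020", "2021", "2020", "2019"),
                  State = c("NA", "NA", "NA", "NA", "NA",
                            "CA", "SC", "NY", "NJ", "OR"),
                  stringsAsFactors = FALSE)

# Convert the "NA" strings to real NA values
df1 <- type.convert(df1, as.is = TRUE)
df2 <- type.convert(df2, as.is = TRUE)

# Pass 1: look up Year by ID (df1 has no missing IDs, so NA never matches NA here)
by_id <- df2$Year[match(df1$ID, df2$ID)]

# Pass 2: look up Year by State, restricted to df2 rows without an ID,
# so that a missing State in df1 cannot match a missing State in df2
state_rows <- df2[is.na(df2$ID), ]
by_state <- state_rows$Year[match(df1$State, state_rows$State)]

# Prefer the ID match, fall back to the State match
df1$Year <- ifelse(is.na(by_id), by_state, by_id)
```

Restricting the second pass to rows with a missing ID matters because match() treats NA as equal to NA; without it, every df1 row with a missing State would pick up the Year of the first df2 row whose State is missing.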
Answer 1 (score: 0)
Here is a dplyr solution:
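library(dplyr)
# Build one join key per row: the ID when State is the string 'NA', otherwise the State
df1 <- df1 %>%
  mutate(join = ifelse(State == 'NA', ID, State))
df2 <- df2 %>%
  mutate(join = ifelse(State == 'NA', ID, State))
# Join on the combined key, then drop the helper and duplicate columns
df_new <- left_join(df1, df2, by = "join") %>%
  mutate(State = coalesce(State.x, State.y)) %>%
  select(-c(State.x, State.y, join, ID.y)) %>%
  rename(ID = ID.x)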
This gives us:
   ID Year State
1   1 2020    NA
2   2 2021    NA
3   3 2020    NA
4   4 2020    NA
5   5 2021    NA
6   6 <NA>    NA
7   7 <NA>    NA
8   8 <NA>    NA
9   9 <NA>    NA
10 10 <NA>    NA
11 11 2020    CA
12 12 <NA>    IL
13 13 <NA>    SD
14 14 <NA>    NC
15 15 2020    SC
16 16 <NA>    WA
17 17 <NA>    CO
18 18 <NA>    AL
19 19 <NA>    AK
20 20 <NA>    HI
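Note that this approach compares State against the literal string 'NA', so it appears to rely on the data frames being in their original, unconverted form. The sketch below uses a small made-up data frame (toy is hypothetical, not the question's data) to show how the join key is built and what happens if the "NA" strings have already been turned into real missing values with type.convert().

```r
# Hypothetical toy data: State holds the string "NA", not a missing value
toy <- data.frame(ID = 1:3, State = c("NA", "CA", "NA"),
                  stringsAsFactors = FALSE)

# ifelse() mixes integer IDs with character States, so the key is character
key <- ifelse(toy$State == "NA", toy$ID, toy$State)
key
# "1" "CA" "3"

# After type.convert(), State is a real NA and the comparison itself yields NA,
# so the key is missing for exactly the rows that should have used the ID
converted <- type.convert(toy, as.is = TRUE)
key_after <- ifelse(converted$State == "NA", converted$ID, converted$State)
is.na(key_after)
# TRUE FALSE TRUE
```

If the data has already been converted, testing is.na(State) in place of State == 'NA' would be one way to adapt the key, with the caveat that the ID must then be cast to character explicitly so both data frames produce keys of the same type.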