Add a column from df2 to df1 based on matches between df1 and df2

Time: 2020-11-10 14:13:39

Tags: r

I have two datasets, df1 and df2, which share the columns "ID" and "State":

df1 <- data.frame(ID=c(1:20), State=c("NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","CA","IL","SD","NC","SC","WA","CO","AL","AK","HI"))
df2 <- data.frame(ID=c(1,2,3,4,5,"NA","NA","NA","NA","NA"), Year=c("2020","2021","2020","2020","2021","2020","2020","2021","2020","2019"),State=c("NA","NA","NA","NA","NA","CA","SC","NY","NJ","OR"))

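How can I add Year from df2 to df1 for the rows where df1 has the same ID or the same State?

The reason I want this change: I only need to add this "Year" information from df2 to df1.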

2 answers:

Answer 0: (score: 0)

You can do this:

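library(dplyr)

# Convert the "NA" strings to real NA values and the number-like columns to integers.
df1 <- type.convert(df1, as.is = TRUE)
df2 <- type.convert(df2, as.is = TRUE)

df1 %>%
    # First pass: pick up Year by ID.
    left_join(select(df2, -State), 'ID') %>%
    # Second pass: pick up Year by State from the df2 rows that have no ID.
    left_join(select(filter(df2, is.na(ID)), -ID), 'State') %>%
    # Keep the ID-based Year where present, otherwise the State-based one.
    mutate(Year = coalesce(Year.x, Year.y), Year.x = NULL, Year.y = NULL)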

   ID State Year
1   1  <NA> 2020
2   2  <NA> 2021
3   3  <NA> 2020
4   4  <NA> 2020
5   5  <NA> 2021
6   6  <NA>   NA
7   7  <NA>   NA
8   8  <NA>   NA
9   9  <NA>   NA
10 10  <NA>   NA
11 11    CA 2020
12 12    IL   NA
13 13    SD   NA
14 14    NC   NA
15 15    SC 2020
16 16    WA   NA
17 17    CO   NA
18 18    AL   NA
19 19    AK   NA
20 20    HI   NA
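
For comparison, the same two-pass lookup can be sketched in base R with match(). This is only an illustration under the question's example data, not part of the answer above; the names id_rows, state_rows, by_id and by_state are made up for the sketch. Splitting df2 into an "ID rows" table and a "State rows" table keeps a missing State in df1 from accidentally matching a missing State in df2, since match() treats NA as equal to NA.

```r
# Example data from the question ("NA" is a literal string here).
df1 <- data.frame(ID = 1:20,
                  State = c(rep("NA", 10), "CA", "IL", "SD", "NC", "SC",
                            "WA", "CO", "AL", "AK", "HI"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(1, 2, 3, 4, 5, rep("NA", 5)),
                  Year = c("2020", "2021", "2020", "2020", "2021",
                           "2020", "2020", "2021", "2020", "2019"),
                  State = c(rep("NA", 5), "CA", "SC", "NY", "NJ", "OR"),
                  stringsAsFactors = FALSE)

# Turn the "NA" strings into real missing values.
df1 <- type.convert(df1, as.is = TRUE)
df2 <- type.convert(df2, as.is = TRUE)

# Lookup tables (hypothetical names): df2 rows keyed by ID, and rows keyed by State.
id_rows    <- df2[!is.na(df2$ID), ]
state_rows <- df2[is.na(df2$ID) & !is.na(df2$State), ]

# First pass by ID, second pass by State; unmatched rows come back as NA.
by_id    <- id_rows$Year[match(df1$ID, id_rows$ID)]
by_state <- state_rows$Year[match(df1$State, state_rows$State)]

# Prefer the ID-based Year and fall back to the State-based one.
df1$Year <- ifelse(is.na(by_id), by_state, by_id)
```

Because match() returns only the first hit, this sketch never adds rows to df1, whereas a left join would repeat a df1 row if df2 held several rows with the same key.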

Answer 1: (score: 0)

Here is a dplyr solution:

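library(dplyr)

# Run this on df1 and df2 as originally defined, where "NA" is still a literal string.
# Build one join key per row: the ID where State is "NA", otherwise the State.
df1 <- df1 %>%
  mutate(join = ifelse(State == 'NA', ID, State))

df2 <- df2 %>%
  mutate(join = ifelse(State == 'NA', ID, State))

# Join on the combined key, then drop the helper and duplicated columns.
df_new <- left_join(df1, df2, by = "join") %>%
  mutate(State = coalesce(State.x, State.y)) %>%
  select(-c(State.x, State.y, join, ID.y)) %>%
  rename(ID = ID.x)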

This gives us:

   ID Year State
1   1 2020    NA
2   2 2021    NA
3   3 2020    NA
4   4 2020    NA
5   5 2021    NA
6   6 <NA>    NA
7   7 <NA>    NA
8   8 <NA>    NA
9   9 <NA>    NA
10 10 <NA>    NA
11 11 2020    CA
12 12 <NA>    IL
13 13 <NA>    SD
14 14 <NA>    NC
15 15 2020    SC
16 16 <NA>    WA
17 17 <NA>    CO
18 18 <NA>    AL
19 19 <NA>    AK
20 20 <NA>    HI
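
One thing worth checking with a single combined key is that the key is unique in df2, because a left join returns one row per match and duplicated keys would repeat rows of df1. The sketch below is a variation on the approach above, not the answerer's code: add_key, df1k and df2k are hypothetical names, and it carries only join and Year over from df2 so that no .x/.y columns need cleaning up afterwards.

```r
library(dplyr)

# Example data from the question ("NA" is a literal string here).
df1 <- data.frame(ID = 1:20,
                  State = c(rep("NA", 10), "CA", "IL", "SD", "NC", "SC",
                            "WA", "CO", "AL", "AK", "HI"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(1, 2, 3, 4, 5, rep("NA", 5)),
                  Year = c("2020", "2021", "2020", "2020", "2021",
                           "2020", "2020", "2021", "2020", "2019"),
                  State = c(rep("NA", 5), "CA", "SC", "NY", "NJ", "OR"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: the key is the ID where State is "NA", otherwise the State.
add_key <- function(d) {
  mutate(d, join = ifelse(State == "NA", as.character(ID), State))
}
df1k <- add_key(df1)
df2k <- add_key(df2)

# Stop early if df2 has repeated keys, since those would duplicate df1 rows.
stopifnot(anyDuplicated(df2k$join) == 0)

# Bring over only Year, then drop the helper key.
df_new <- left_join(df1k, select(df2k, join, Year), by = "join") %>%
  select(-join)

# The join should not have changed the number of rows.
stopifnot(nrow(df_new) == nrow(df1))
```

Here State keeps its "NA" strings and Year stays a character column, as in the output above; run type.convert() with as.is = TRUE on the result if real NA values and integer years are preferred.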