Fill missing values in a column using another data frame

Posted: 2019-11-28 21:07:50

Tags: r merge dplyr

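I have an example data frame with a column (Alphabet) that stores 3 letters in each row. The data frame also has 2 additional columns, Date and Colour: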

Alphabet       Date   Colour
  ABC    2018-09-10   green
  DEF    2017-06-11   red
  GHI    2016-05-12   blue
  JKL            NA   yellow
  MNO            NA   orange
  PQR       Unknown   brown

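Some dates in this data frame are missing (NA) or "Unknown". I have a second data frame that also has an Alphabet column and a Date column, and it contains the dates that are missing from the first data frame: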

Alphabet       Date   
  JKL    2017-06-07  
  MNO    2018-08-03   
  PQR    2019-10-07
  STU    2019-11-08
  VWX    2019-12-08   

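I want to fill in the missing dates in the first data frame by matching the Alphabet values in the two data frames and copying the corresponding dates from the second data frame into the first.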

Desired output:

Alphabet       Date   Colour
  ABC    2018-09-10   green
  DEF    2017-06-11   red
  GHI    2016-05-12   blue
  JKL    2017-06-07   yellow
  MNO    2018-08-03   orange
  PQR    2019-10-07   brown

Thanks for your help.

2 answers:

Answer 0 (score: 1)

One option is an update join with data.table:

library(data.table)
setDT(df1)[df2, Date := i.Date, on = .(Alphabet)]
df1
#   Alphabet       Date Colour
#1:      ABC 2018-09-10  green
#2:      DEF 2017-06-11    red
#3:      GHI 2016-05-12   blue
#4:      JKL 2017-06-07 yellow
#5:      MNO 2018-08-03 orange
#6:      PQR 2019-10-07  brown
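A minimal sketch of how the update join above behaves, using small hypothetical tables dt1 and dt2 built from the question's values. Only rows of dt1 whose Alphabet has a match in dt2 are overwritten; a key that exists only in dt2 (STU here) is ignored rather than appended, and the i. prefix selects the Date column from the table passed as i (dt2). Note that a matched row is overwritten even if it already had a valid date.

```r
library(data.table)

# Hypothetical subset of the question's data; dt2 also has a key (STU) absent from dt1
dt1 <- data.table(Alphabet = c("ABC", "JKL", "PQR"),
                  Date     = c("2018-09-10", NA, "Unknown"),
                  Colour   = c("green", "yellow", "brown"))
dt2 <- data.table(Alphabet = c("JKL", "PQR", "STU"),
                  Date     = c("2017-06-07", "2019-10-07", "2019-11-08"))

# Update join: for each row of dt1 whose Alphabet matches a row of dt2,
# overwrite Date by reference with dt2's Date (i.Date); ABC has no match
# and keeps its value, STU has no row in dt1 and is not added
dt1[dt2, Date := i.Date, on = .(Alphabet)]

print(dt1)
```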

Update

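Using the new df2n dataset (which also contains STU and VWX, keys that are not in df1), join only the df2n rows whose Alphabet has an NA or "Unknown" Date in df1: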

i1 <- is.na(df1$Date)|df1$Date %in% "Unknown"
setDT(df1)[df2n[df2n$Alphabet %in% df1$Alphabet[i1],],
         Date := i.Date, on = .(Alphabet)]
df1
#   Alphabet       Date Colour
#1:      ABC 2018-09-10  green
#2:      DEF 2017-06-11    red
#3:      GHI 2016-05-12   blue
#4:      JKL 2017-06-07 yellow
#5:      MNO 2018-08-03 orange
#6:      PQR 2019-10-07  brown

Or, in base R, using match:

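# Position of each df2 key in df1 (NA if the key is not in df1, e.g. STU in df2n)
i1 <- match(df2$Alphabet, df1$Alphabet)
keep <- !is.na(i1)
df1$Date[i1[keep]] <- df2$Date[keep]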
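The match() approach copies df2's date into every matched row of df1, including rows that already have a valid date. A base R sketch that fills only the NA or "Unknown" entries, assuming a hypothetical df2 that also holds a conflicting date for ABC and a key (STU) absent from df1:

```r
df1 <- data.frame(Alphabet = c("ABC", "JKL", "MNO", "PQR"),
                  Date     = c("2018-09-10", NA, NA, "Unknown"),
                  Colour   = c("green", "yellow", "orange", "brown"),
                  stringsAsFactors = FALSE)
# Hypothetical lookup table: ABC has a different date, STU is not in df1
df2 <- data.frame(Alphabet = c("ABC", "JKL", "MNO", "PQR", "STU"),
                  Date     = c("2020-01-01", "2017-06-07", "2018-08-03",
                               "2019-10-07", "2019-11-08"),
                  stringsAsFactors = FALSE)

# Rows whose Date is missing or holds the "Unknown" placeholder
to_fill <- is.na(df1$Date) | df1$Date == "Unknown"

# Look up only those rows' Alphabet values in df2 (NA where df2 has no entry)
df1$Date[to_fill] <- df2$Date[match(df1$Alphabet[to_fill], df2$Alphabet)]

print(df1)
```

ABC keeps its original date because it is not selected by to_fill.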

Data

df1 <- structure(list(Alphabet = c("ABC", "DEF", "GHI", "JKL", "MNO", 
"PQR"), Date = c("2018-09-10", "2017-06-11", "2016-05-12", NA, 
NA, "Unknown"), Colour = c("green", "red", "blue", "yellow", 
"orange", "brown")), class = "data.frame", row.names = c(NA, 
-6L))

df2 <- structure(list(Alphabet = c("JKL", "MNO", "PQR"), Date = c("2017-06-07", 
"2018-08-03", "2019-10-07")), class = "data.frame", row.names = c(NA, 
-3L))

df2n <- structure(list(Alphabet = c("JKL", "MNO", "PQR", "STU", "VWX"
), Date = c("2017-06-07", "2018-08-03", "2019-10-07", "2019-11-08", 
"2019-12-08")), class = "data.frame", row.names = c(NA, -5L))

Answer 1 (score: 1)

Using dplyr, we can left_join df1 with df2 and then use coalesce to fill in the missing values.

library(dplyr)

left_join(df1, df2, by = "Alphabet") %>%
  mutate(Date = coalesce(Date.y, Date.x)) %>%
  select(-Date.x, -Date.y)

#  Alphabet Colour       Date
#1      ABC  green 2018-09-10
#2      DEF    red 2017-06-11
#3      GHI   blue 2016-05-12
#4      JKL yellow 2017-06-07
#5      MNO orange 2018-08-03
#6      PQR  brown 2019-10-07
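coalesce(Date.y, Date.x) prefers df2's date whenever df2 has one, so an existing date in df1 would be overwritten; and because "Unknown" is a string rather than NA, it is replaced here only because PQR happens to have an entry in df2. A sketch that reverses the priority and treats "Unknown" as missing via na_if, assuming a hypothetical df2 with a conflicting date for ABC and an extra key (STU):

```r
library(dplyr)

df1 <- data.frame(Alphabet = c("ABC", "JKL", "PQR"),
                  Date     = c("2018-09-10", NA, "Unknown"),
                  Colour   = c("green", "yellow", "brown"),
                  stringsAsFactors = FALSE)
# Hypothetical lookup table: ABC has a different date, STU is not in df1
df2 <- data.frame(Alphabet = c("ABC", "JKL", "PQR", "STU"),
                  Date     = c("2020-01-01", "2017-06-07", "2019-10-07", "2019-11-08"),
                  stringsAsFactors = FALSE)

out <- left_join(df1, df2, by = "Alphabet") %>%
  # Keep df1's date (Date.x) unless it is NA or "Unknown"; then use df2's (Date.y)
  mutate(Date = coalesce(na_if(Date.x, "Unknown"), Date.y)) %>%
  select(-Date.x, -Date.y)

print(out)
```

left_join keeps every row of df1 and drops STU, so the result has the same rows as df1.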