Indicate which rows coalesce has updated

Time: 2018-08-31 19:49:58

Tags: r dplyr

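In a new column, I want to indicate which missing records were updated by each coalesce.

Purpose: I have a dataset with missing classification codes. To fill in the missing values, I use several left_join/coalesce operations that replace each NA with the correct code. I want to track which values were changed in each iteration.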
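For context, one such left_join/coalesce step might look roughly like the sketch below. The tables `records` and `lookup` and the columns `id`, `code`, `code.new`, and `code.filled` are hypothetical names chosen for illustration; they are not from my real data.

```r
library(dplyr)

# Hypothetical tables: `records` has gaps in `code`,
# `lookup` supplies codes for some ids.
records <- tibble(id = c(1, 2, 3, 4, 5), code = c(1, 2, 3, NA, NA))
lookup  <- tibble(id = c(1, 3, 4),       code = c(1, 3, 4))

joined <- records %>%
  # Keep the original column name; the joined column becomes `code.new`.
  left_join(lookup, by = "id", suffix = c("", ".new")) %>%
  # Take the original code where present, otherwise the joined one.
  mutate(code.filled = coalesce(code, code.new))
```

Here `code` and `code.new` play the roles of `x` and `y` in the example data.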

# DATA
library(tibble)
df <- tibble(
  x = c(1, 2, 3, NA, NA),  # original data
  y = c(1, NA, 3, 4, NA)   # new data from the join
)

# A tibble: 5 x 2
      x     y
  <dbl> <dbl>
1     1     1
2     2    NA
3     3     3
4    NA     4
5    NA    NA

I would like to see...

# A tibble: 5 x 2
      x changed  
  <dbl> <chr>    
1     1 no.change
2     2 no.change
3     3 no.change
4     4 corrected
5    NA no.change

1 answer:

Answer 0 (score: 1)

You can use case_when:

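library(tidyverse)
df %>% 
  # Fill each NA in x with the corresponding value from y
  mutate(new = coalesce(x, y)) %>% 
  mutate(changed = case_when(
    # Unchanged if x already held the value, or if nothing was available to fill it
    x == new | is.na(new) ~ "no.change",
    # Otherwise x was NA and y supplied the value
    TRUE ~ "corrected")) %>% 
  select(new, changed) # %>% rename(x = new)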

Result

# A tibble: 5 x 2
#    new changed  
#  <dbl> <chr>    
#1     1 no.change
#2     2 no.change
#3     3 no.change
#4     4 corrected
#5    NA no.change
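
The question mentions several join/coalesce rounds, so the same idea can be extended to record which round filled each value. The following is only a sketch under assumptions: the columns `y1` and `y2` (standing in for the results of two successive joins), the intermediate columns `step1` and `step2`, and the labels `corrected.step1` / `corrected.step2` are all made up for illustration.

```r
library(dplyr)

# Hypothetical data: x is the original column; y1 and y2 stand in
# for codes brought in by two successive joins.
df2 <- tibble(
  x  = c(1,  2,  3, NA, NA, NA),
  y1 = c(1, NA,  3,  4, NA, NA),
  y2 = c(NA, 2, NA,  9,  5, NA)
)

result <- df2 %>%
  mutate(
    step1 = coalesce(x, y1),      # first round of filling
    step2 = coalesce(step1, y2),  # second round fills what is still NA
    changed = case_when(
      is.na(x) & !is.na(y1)     ~ "corrected.step1",
      is.na(step1) & !is.na(y2) ~ "corrected.step2",
      TRUE                      ~ "no.change"
    )
  ) %>%
  select(step2, changed) %>%
  rename(x = step2)
```

Testing `is.na()` explicitly, rather than comparing `x == new`, keeps every condition either TRUE or FALSE, so the result does not rely on case_when treating an NA condition as a non-match.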