Replace values in a data frame with NA values from another data frame

Time: 2017-11-13 10:18:19

Tags: r

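I want to replace values in one data frame with the NAs from another data frame that shares the same identifier. That is, for every row of df1 with a given id, I want to assign NA in the columns where df2 has "NA" for that id.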

I have df1 and df2:

df1 =data.frame(id = c(1,1,2,2,6,6),a = c(2,4,1,7,5,3), b = c(5,3,0,3,2,5),c = c(9,3,10,33,2,5))
df2 =data.frame(id = c(1,2,6),a = c("NA",0,"NA"), b= c("NA", 9, 9),c=c(0,"NA","NA"))

What I want is df3:

df3 = data.frame(id = c(1,1,2,2,6,6),a = c("NA","NA",1,7,"NA","NA"), b = c("NA","NA",0,3,2,5),c = c(9,3,"NA","NA","NA","NA"))

I have tried lookup functions and the data.table library, but I can't get the correct df3. Can someone help me?

2 answers:

Answer 0 (score: 2)

We can join on 'id' and then set the matching values to NA by multiplication.

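library(data.table)
nm1 <- names(df1)[-1]  # value columns: everything except 'id'
# Update join on 'id'; the 'i.' prefix refers to df2's columns.
# Requires real NA values in df2 (see the data below).
setDT(df1)[df2,  (nm1) := Map(function(x, y) x*(NA^is.na(y)), .SD, 
                  mget(paste0('i.', nm1))), on = .(id), .SDcols = nm1]
df1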
#   id  a  b  c
#1:  1 NA NA  9
#2:  1 NA NA  3
#3:  2  1  0 NA
#4:  2  7  3 NA
#5:  6 NA  2 NA
#6:  6 NA  5 NA
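
The multiplication works because of two facts of R arithmetic: `NA^0` is 1 and `NA^1` is NA. Since `is.na(y)` is coerced to 0 or 1, `NA^is.na(y)` acts as a mask. A minimal base R sketch of just that step, using illustrative vectors `x` and `y` that are not from the original post:

```r
x <- c(2, 4, 1, 7)     # values to keep or blank out (illustrative)
y <- c(NA, NA, 0, 9)   # NA marks the positions to blank out (illustrative)

mask <- NA^is.na(y)    # FALSE -> NA^0 = 1, TRUE -> NA^1 = NA
mask
# [1] NA NA  1  1

x * mask               # unchanged where mask is 1, NA where mask is NA
# [1] NA NA  1  7
```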

Data

df2 =data.frame(id = c(1,2,6),a = c(NA,0,NA), b= c(NA, 9, 9),c=c(0,NA,NA))

Note: in the OP's post, the NA values are the string "NA" rather than real NA.
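
If df2 arrives with the string "NA" as in the question, it can be converted to real NA first. The following is one possible base R sketch, not part of the original answer; `stringsAsFactors = FALSE` is added here so the behaviour is the same across R versions.

```r
# df2 as written in the question: missing values are the string "NA"
df2 <- data.frame(id = c(1, 2, 6), a = c("NA", 0, "NA"),
                  b = c("NA", 9, 9), c = c(0, "NA", "NA"),
                  stringsAsFactors = FALSE)

# Turn the string "NA" into a real NA, then convert each column to numeric
df2[] <- lapply(df2, function(col) {
  col <- as.character(col)
  col[col == "NA"] <- NA
  as.numeric(col)
})

str(df2)  # all four columns are now numeric, with real NA values
```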

Answer 1 (score: 1)

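Since your NA values are actually the text "NA", you have to convert all variables to text (using as.character). You can then join the two data frames on the id column. Because both data frames have columns a, b and c, the join names them a.x, b.x and c.x (from df1) and a.y, b.y and c.y (from df2). After that, you can create new columns a, b and c: a is "NA" when a.y == "NA" and otherwise takes the value of a.x (and likewise for b and c). If your NA values are real NAs, you need a different test, is.na(value) (see the code example below).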

library(dplyr)

df1 %>%  
  mutate_all(as.character) %>% ## all variables as text
  left_join(df2 %>% 
              mutate_all(as.character) ## all variables as text
            , by = "id") %>% ## join tables by 'id'; a.x comes from df1, a.y from df2, and so on
  mutate(a = case_when(a.y == "NA" ~ "NA", TRUE ~ a.x), ## if a.y == "NA" use "NA", else a.x
         b = case_when(b.y == "NA" ~ "NA", TRUE ~ b.x),
         c = case_when(c.y == "NA" ~ "NA", TRUE ~ c.x)) %>%
  select(id, a, b, c) ## keep only the original columns

  id  a  b  c
1  1 NA NA  9
2  1 NA NA  3
3  2  1  0 NA
4  2  7  3 NA
5  6 NA  2 NA
6  6 NA  5 NA

## If your data frame had real NA values, this is how you can test for them:
missing_value <- NA

is.na(missing_value) ## TRUE
missing_value == NA  ## returns NA rather than TRUE, so it cannot be used as a test
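
For real NA values, the is.na() test can also be applied to the whole table at once. The following is a base R sketch, not the dplyr pipeline above: it swaps in `match()` and a logical mask, assumes df2 holds real NA values, and uses the illustrative names `idx`, `cols` and `mask`.

```r
df1 <- data.frame(id = c(1, 1, 2, 2, 6, 6), a = c(2, 4, 1, 7, 5, 3),
                  b = c(5, 3, 0, 3, 2, 5), c = c(9, 3, 10, 33, 2, 5))
df2 <- data.frame(id = c(1, 2, 6), a = c(NA, 0, NA),
                  b = c(NA, 9, 9), c = c(0, NA, NA))

cols <- setdiff(names(df1), "id")  # value columns shared by both tables
idx  <- match(df1$id, df2$id)      # row of df2 that corresponds to each row of df1

mask <- is.na(df2[idx, cols])      # TRUE where df2 has NA for that id and column
mask[is.na(idx), ] <- FALSE        # ids missing from df2 leave df1 untouched

df3 <- df1
df3[cols][mask] <- NA              # blank out exactly the masked cells
df3
```

With the example data, df3 should contain the same values as the output shown in the answers, with numeric columns and real NA values instead of text.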