Question

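I want to replace values in one data frame with the NAs from another data frame that has the same identifiers. That is, for every row of df1 with a given id, assign "NA" wherever df2 has "NA" for that id and column.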

I have df1 and df2:

df1 =data.frame(id = c(1,1,2,2,6,6),a = c(2,4,1,7,5,3), b = c(5,3,0,3,2,5),c = c(9,3,10,33,2,5))
df2 =data.frame(id = c(1,2,6),a = c("NA",0,"NA"), b= c("NA", 9, 9),c=c(0,"NA","NA"))

What I want is df3:

df3 = data.frame(id = c(1,1,2,2,6,6),a = c("NA","NA",1,7,"NA","NA"), b = c("NA","NA",0,3,2,5),c = c(9,3,"NA","NA","NA","NA"))

I have tried a lookup function and the data.table library, but I couldn't get the correct df3. Can anyone help me?

Answer 1

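We can join on 'id' and then replace values with NA by multiplication: each df1 value is multiplied by NA^is.na(y), where y is the matching df2 value. That factor is NA where df2 is NA and 1 everywhere else.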

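library(data.table)
nm1 <- names(df1)[-1]  # value columns: a, b, c
# Join df1 to df2 on id; x is df1's column, y is df2's matching i.<column>.
# NA^is.na(y) is NA where df2 is NA and 1 otherwise, so the product blanks those cells.
setDT(df1)[df2,  (nm1) := Map(function(x, y) x*(NA^is.na(y)), .SD, 
                  mget(paste0('i.', nm1))), on = .(id), .SDcols = nm1]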
df1
#   id  a  b  c
#1:  1 NA NA  9
#2:  1 NA NA  3
#3:  2  1  0 NA
#4:  2  7  3 NA
#5:  6 NA  2 NA
#6:  6 NA  5 NA

Data

df2 =data.frame(id = c(1,2,6),a = c(NA,0,NA), b= c(NA, 9, 9),c=c(0,NA,NA))

Note: in the OP's post, the NA values were written as the string "NA", so df2 is redefined here with real NA values.
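A quick illustration of why the multiplication above works: in R any number raised to the power 0 is 1, even NA, so NA^is.na(y) is NA where y is missing and 1 elsewhere. Multiplying a df1 value by it either keeps the value or turns it into NA. The vectors x and y below are made-up stand-ins for one df1 column and the matching df2 column, not part of the original answer.

```r
# Hypothetical values: x plays the role of a df1 column, y the matching df2 column
x <- c(2, 4, 1)
y <- c(NA, 0, 9)

mult <- NA^is.na(y)   # NA^TRUE is NA, NA^FALSE (that is, NA^0) is 1
mult                  # NA 1 1
x * mult              # NA 4 1

# The trick needs real NA values: the string "NA" is not treated as missing
is.na("NA")           # FALSE
```

This is also why the answer rebuilds df2 with real NA: with the OP's string "NA" values, is.na(y) would be FALSE everywhere and nothing would be replaced.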

Answer 2

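Since your NA values are actually the text "NA", you have to convert all variables to text (with as.character). You can then join the two data frames by the id column. Because both data frames have columns a, b and c, left_join renames them to a.x, b.x and c.x (from df1) and a.y, b.y and c.y (from df2). After that you can create new columns a, b and c: each is "NA" when the corresponding .y column is "NA", and takes the .x value otherwise. If your NA values were real NA, you would need an is.na(value) test instead, because comparing with == does not work for NA (see the code example below).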

library(dplyr)

df1 %>%  
  mutate_all(as.character) %>% # all variables as text
  left_join(df2 %>% 
              mutate_all(as.character) ## all variables as text
            , by = "id") %>% ## join tables by 'id'; a.x from df1 and a.y from df2 and so on
  mutate(a = case_when(a.y == "NA" ~ "NA", TRUE ~ a.x), ## if a.y == "NA" take "NA", else a.x
         b = case_when(b.y == "NA" ~ "NA", TRUE  ~ b.x),
         c = case_when(c.y == "NA" ~ "NA", TRUE ~ c.x)) %>%
  select(id, a, b, c) ## keep only these initial columns

  id  a  b  c
1  1 NA NA  9
2  1 NA NA  3
3  2  1  0 NA
4  2  7  3 NA
5  6 NA  2 NA
6  6 NA  5 NA

## If your data frame had real NA values, this is how you can test for them:
missing_value <- NA

is.na(missing_value) ## TRUE
missing_value == NA  ## NA, not TRUE: any comparison with NA returns NA
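For comparison, a minimal base R sketch of the same replacement that avoids converting everything to text. It assumes df2 has one row per id: match() finds, for each row of df1, the matching row of df2, and any cell that is missing in df2 (either a real NA or the string "NA") is set to NA in a copy of df1. The names cols, idx, flag and to_na are illustrative only. Unlike the OP's df3, the result keeps numeric columns and uses real NA values; rows of df1 whose id does not appear in df2 are left unchanged, which is an assumption about the desired behaviour.

```r
# Question's data: df2 marks missing cells with the string "NA"
df1 <- data.frame(id = c(1, 1, 2, 2, 6, 6), a = c(2, 4, 1, 7, 5, 3),
                  b = c(5, 3, 0, 3, 2, 5), c = c(9, 3, 10, 33, 2, 5))
df2 <- data.frame(id = c(1, 2, 6), a = c("NA", 0, "NA"),
                  b = c("NA", 9, 9), c = c(0, "NA", "NA"))

cols <- setdiff(names(df1), "id")  # value columns shared by both frames
idx  <- match(df1$id, df2$id)      # row of df2 for each row of df1 (NA if no match)

df3 <- df1
for (col in cols) {
  flag <- df2[[col]][idx]
  # missing in df2 = real NA or the text "NA"; unmatched ids are never blanked
  to_na <- !is.na(idx) & (is.na(flag) | flag == "NA")
  df3[[col]][to_na] <- NA
}
df3
```

The same loop works unchanged if df2 holds real NA values, since is.na(flag) already covers that case.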
