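I want to replace values in one data frame with the NA values from another data frame that shares the same identifier. That is, for every row of df1 with a given id, wherever df2 has "NA" for that id and column, the corresponding value in df1 should become "NA".
I have df1 and df2: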
df1 =data.frame(id = c(1,1,2,2,6,6),a = c(2,4,1,7,5,3), b = c(5,3,0,3,2,5),c = c(9,3,10,33,2,5))
df2 =data.frame(id = c(1,2,6),a = c("NA",0,"NA"), b= c("NA", 9, 9),c=c(0,"NA","NA"))
What I want is df3:
df3 = data.frame(id = c(1,1,2,2,6,6),a = c("NA","NA",1,7,"NA","NA"), b = c("NA","NA",0,3,2,5),c = c(9,3,"NA","NA","NA","NA"))
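I have tried lookup functions and the "data.table" library, but I cannot get the correct df3. Can anyone help me?
Answer 0 (score: 2)
We can join on 'id' and then turn the matching values into NA by multiplication.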
library(data.table)
nm1 <- names(df1)[-1]
setDT(df1)[df2, (nm1) := Map(function(x, y) x * (NA^is.na(y)), .SD,
                             mget(paste0('i.', nm1))),
           on = .(id), .SDcols = nm1]
df1
#    id  a  b  c
#1:  1 NA NA  9
#2:  1 NA NA  3
#3:  2  1  0 NA
#4:  2  7  3 NA
#5:  6 NA  2 NA
#6:  6 NA  5 NA
The code above uses df2 with real NA values:
df2 = data.frame(id = c(1,2,6), a = c(NA,0,NA), b = c(NA,9,9), c = c(0,NA,NA))
Note: the OP's post has the string "NA" where a real NA is needed.
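As a small illustration of why the multiplication in the answer above works, the sketch below shows the NA^ trick on plain vectors. The vectors x and y are made up for this example (they mirror column a of df1 and df2 for ids 1, 2 and 6); they are not part of the original answer.

x <- c(2, 4, 1)          # values to keep or blank out (illustrative)
y <- c(NA, 0, NA)        # NA marks the positions to blank out (illustrative)
# is.na(y) is TRUE/FALSE, which R treats as 1/0 in arithmetic.
# NA^1 is NA, while NA^0 is 1, so the result is a vector of NA and 1.
multiplier <- NA^is.na(y)
# Multiplying by NA gives NA; multiplying by 1 leaves the value unchanged.
res <- x * multiplier
res
# [1] NA  4 NA

This only works for numeric columns, since it relies on multiplication.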
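Answer 1 (score: 1)
Since your NA values are actually the text "NA", you have to convert all variables to text (using as.character). You can then join the two data frames on the id column. Because both data frames have columns a, b and c, the join renames them to a.x, b.x and c.x (from df1) and a.y, b.y and c.y (from df2). After that, you can create new columns a, b and c: a is "NA" when a.y == "NA" and otherwise takes the value of a.x (and likewise for b and c). If your NA values are real NA, you need a different test, is.na(value) (see the code example below).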
library(dplyr)
df1 %>%
  mutate_all(as.character) %>%            ## all variables as text
  left_join(df2 %>%
              mutate_all(as.character),   ## all variables as text
            by = "id") %>%                ## join tables by 'id'; a.x from df1 and a.y from df2 and so on
  mutate(a = case_when(a.y == "NA" ~ "NA", TRUE ~ a.x),  ## if a.y == "NA" take "NA", else a.x
         b = case_when(b.y == "NA" ~ "NA", TRUE ~ b.x),
         c = case_when(c.y == "NA" ~ "NA", TRUE ~ c.x)) %>%
  select(id, a, b, c)                     ## keep only the initial columns
  id  a  b  c
1  1 NA NA  9
2  1 NA NA  3
3  2  1  0 NA
4  2  7  3 NA
5  6 NA  2 NA
6  6 NA  5 NA
## if your data frame had real NA values, this is how you can test for them:
missing_value <- NA
is.na(missing_value)   ## TRUE
missing_value == NA    ## returns NA, not TRUE, so it cannot be used as a test
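For the real-NA case mentioned above, here is a minimal sketch that uses is.na() with base R only. It is a different technique from the dplyr join in this answer: it looks up each df1 row in df2 with match() and builds a logical mask. The names cols, idx and mask are chosen for this illustration, and df2 is assumed to hold real NA values rather than the string "NA".

df1 <- data.frame(id = c(1,1,2,2,6,6), a = c(2,4,1,7,5,3),
                  b = c(5,3,0,3,2,5), c = c(9,3,10,33,2,5))
# Assumption: df2 holds real NA values, not the text "NA"
df2 <- data.frame(id = c(1,2,6), a = c(NA,0,NA),
                  b = c(NA,9,9), c = c(0,NA,NA))

cols <- setdiff(names(df1), "id")   # value columns shared by both data frames
idx  <- match(df1$id, df2$id)       # for each df1 row, the matching row of df2
mask <- is.na(df2[idx, cols])       # logical matrix: TRUE where df2 has NA
mask[is.na(idx), ] <- FALSE         # ids missing from df2 leave df1 untouched

df3 <- df1
df3[cols][mask] <- NA               # blank out the flagged cells
df3

Because the columns stay numeric, df3 can be used in later calculations without converting back from text.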