Replacing matched data in R

Date: 2019-09-07 14:59:56

Tags: r merge match

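I have two datasets with similar dimensions and similar column names. The goal is to check whether one of the datasets contains NA values and, if so, replace them with the corresponding values from the other dataset, as shown in the example below.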

I tried solving this with nested for loops, but it did not work.

df is a new data frame initialized with NA values:
loop = for (a in 1:nrow(data1)) {
  for (b in 1:ncol(data1)) {
    for (c in 1:nrow(data2)) {
      for (d in 1:ncol(data2)) {
        for (x in 1:nrow(df)) {
          for (y in 1:ncol(df)) {
            df[x, y] <- ifelse(data1[a, b] != "NA", data1[a, b], data2[c, d])
            return(df)
          }
        }
      }
    }
  }
}
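For comparison, here is a minimal sketch of a loop that does work, assuming data1 and data2 have the same dimensions and the same row order (plain data frames are used so that data1[i, j] is a single value). It addresses two problems in the attempt above: data1[a, b] != "NA" compares against the string "NA" instead of testing for a missing value (for a real NA the comparison itself is NA, so ifelse() returns NA), and return() at the top level is an error because there is no enclosing function. Only two loops, over rows and columns, are needed, since cell (i, j) of one frame corresponds to cell (i, j) of the other.

```r
# Example data from the question, as plain data frames
data1 <- data.frame(age = c(23, 22, 21, 20),
                    gender = c("M", "F", NA, "F"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(age = c(23, 22, 21, 20),
                    gender = c("M", "F", "M", "F"),
                    stringsAsFactors = FALSE)

# Start from a copy of data1 instead of an all-NA frame
df <- data1

# One loop over rows and one over columns: cell (i, j) of data1
# is paired with cell (i, j) of data2
for (i in seq_len(nrow(data1))) {
  for (j in seq_len(ncol(data1))) {
    # is.na() tests for a missing value; != "NA" would compare with the text "NA"
    if (is.na(data1[i, j])) {
      df[i, j] <- data2[i, j]
    }
  }
}

df
```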

Example data

# The first data frame 
structure(list(age = c(23, 22, 21, 20), gender = c("M", "F", 
NA, "F")), row.names = c(NA, -4L), class = c("tbl_df", "tbl", 
"data.frame"))
#     age gender
# 1    23 M     
# 2    22 F     
# 3    21 NA    
# 4    20 F     
# The second data frame 
structure(list(age = c(23, 22, 21, 20), gender = c("M", "F", 
"M", "F")), row.names = c(NA, -4L), class = c("tbl_df", "tbl", 
"data.frame"))
#     age gender
# 1    23 M     
# 2    22 F     
# 3    21 M     
# 4    20 F     

Desired output

age   gender
23    M
22    F
21    M
20    F

2 Answers:

Answer 0 (score: 0)

You can try the following:

library(tibble)

df1 <- tibble(age = c(23,22,21,20), 
             gender = c("M", "F", NA, "F"))

# -------------------------------------------------------------------------
#> df1
# # A tibble: 4 x 2
#     age gender
#     <dbl> <chr> 
# 1    23 M     
# 2    22 F     
# 3    21 NA    
# 4    20 F     

# -------------------------------------------------------------------------

df2 <- tibble(age = c(23,22,21,20), 
             gender = c("M", "F", "M", "F"))

# -------------------------------------------------------------------------
#> df2
# # A tibble: 4 x 2
#     age gender
#     <dbl> <chr> 
# 1    23 M     
# 2    22 F     
# 3    21 M     
# 4    20 F     
# -------------------------------------------------------------------------
# find which rows of df1 have NA in the gender column
df1.na <- is.na(df1$gender)
#> df1.na
# [1] FALSE FALSE  TRUE FALSE
# -------------------------------------------------------------------------


# use the values in df2 to replace the NAs in df1
# (note: this matches rows by position, so both tibbles must have the same row order)
df1$gender[df1.na] <- df2$gender[df1.na]
df1

# -------------------------------------------------------------------------
#> df1
# # A tibble: 4 x 2
#     age gender
#     <dbl> <chr> 
# 1    23 M     
# 2    22 F     
# 3    21 M     
# 4    20 F     
# -------------------------------------------------------------------------
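The same idea extends to every column at once. The sketch below loops over the columns and fills each NA from the same position in the other data frame. fill_na_positional is a hypothetical helper name; it assumes both frames have identical column names and the same row order, and the NA added to age here is not part of the original data, only a way to show that several columns are handled.

```r
# Hypothetical helper: fill every NA in `target` with the value at the same
# row and column of `source`. Assumes same column names and same row order.
fill_na_positional <- function(target, source) {
  stopifnot(identical(names(target), names(source)),
            nrow(target) == nrow(source))
  for (col in names(target)) {
    missing <- is.na(target[[col]])            # positions to fill in this column
    target[[col]][missing] <- source[[col]][missing]
  }
  target
}

df1 <- data.frame(age = c(23, 22, 21, NA),     # extra NA added for illustration
                  gender = c("M", "F", NA, "F"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(age = c(23, 22, 21, 20),
                  gender = c("M", "F", "M", "F"),
                  stringsAsFactors = FALSE)

filled <- fill_na_positional(df1, df2)
filled
```

Because each column is updated in place with [[<- and [<-, numeric and character columns keep their types; factor columns would additionally need compatible levels in both frames.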

Answer 1 (score: 0)

This can be done with the natural_join function from the rqdatatable package. The function needs an index column to join on, so we first need to create one.

Creating a reproducible example will help others help you. Here I created two simple data frames that should cover most cases of this problem.

# Create example data
tbl1 <- 
  data.frame(
    w = c(1, 2, 3, 4),
    x = c(1, 2, 3, NA),
    y = c(1, 2, 3, 4),
    z = c(1, NA, NA, NA)
  )

tbl2 <-
  data.frame(
    w = c(9, 9, 9, 9), # check a value doesn't overwrite an existing value
    x = c(1, 2, 3, 4), # check NA gets filled in
    y = c(1, 2, 3, NA), # check NA doesn't overwrite a value
    z = c(9, NA, NA, NA) # check NA in both stays NA
  )

# Create join index 
tbl1$indx <- seq_len(nrow(tbl1))
tbl2$indx <- seq_len(nrow(tbl2))

# Use natural_join 
library("rqdatatable")
natural_join(tbl1, tbl2, by = "indx")
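If you would rather avoid the extra package, or the rows of the two tables are not guaranteed to line up, the rule described by the comments above (keep tbl1's value, take tbl2's only where tbl1 is NA) can be sketched in base R with match(), pairing rows through the key column instead of by position. fill_na_by_key is a hypothetical helper; it assumes the key is unique in the second table. tbl2's rows are shuffled here only to show that pairing by key does not depend on row order.

```r
# Hypothetical helper: fill NAs in `target` from `source`, pairing rows by
# the `key` column. Assumes `key` uniquely identifies rows in `source`.
fill_na_by_key <- function(target, source, key) {
  idx <- match(target[[key]], source[[key]])   # matching source row, or NA
  shared <- setdiff(intersect(names(target), names(source)), key)
  for (col in shared) {
    missing <- is.na(target[[col]]) & !is.na(idx)
    target[[col]][missing] <- source[[col]][idx[missing]]
  }
  target
}

tbl1 <- data.frame(w = c(1, 2, 3, 4), x = c(1, 2, 3, NA),
                   y = c(1, 2, 3, 4), z = c(1, NA, NA, NA))
tbl2 <- data.frame(w = c(9, 9, 9, 9), x = c(1, 2, 3, 4),
                   y = c(1, 2, 3, NA), z = c(9, NA, NA, NA))
tbl1$indx <- seq_len(nrow(tbl1))
tbl2$indx <- seq_len(nrow(tbl2))

# Reorder tbl2's rows: matching on indx still pairs the right rows
tbl2_shuffled <- tbl2[c(4, 2, 3, 1), ]

result <- fill_na_by_key(tbl1, tbl2_shuffled, "indx")
result
```

In the result, w and y keep tbl1's values, the missing x in row 4 is filled with 4, and z stays NA in rows 2 to 4 because both tables are missing there.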