R: Replace a column's values with values from the matching rows of other columns

Posted: 2016-03-09 19:28:57

Tags: r replace

For the following data frame:

df <- data.frame(name = c("July Doe", "John Doe", NA, "Jane Doe"), 
                 age = c(NA, NA, NA, 43), 
                 name1 = c(NA, NA, NA, "John Doe"), 
                 age1 = c(NA, NA, NA, 37), 
                 name2 = c(NA, NA, "July Doe", NA),
                 age2 = c(NA, NA, 7, NA))

which gives:

          name age    name1 age1    name2 age2
    1 July Doe  NA     <NA>   NA     <NA>   NA
    2 John Doe  NA     <NA>   NA     <NA>   NA
    3     <NA>  NA     <NA>   NA July Doe    7
    4 Jane Doe  43 John Doe   37     <NA>   NA

When name matches name1 or name2, I need to change age to the corresponding age1 or age2.

So far I have come up with this (without any luck):

df$age <- with(df, ifelse(is.na(df$age), ifelse(df$name %in% df$name1,
                          as.integer(df$age1), as.integer(df$age)), as.integer(df$age)))

If any advanced R user could explain this, I would be eternally grateful. I want to keep the NAs and end up with something like:

          name age    name1 age1    name2 age2
    1 July Doe   7     <NA>   NA     <NA>   NA
    2 John Doe  37     <NA>   NA     <NA>   NA
    3     <NA>  NA     <NA>   NA July Doe    7
    4 Jane Doe  43 John Doe   37     <NA>   NA

Then I can handle removing the rows that contain only NA and the columns I don't need.
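For that clean-up step, a minimal base-R sketch, assuming age has already been filled as in the table above and that only name and age are wanted in the end. The columns are dropped first, because in the full data frame row 3 still has values in name2/age2 and only becomes all-NA once those columns are gone.

```r
# Example data in the already-filled state shown above
df <- data.frame(name  = c("July Doe", "John Doe", NA, "Jane Doe"),
                 age   = c(7, 37, NA, 43),
                 name1 = c(NA, NA, NA, "John Doe"),
                 age1  = c(NA, NA, NA, 37),
                 name2 = c(NA, NA, "July Doe", NA),
                 age2  = c(NA, NA, 7, NA),
                 stringsAsFactors = FALSE)

# Keep only the columns of interest
out <- df[, c("name", "age")]
# Drop rows in which every remaining value is NA
out <- out[rowSums(!is.na(out)) > 0, , drop = FALSE]
out
```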

3 answers:

Answer 0 (score: 3)

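# Fill only the missing ages: look up each such name in name1 and name2 stacked
# together, then take the age at the same position from age1 and age2 stacked together
within(df, age[is.na(age)] <- c(age1, age2)[match(name[is.na(age)],
                                                  c(as.character(name1), as.character(name2)))])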
##       name age    name1 age1    name2 age2
## 1 July Doe   7     <NA>   NA     <NA>   NA
## 2 John Doe  37     <NA>   NA     <NA>   NA
## 3     <NA>  NA     <NA>   NA July Doe    7
## 4 Jane Doe  43 John Doe   37     <NA>   NA

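Your code doesn't work because of the inner ifelse(): it does test whether name has a match in name1, but the value it selects comes from name's own index (the same row), not from the index of the matching value in name1.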
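To make the indexing point concrete, here is a small sketch with hypothetical toy vectors (not the question's data). ifelse() pairs element i of the test with element i of its yes/no vectors, whereas match() returns the position of the match in name1, which can then be used to index age1.

```r
# Toy vectors (hypothetical): "Bob" appears in name1, but in a different row
name  <- c("Bob", "Ann")
age   <- c(NA, 30)
name1 <- c("Ann", "Bob")
age1  <- c(31, 45)

# ifelse() takes age1 from the SAME row as name, i.e. age1[1] = 31 for Bob: wrong
wrong <- ifelse(is.na(age) & name %in% name1, age1, age)

# match() gives the row of name1 where each name occurs (Bob -> 2), so age1[2] = 45
pos   <- match(name, name1)
right <- ifelse(is.na(age), age1[pos], age)
```

In the toy data Bob's age lives in row 2 of name1/age1, so only the match()-based version picks up 45.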

Answer 1 (score: 1)

Try this:

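# Stack the three (name, age) column pairs into one long two-column data frame
res<-do.call(rbind,lapply(1:3,function(x) setNames(df[(2*x-1):(2*x)],c("name","age"))))
# Within each name, replace every age with that name's non-missing age
res$age<-ave(res$age,res$name,FUN=function(x) x[!is.na(x)])
# Split the 12 rows back into three 4-row blocks and put them side by side
do.call(cbind,split(res,(seq_len(nrow(res))-1) %/% (nrow(res)/3)))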
#    0.name 0.age   1.name 1.age   2.name 2.age
#1 July Doe     7     <NA>    NA     <NA>    NA
#2 John Doe    37     <NA>    NA     <NA>    NA
#3     <NA>    NA     <NA>    NA July Doe     7
#4 Jane Doe    43 John Doe    37     <NA>    NA

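In short: first you build a data.frame with just two columns (name and age), which makes it easy to fill in the missing NAs. Finally, you convert it back to the original format.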
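The same long-format idea can also be used without converting back at the end: stack the three name/age column pairs into one lookup table and fill age from it directly. A minimal base-R sketch, assuming each name has at most one non-missing age across the pairs (match() takes the first hit otherwise); pairs, long and lookup are illustrative names, not from the answer.

```r
df <- data.frame(name  = c("July Doe", "John Doe", NA, "Jane Doe"),
                 age   = c(NA, NA, NA, 43),
                 name1 = c(NA, NA, NA, "John Doe"),
                 age1  = c(NA, NA, NA, 37),
                 name2 = c(NA, NA, "July Doe", NA),
                 age2  = c(NA, NA, 7, NA),
                 stringsAsFactors = FALSE)

# Stack the (name, age), (name1, age1), (name2, age2) pairs into one long table
pairs <- list(c("name", "age"), c("name1", "age1"), c("name2", "age2"))
long  <- do.call(rbind, lapply(pairs, function(p)
  data.frame(name = df[[p[1]]], age = df[[p[2]]], stringsAsFactors = FALSE)))

# Keep only rows where both name and age are known: this is the lookup table
lookup <- long[!is.na(long$name) & !is.na(long$age), ]

# Fill the missing ages by matching names against the lookup table
miss <- is.na(df$age)
df$age[miss] <- lookup$age[match(df$name[miss], lookup$name)]
df$age
```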

Answer 2 (score: 0)

If you want to stick with ifelse...

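# Age looked up via each name's position in name1 / name2 (NA when there is no match)
from1 <- df$age1[match(df$name, df$name1)]
from2 <- df$age2[match(df$name, df$name2)]
# Keep an existing age; otherwise use age1, then fall back to age2
df$age <- ifelse(!is.na(df$age), df$age,
                 ifelse(!is.na(from1), from1, from2))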
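The ifelse()/match() pattern extends to any number of name/age column pairs with a simple loop. A minimal sketch: fill_age is a hypothetical helper, it only fills ages that are still missing, and incomparables = NA is passed to match() so that a missing name is never matched to a missing name1/name2.

```r
df <- data.frame(name  = c("July Doe", "John Doe", NA, "Jane Doe"),
                 age   = c(NA, NA, NA, 43),
                 name1 = c(NA, NA, NA, "John Doe"),
                 age1  = c(NA, NA, NA, 37),
                 name2 = c(NA, NA, "July Doe", NA),
                 age2  = c(NA, NA, 7, NA),
                 stringsAsFactors = FALSE)

# Hypothetical helper: fill NA ages from (name column, age column) pairs, in order
fill_age <- function(df, name_cols, age_cols) {
  for (i in seq_along(name_cols)) {
    # Row of name_cols[i] holding each name; NA names are treated as non-matching
    pos <- match(df$name, df[[name_cols[i]]], incomparables = NA)
    # Only overwrite ages that are still missing
    df$age <- ifelse(is.na(df$age), df[[age_cols[i]]][pos], df$age)
  }
  df
}

res <- fill_age(df, c("name1", "name2"), c("age1", "age2"))
res$age
```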