Replace NAs in a column based on the value of another column

Posted: 2019-09-19 12:20:22

Tags: r

I have a data frame like the following:

name <- c("Fred","George","","Fred","George")
wife1 <- c("fd","gf",NA,NA,NA)
wife2 <- c("hj","fd",NA,NA,NA)
wife3 <- c("bn","jk",NA,NA,NA)
label <- c("Fred","George","","Fred","George")
fam <- data.frame(name, wife1, wife2, wife3, label, stringsAsFactors = FALSE)

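You can see that the last 2 rows have the same name and label as the first 2 rows, but their wife1, wife2 and wife3 values are NA. The middle row has blank values and NAs, and it should stay as it is. I want to fill the last two rows with the same values as the first two rows. Note that the solution should also work on datasets with a different number of rows.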

I tried fam %>% group_by(name) %>% mutate_all(~ .[!is.na(.)]), but got:

`mutate_all()` ignored the following grouping variables:
Column `name`
Use `mutate_at(df, vars(-group_cols()), myoperation)` to silence the message.
Error: Column `wife1` must be length 1 (the group size), not 0

2 answers:

Answer 0 (score: 2)

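You can match the name column against itself to get, for each row, the index of the first row with that name, and then copy the values from that row into the columns you want to modify.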

cols <- 2:4 # or if your column names contain a pattern: grep(pattern, names(fam))
fam[cols] <- fam[match(fam$name, fam$name), cols]

fam
#     name wife1 wife2 wife3  label
# 1   Fred    fd    hj    bn   Fred
# 2 George    gf    fd    jk George
# 3         <NA>  <NA>  <NA>       
# 4   Fred    fd    hj    bn   Fred
# 5 George    gf    fd    jk George
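The same indexing idea can be wrapped in a small reusable function so it applies to a data frame with any number of rows. This is a minimal sketch: the function name fill_from_first and its cols and key arguments are hypothetical, and, like the answer above, it assumes the first row for each name is the one holding the values to copy.

```r
# Hypothetical helper: for every row, copy the values of `cols` from the
# first row that has the same value in the `key` column.
fill_from_first <- function(df, cols, key = "name") {
  # match() returns the position of the first match for each element,
  # e.g. match(c("a", "b", "a"), c("a", "b", "a")) gives 1 2 1
  first_idx <- match(df[[key]], df[[key]])
  df[cols] <- df[first_idx, cols, drop = FALSE]
  df
}

fam <- data.frame(
  name  = c("Fred", "George", "", "Fred", "George"),
  wife1 = c("fd", "gf", NA, NA, NA),
  wife2 = c("hj", "fd", NA, NA, NA),
  wife3 = c("bn", "jk", NA, NA, NA),
  label = c("Fred", "George", "", "Fred", "George"),
  stringsAsFactors = FALSE
)

# Select the columns by a name pattern instead of hard-coded positions
filled <- fill_from_first(fam, grep("^wife", names(fam)))
print(filled)
```

The blank-name row matches only itself, so its NAs are left as they are.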

Answer 1 (score: 0)

Here is one way to make your approach work. As the message suggests, use mutate_at, and return groups that have no non-NA values unchanged.

library(dplyr)

fam %>%
  group_by(name) %>% 
  mutate_at(vars(starts_with("wife")), ~ if (any(!is.na(.))) .[!is.na(.)] else .)


#  name   wife1 wife2 wife3 label 
#  <chr>  <chr> <chr> <chr> <chr> 
#1 Fred   fd    hj    bn    Fred  
#2 George gf    fd    jk    George
#3 ""     NA    NA    NA    ""    
#4 Fred   fd    hj    bn    Fred  
#5 George gf    fd    jk    George

Data

name<-c("Fred","George","","Fred","George")
wife1<-c("fd","gf",NA,NA,NA)
wife2<-c("hj","fd",NA,NA,NA)
wife3<-c("bn","jk",NA,NA,NA)
label<-c("Fred","George","","Fred","George")
fam<-data.frame(name,wife1,wife2,wife3,label, stringsAsFactors = FALSE)
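If you would rather avoid the dplyr dependency, a similar group-wise fill can be written in base R with ave(), which applies a function to each group and returns the results in the original row order. This is a sketch under a slightly different rule from the mutate_at answer: within each name it copies the first non-NA value, so it does not rely on each group having exactly one non-NA value, and it leaves all-NA groups unchanged. The helper name first_non_na is hypothetical.

```r
# Hypothetical helper: within one group, replace every value with the
# group's first non-NA value; an all-NA group is returned unchanged.
first_non_na <- function(x) {
  non_na <- x[!is.na(x)]
  if (length(non_na) > 0) rep(non_na[1], length(x)) else x
}

fam <- data.frame(
  name  = c("Fred", "George", "", "Fred", "George"),
  wife1 = c("fd", "gf", NA, NA, NA),
  wife2 = c("hj", "fd", NA, NA, NA),
  wife3 = c("bn", "jk", NA, NA, NA),
  label = c("Fred", "George", "", "Fred", "George"),
  stringsAsFactors = FALSE
)

wife_cols <- grep("^wife", names(fam), value = TRUE)
# An anonymous function is used so that FUN is passed to ave(),
# not captured by lapply()'s own FUN argument
fam[wife_cols] <- lapply(fam[wife_cols], function(col) {
  ave(col, fam$name, FUN = first_non_na)
})
print(fam)
```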