I have a data frame like the following:
name<-c("Fred","George","","Fred","George")
wife1<-c("fd","gf",NA,NA,NA)
wife2<-c("hj","fd",NA,NA,NA)
wife3<-c("bn","jk",NA,NA,NA)
label<-c("Fred","George","","Fred","George")
fam<-data.frame(name,wife1,wife2,wife3,label)
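As you can see, the first 2 rows are the same as the last 2, except that the values of wife1, wife2 and wife3 in the last 2 rows are NA. The middle row holds a blank name and NAs, and it should stay that way. I would like to fill the last two rows with the same values as the first two. Note that the solution should also work on datasets with a different number of rows.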
I tried fam %>% group_by(name) %>% mutate_all(~ .[!is.na(.)]), but got:
`mutate_all()` ignored the following grouping variables:
Column `name`
Use `mutate_at(df, vars(-group_cols()), myoperation)` to silence the message.
Error: Column `wife1` must be length 1 (the group size), not 0
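Answer 0: (score: 2)
You can match the name column against itself to get the index of the first occurrence of each name, then use the values from that row for the columns you want to modify.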
cols <- 2:4 # or if your column names contain a pattern: grep(pattern, names(fam))
fam[cols] <- fam[match(fam$name, fam$name), cols]
fam
#     name wife1 wife2 wife3  label
# 1   Fred    fd    hj    bn   Fred
# 2 George    gf    fd    jk George
# 3         <NA>  <NA>  <NA>
# 4   Fred    fd    hj    bn   Fred
# 5 George    gf    fd    jk George
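To see why this works, it can help to look at the index vector that match() produces. The following is a minimal sketch, assuming the question's data with the columns named wife1, wife2 and wife3; the names first_idx and filled are introduced here only for illustration.

```r
# Sample data from the question (column names assumed to be wife1..wife3)
fam <- data.frame(
  name  = c("Fred", "George", "", "Fred", "George"),
  wife1 = c("fd", "gf", NA, NA, NA),
  wife2 = c("hj", "fd", NA, NA, NA),
  wife3 = c("bn", "jk", NA, NA, NA),
  label = c("Fred", "George", "", "Fred", "George"),
  stringsAsFactors = FALSE
)

# match() returns, for each name, the position of its first occurrence
first_idx <- match(fam$name, fam$name)
first_idx
# [1] 1 2 3 1 2

# Indexing rows by first_idx copies the first occurrence's values to every repeat
filled <- fam
filled[2:4] <- fam[first_idx, 2:4]
```

This relies on the first occurrence of each name being the row that carries the complete values. If the filled-in row came later in the data, the first match would copy NA instead.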
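Answer 1: (score: 0)
Here is one way to make your approach work. As the error message suggests, use mutate_at, and leave groups that have no non-NA values untouched by returning them as is.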
library(dplyr)
fam %>%
group_by(name) %>%
mutate_at(vars(starts_with("wife")), ~ if (any(!is.na(.))) .[!is.na(.)] else .)
#  name   wife1 wife2 wife3 label
#  <chr>  <chr> <chr> <chr> <chr>
#1 Fred   fd    hj    bn    Fred
#2 George gf    fd    jk    George
#3 ""     NA    NA    NA    ""
#4 Fred   fd    hj    bn    Fred
#5 George gf    fd    jk    George
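In dplyr 1.0.0 and later, mutate_at() is superseded by across(). The following sketch expresses the same idea in that style. It is not part of the original answer, and it makes one extra assumption: only the first non-NA value per group is used, so a group with several non-NA rows does not fail on a length mismatch. The sample data is repeated so the block runs on its own, and the name filled is illustrative.

```r
library(dplyr)

# Same sample data as in the question
fam <- data.frame(
  name  = c("Fred", "George", "", "Fred", "George"),
  wife1 = c("fd", "gf", NA, NA, NA),
  wife2 = c("hj", "fd", NA, NA, NA),
  wife3 = c("bn", "jk", NA, NA, NA),
  label = c("Fred", "George", "", "Fred", "George"),
  stringsAsFactors = FALSE
)

filled <- fam %>%
  group_by(name) %>%
  mutate(across(
    starts_with("wife"),
    # Use the first non-NA value in the group; leave an all-NA group as is
    ~ if (any(!is.na(.x))) .x[!is.na(.x)][1] else .x
  )) %>%
  ungroup()
```

The single value returned for each group is recycled to the group's size, which is what fills the repeated rows.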
Data
name<-c("Fred","George","","Fred","George")
wife1<-c("fd","gf",NA,NA,NA)
wife2<-c("hj","fd",NA,NA,NA)
wife3<-c("bn","jk",NA,NA,NA)
label<-c("Fred","George","","Fred","George")
fam<-data.frame(name,wife1,wife2,wife3,label, stringsAsFactors = FALSE)