I am trying to update a data frame based on a specific condition. Here is a sample data frame.
  fname  mname lname
1 RONALD D     VALE
2 RONALD       VALE
3 JACK   A     SMITH
4 JACK   B     SMITH
5 JACK         SMITH
If the first and last names match, I want to fill in the middle name column. For this example, I would like the following output.
  fname  mname lname
1 RONALD D     VALE
2 RONALD D     VALE
3 JACK   A     SMITH
4 JACK   B     SMITH
5 JACK         SMITH
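I also do not want to update the table when a name has two different middle initials. Some values in the data are missing, so the main goal is to identify and merge entries that likely refer to the same person, without introducing wrong data into the table.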
Answer 0 (score: 1)
A tidyverse solution:
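library(dplyr)
# Assumes blank middle names are stored as NA (e.g. via na_if(mname, "")).
df %>%
  group_by(fname, lname) %>%
  # Number of distinct known middle names within each name pair
  mutate(mname_count = n_distinct(mname, na.rm = TRUE)) %>%
  mutate(mname = ifelse(mname_count == 1, unique(na.omit(mname)), mname)) %>%
  ungroup() %>%
  select(-mname_count)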
Answer 1 (score: 0)
An ugly base R solution (assuming you have changed "" to NA):
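unic <- unique(lolz[, c("fname", "lname")])
for (i in seq_len(nrow(unic))) {
  # Rows that share this first/last name pair
  rows <- lolz[, "fname"] == unic[i, 1] & lolz[, "lname"] == unic[i, 2]
  lelz <- lolz[rows, "mname"]
  known <- unique(lelz[!is.na(lelz)])
  # Fill gaps only when the group has exactly one distinct middle initial
  if (length(known) == 1) {
    lelz[is.na(lelz)] <- known
    lolz[rows, "mname"] <- lelz
  }
}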
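The same idea can probably be written without an explicit loop using base R's ave(). This is only a minimal sketch: the helper name fill_unambiguous is made up for illustration, and it assumes the columns are character vectors (not factors) in which a missing middle name is either "" or NA.

```r
# Sample data from the question; blanks are empty strings here.
df <- data.frame(
  fname = c("RONALD", "RONALD", "JACK", "JACK", "JACK"),
  mname = c("D", "", "A", "B", ""),
  lname = c("VALE", "VALE", "SMITH", "SMITH", "SMITH"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: if a group has exactly one distinct known
# middle name, use it for every row; otherwise leave the group as is.
fill_unambiguous <- function(x) {
  known <- unique(x[!is.na(x) & nzchar(x)])
  if (length(known) == 1) rep(known, length(x)) else x
}

# ave() applies the helper within each (fname, lname) group
# and returns the values in the original row order.
df$mname <- ave(df$mname, df$fname, df$lname, FUN = fill_unambiguous)
df
```

With the sample data, the RONALD VALE rows both end up with "D", while the JACK SMITH rows are left alone because "A" and "B" conflict.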
Answer 2 (score: 0)
We can use data.table:
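library(data.table)
# Fill blanks only when a (fname, lname) group has exactly one distinct
# non-blank mname; [1L] keeps the replacement length 1 so it recycles safely.
setDT(df1)[, mname := if (uniqueN(mname[nzchar(mname)]) == 1)
    mname[nzchar(mname)][1L] else mname, .(fname, lname)]
df1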
#     fname mname lname
#1: RONALD     D  VALE
#2: RONALD     D  VALE
#3:   JACK     A SMITH
#4:   JACK     B SMITH
#5:   JACK       SMITH
Data:
df1 <- structure(list(fname = c("RONALD", "RONALD", "JACK", "JACK",
    "JACK"), mname = c("D", "", "A", "B", ""), lname = c("VALE",
    "VALE", "SMITH", "SMITH", "SMITH")), .Names = c("fname", "mname",
    "lname"), class = "data.frame", row.names = c("1", "2", "3",
    "4", "5"))