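Update an R dataframe column based on a condition

Time: 2017-05-07 21:06:19

Tags: r dataframe conditional-statements

I am trying to update a data frame based on a specific condition. Here is a sample data frame.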

   fname mname lname
1 RONALD     D  VALE
2 RONALD        VALE
3   JACK     A SMITH
4   JACK     B SMITH
5   JACK       SMITH

If the first and last names match, I want to update the middle name column. In this example, I expect the following output.

   fname mname lname
1 RONALD     D  VALE
2 RONALD     D  VALE
3   JACK     A SMITH
4   JACK     B SMITH
5   JACK       SMITH

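If there are two different middle initials, I do not want to update the table either. The data has some missing values, so the main goal is to identify and merge multiple entries that probably refer to the same person. At the same time, we do not want to introduce incorrect data into the table.

3 answers:

Answer 0: (score: 1)

A tidyverse solution: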

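library(dplyr)

df %>%
  mutate(mname = na_if(mname, "")) %>%  # treat empty strings as missing
  group_by(fname, lname) %>%
  mutate(mname_count = n_distinct(mname, na.rm = TRUE)) %>%
  # fill only when the group has exactly one distinct non-missing middle name
  mutate(mname = ifelse(mname_count == 1, unique(na.omit(mname)), mname)) %>%
  ungroup() %>%
  select(-mname_count)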

Answer 1: (score: 0)

An ugly base R solution (assuming you have changed "" to NA):

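unic <- unique(lolz[, c("fname", "lname")])

for (i in seq_len(nrow(unic))) {
  # rows that share this first/last name pair
  idx <- lolz$fname == unic[i, 1] & lolz$lname == unic[i, 2]
  lelz <- lolz$mname[idx]
  # fill only when exactly one distinct middle name is known for the pair
  known <- unique(lelz[!is.na(lelz)])
  if (length(known) == 1) {
    lelz[is.na(lelz)] <- known
    lolz[idx, "mname"] <- lelz
  }
}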

Answer 2: (score: 0)

We can use data.table:

library(data.table)
# take the first non-empty value so the result is length 1 even if it repeats
setDT(df1)[, mname := if(uniqueN(mname[nzchar(mname)])==1) 
                           mname[nzchar(mname)][1L] else mname, .(fname,  lname)]
df1
#    fname mname lname
#1: RONALD     D  VALE
#2: RONALD     D  VALE
#3:   JACK     A SMITH
#4:   JACK     B SMITH
#5:   JACK       SMITH

Data

df1 <- structure(list(fname = c("RONALD", "RONALD", "JACK", "JACK", 
 "JACK"), mname = c("D", "", "A", "B", ""), lname = c("VALE", 
 "VALE", "SMITH", "SMITH", "SMITH")), .Names = c("fname", "mname", 
 "lname"), class = "data.frame", row.names = c("1", "2", "3", 
 "4", "5"))
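
For comparison, the rule from the question (fill a blank middle name only when a first/last name pair has exactly one distinct known value) can also be sketched in base R with `ave()` and no extra packages. The helper name `fill_unambiguous` is hypothetical and is not taken from any of the answers; the sketch assumes `mname` is a character column in which blanks are stored as `""` or `NA`.

```r
# Hypothetical helper (not from the answers above): fill blank middle names
# only when the group has exactly one distinct known value.
fill_unambiguous <- function(x) {
  known <- unique(x[!is.na(x) & nzchar(x)])
  if (length(known) == 1) rep(known, length(x)) else x
}

# Same sample data as above, rebuilt so the sketch is self-contained
df1 <- data.frame(
  fname = c("RONALD", "RONALD", "JACK", "JACK", "JACK"),
  mname = c("D", "", "A", "B", ""),
  lname = c("VALE", "VALE", "SMITH", "SMITH", "SMITH"),
  stringsAsFactors = FALSE
)

# ave() applies the helper to mname within each fname/lname group
df1$mname <- ave(df1$mname, df1$fname, df1$lname, FUN = fill_unambiguous)
df1
```

On the sample data this fills row 2 with "D" and leaves the JACK SMITH rows untouched, because "A" and "B" conflict.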