Match columns in two data frames given a condition on a column in R

Time: 2018-11-03 16:18:19

Tags: r dataframe merge

I have two data frames. For example:

df1 <- data.frame(actor = c("Angel","David","Adah","Sophia"),
                  gender=c("Unknown","male","Unknown","female"),
                  others= c("some","other","info","a"),
                  stringsAsFactors = FALSE)

   actor    gender   others
1  Angel    Unknown  some
2  David    male     other
3  Adah     Unknown  info
4  Sophia   female   a

df2 <- data.frame(names = c("Miguel","Angel","David","Sophia"),
                  gender=c("male","male","male","female"),
                  stringsAsFactors = FALSE)

   names    gender
1  Miguel   male
2  Angel    male
3  David    male
4  Sophia   female

I want to fill in the "Unknown" genders in df1 using df2. I tried this:

df1$gender[df1$gender == "Unknown"] <- df2$gender[ df2$names %in% df1$actor[df1$gender == "Unknown"]]

However, the result is incorrect, even though the number of male or female values is right.

So the result I want is:

   actor    gender           others
1  Angel    male             some
2  David    male             other
3  Adah     Unknown (or NA)  info
4  Sophia   female           a

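3 Answers:

Answer 0 (score: 0)

Consider a left join of the two data frames with merge, combined with ifelse to update gender, and then re-order the rows. Specifically, add a key to the first data frame as a helper column so the original row order can be restored after the merge.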

# MERGE AFTER ADD key COLUMN TO df1 AND RENAME COLUMNS IN df2
mdf <- merge(transform(df1, key=seq(nrow(df1))), setNames(df2, c('actor','gender')),
             by='actor', all.x=TRUE, suffixes=c('','_'))
mdf$gender <- ifelse(is.na(mdf$gender_), mdf$gender, mdf$gender_)

# RE-ORDER ROWS BY key, THEN REMOVE HELPER COLUMNS
mdf <- with(mdf, transform(mdf[order(key),], key=NULL, gender_=NULL))
row.names(mdf) <- NULL

mdf
#    actor  gender others
# 1  Angel    male   some
# 2  David    male  other
# 3   Adah Unknown   info
# 4 Sophia  female      a
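For comparison, the same lookup can be sketched in base R with match(), which returns for each actor its row position in df2$names, or NA when the name is absent. This is an alternative sketch and not part of the answer above; the helper names idx and fill are illustrative only.

```r
df1 <- data.frame(actor = c("Angel", "David", "Adah", "Sophia"),
                  gender = c("Unknown", "male", "Unknown", "female"),
                  others = c("some", "other", "info", "a"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(names = c("Miguel", "Angel", "David", "Sophia"),
                  gender = c("male", "male", "male", "female"),
                  stringsAsFactors = FALSE)

# Row position of each actor in df2$names; NA when the actor is not listed
idx <- match(df1$actor, df2$names)

# Only overwrite rows that are "Unknown" AND have a match in df2
fill <- df1$gender == "Unknown" & !is.na(idx)
df1$gender[fill] <- df2$gender[idx[fill]]

df1$gender
# "male" "male" "Unknown" "female"
```

Because match() aligns df2 to the row order of df1, no re-ordering step is needed. The attempt in the question goes wrong because %in% selects a single df2 row here ("Angel"), and that one value is then recycled over both "Unknown" rows, so Adah also becomes "male".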

Answer 1 (score: 0)

Completing missing data is a good use case for dplyr::coalesce. It isn't strictly necessary in this case, but it can come in handy if you have multiple tables with incomplete information!
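The code that accompanied this answer is not preserved here, so the following is only a sketch of how dplyr::coalesce could be applied to this problem, not necessarily the answerer's original code. It assumes dplyr is installed; the suffix "_df2" and the result name res are illustrative choices.

```r
library(dplyr)

df1 <- data.frame(actor = c("Angel", "David", "Adah", "Sophia"),
                  gender = c("Unknown", "male", "Unknown", "female"),
                  others = c("some", "other", "info", "a"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(names = c("Miguel", "Angel", "David", "Sophia"),
                  gender = c("male", "male", "male", "female"),
                  stringsAsFactors = FALSE)

res <- df1 %>%
  # Turn the "Unknown" placeholder into a real NA so coalesce() can fill it
  mutate(gender = na_if(gender, "Unknown")) %>%
  # Bring in df2's gender as gender_df2, keeping every row of df1
  left_join(df2, by = c("actor" = "names"), suffix = c("", "_df2")) %>%
  # Take the first non-missing value: df1's gender, else df2's
  mutate(gender = coalesce(gender, gender_df2)) %>%
  select(-gender_df2)

res
```

coalesce() accepts any number of vectors, so with several incomplete lookup tables each one can be joined in turn and passed as a further argument. Actors missing from every table, such as Adah here, stay NA.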

Answer 2 (score: 0)

We can use safe_left_join from my package safejoin, and resolve the column conflict with coalesce.

# devtools::install_github("moodymudskipper/safejoin")
library(safejoin)
library(dplyr)

df1$gender[df1$gender == "Unknown"] <- NA
safe_left_join(df1, df2, by = c(actor = "names"), conflict = coalesce)
#    actor gender others
# 1  Angel   male   some
# 2  David   male  other
# 3   Adah   <NA>   info
# 4 Sophia female      a