I have two data frames. For example:
df1 <- data.frame(actor = c("Angel","David","Adah","Sophia"),
gender=c("Unknown","male","Unknown","female"),
others= c("some","other","info","a"),
stringsAsFactors = FALSE)
actor gender others
1 Angel Unknown some
2 David male other
3 Adah Unknown info
4 Sophia female a
df2 <- data.frame(names = c("Miguel","Angel","David","Sophia"),
gender=c("male","male","male","female"),
stringsAsFactors = FALSE)
names gender
1 Miguel male
2 Angel male
3 David male
4 Sophia female
I want to fill in the "Unknown" genders in df1 using df2. I tried this:
df1$gender[df1$gender == "Unknown"] <- df2$gender[ df2$names %in% df1$actor[df1$gender == "Unknown"]]
However, the result is incorrect, even when the number of males or females is right.
So the result I want is:
actor gender others
1 Angel male some
2 David male other
3 Adah Unknown (or NA) info
4 Sophia female a
Answer 0: (score: 0)
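Consider a left join of the two data frames with merge, together with ifelse to update gender, and then re-order the rows. Specifically, add a key to the first data frame as a helper column so the rows can be sorted back into their original order after the merge.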
# MERGE AFTER ADD key COLUMN TO df1 AND RENAME COLUMNS IN df2
mdf <- merge(transform(df1, key=seq(nrow(df1))), setNames(df2, c('actor','gender')),
by='actor', all.x=TRUE, suffixes=c('','_'))
mdf$gender <- ifelse(is.na(mdf$gender_), mdf$gender, mdf$gender_)
# RE-ORDER ROWS BY, THEN REMOVE HELPER COLUMNS
mdf <- with(mdf, transform(mdf[order(key),], key=NULL, gender_=NULL))
row.names(mdf) <- NULL
mdf
# actor gender others
# 1 Angel male some
# 2 David male other
# 3 Adah Unknown info
# 4 Sophia female a
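The same lookup can also be sketched in base R without merge, using match to find each actor's row in df2. This is only an illustrative alternative, not part of the approach above; the helper names idx and fill are made up for the example. It also shows why the attempt in the question goes wrong: %in% returns df2's rows in df2's own order and silently drops names that are absent, so the right-hand side is not aligned with the rows being replaced.

```r
df1 <- data.frame(actor = c("Angel", "David", "Adah", "Sophia"),
                  gender = c("Unknown", "male", "Unknown", "female"),
                  others = c("some", "other", "info", "a"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(names = c("Miguel", "Angel", "David", "Sophia"),
                  gender = c("male", "male", "male", "female"),
                  stringsAsFactors = FALSE)

# Position of each df1 actor in df2 (NA when the actor is not in df2)
idx <- match(df1$actor, df2$names)

# Only overwrite rows that are "Unknown" AND have a match in df2
fill <- df1$gender == "Unknown" & !is.na(idx)
df1$gender[fill] <- df2$gender[idx[fill]]

df1
```

Because match keeps df1's row order, no helper key or re-sorting is needed, and actors missing from df2 (Adah here) keep their "Unknown" value.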
Answer 1: (score: 0)
Filling in missing data is a good use case for dplyr::coalesce. It isn't strictly necessary in this case, but it can come in handy if you have multiple tables with incomplete information!
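The code for this answer is not present in the text, so the following is only a minimal sketch of how coalesce could be applied here, assuming "Unknown" is first converted to NA and df2 is joined on with dplyr's left_join. The intermediate columns gender.x and gender.y come from left_join's default suffixes; the variable name result is made up for the example.

```r
library(dplyr)

df1 <- data.frame(actor = c("Angel", "David", "Adah", "Sophia"),
                  gender = c("Unknown", "male", "Unknown", "female"),
                  others = c("some", "other", "info", "a"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(names = c("Miguel", "Angel", "David", "Sophia"),
                  gender = c("male", "male", "male", "female"),
                  stringsAsFactors = FALSE)

result <- df1 %>%
  # Treat "Unknown" as missing so coalesce can fill it
  mutate(gender = na_if(gender, "Unknown")) %>%
  # Both tables have a gender column, so they become gender.x / gender.y
  left_join(df2, by = c("actor" = "names")) %>%
  # Take df1's value when present, otherwise fall back to df2's
  mutate(gender = coalesce(gender.x, gender.y)) %>%
  select(actor, gender, others)

result
```

With more than two sources, coalesce accepts additional vectors and takes the first non-missing value in argument order, which is where it becomes more convenient than a chain of ifelse calls.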
Answer 2: (score: 0)
We can use safe_left_join from my package safejoin, and resolve the column conflict using coalesce.
# devtools::install_github("moodymudskipper/safejoin")
library(safejoin)
library(dplyr)
df1$gender[df1$gender == "Unknown"] <- NA
safe_left_join(df1, df2, by = c(actor = "names"), conflict = coalesce)
# actor gender others
# 1 Angel male some
# 2 David male other
# 3 Adah <NA> info
# 4 Sophia female a