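I am looking for an elegant solution to the following problem:
I need to assign owners to firms based on different matching criteria. These criteria differ in quality, so a weaker criterion should only be used when the higher-quality criteria yield no result. In my example, all a criteria share the same quality level, which is higher than that of the b criterion.
The following illustrates my point: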
firmname <- c("Firm A", "Firm B", "Firm C", "Firm D", "Firm E", "Firm F")
ownermatch_a1 <- c("Owner 1", NA, NA, NA, "Owner 5", "Owner 6")
ownermatch_a2 <- c("Owner 1", NA, NA, "Owner 4", "Owner 5", "Owner 6")
ownermatch_a3 <- c("Owner 1", NA, "Owner 3", "Owner 4", "Owner 5", "Owner 6")
ownermatch_b1 <- c("Owner 1", "Owner 2", "Owner 3", "Owner 4", "Owner 5", "Owner 6")
ownerfinal <- NA
dat <- data.frame(firmname, ownermatch_a1, ownermatch_a2, ownermatch_a3, ownermatch_b1, ownerfinal)
dat
This produces the following table:
  firmname ownermatch_a1 ownermatch_a2 ownermatch_a3 ownermatch_b1 ownerfinal
1   Firm A       Owner 1       Owner 1       Owner 1       Owner 1       <NA>
2   Firm B          <NA>          <NA>          <NA>       Owner 2       <NA>
3   Firm C          <NA>          <NA>       Owner 3       Owner 3       <NA>
4   Firm D          <NA>       Owner 4       Owner 4       Owner 4       <NA>
5   Firm E       Owner 5       Owner 5       Owner 5       Owner 5       <NA>
6   Firm F       Owner 6       Owner 6       Owner 6       Owner 6       <NA>
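I would now like R to do the following:
1) If any of the three a criteria is non-NA, set it as ownerfinal.
2) If several a criteria are non-NA at the same time, pick any one of them at random and set it as ownerfinal.
3) Only if all of them are NA, take ownermatch_b1 and set it as ownerfinal.
So in the example above:
Firm A: choose any of a1, a2, a3
Firm B: choose b1
Firm C: choose a3
Firm D: choose a2 or a3
Thanks!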
Answer 0 (score: 2)
No loop is needed here. ?max.col is your friend for finding the valid cases across columns and picking one of them at random:
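# !is.na() marks the valid (non-NA) cells; max.col() then picks one TRUE column per row, at random on ties
tmp <- dat[2:4][cbind(seq_len(nrow(dat)), max.col(!is.na(dat[2:4])))]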
dat$ownerfinal <- replace(tmp, is.na(tmp), as.character(dat$ownermatch_b1)[is.na(tmp)])
dat
#  firmname ownermatch_a1 ownermatch_a2 ownermatch_a3 ownermatch_b1 ownerfinal
#1   Firm A       Owner 1       Owner 1       Owner 1       Owner 1    Owner 1
#2   Firm B          <NA>          <NA>          <NA>       Owner 2    Owner 2
#3   Firm C          <NA>          <NA>       Owner 3       Owner 3    Owner 3
#4   Firm D          <NA>       Owner 4       Owner 4       Owner 4    Owner 4
#5   Firm E       Owner 5       Owner 5       Owner 5       Owner 5    Owner 5
#6   Firm F       Owner 6       Owner 6       Owner 6       Owner 6    Owner 6
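To make the tie-breaking behind max.col visible, here is a small sketch on a toy logical matrix. The matrix valid and the pick_* names are made up for illustration and are not part of the answer. Each row's maximum is TRUE wherever a candidate exists, so ties.method decides which of the tied columns is returned; "random" is the default.

```r
# Toy logical matrix (hypothetical data): TRUE marks a non-NA candidate.
valid <- rbind(
  c(TRUE,  TRUE,  TRUE),   # three candidates: any column may be chosen
  c(FALSE, FALSE, TRUE),   # single candidate: only column 3 qualifies
  c(FALSE, TRUE,  TRUE)    # two candidates: column 2 or column 3
)

set.seed(1)  # only so that the random draw is repeatable

# Column index of the row maximum under each tie-breaking rule.
pick_random <- max.col(valid, ties.method = "random")  # default behaviour
pick_first  <- max.col(valid, ties.method = "first")   # leftmost TRUE
pick_last   <- max.col(valid, ties.method = "last")    # rightmost TRUE

print(rbind(pick_random, pick_first, pick_last))
```

If a deterministic result is preferred over the random choice, ties.method = "first" gives the leftmost valid column instead.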
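If you do not need the random choice, you can also use pmax. Strictly speaking it returns the row-wise maximum rather than the first valid value, but the two coincide here because every non-NA candidate in a row is the same owner: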
do.call(pmax, c(lapply(dat[2:5],as.character), na.rm=TRUE) )
#[1] "Owner 1" "Owner 2" "Owner 3" "Owner 4" "Owner 5" "Owner 6"
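When the candidate columns can disagree, pmax and "first valid value" are no longer the same thing. The sketch below contrasts the two on made-up vectors (a1, a2, b1 here are hypothetical and differ from the question's data) and uses Reduce as one possible way to take the first non-NA value in column order.

```r
# Hypothetical candidate columns, ordered from best to worst quality.
a1 <- c("Owner 1", NA, NA)
a2 <- c("Owner 9", "Owner 7", NA)
b1 <- c("Owner 5", "Owner 2", "Owner 3")

# pmax() returns the row-wise maximum in string sort order.
by_pmax <- pmax(a1, a2, b1, na.rm = TRUE)

# Reduce() carries the running result forward and fills its NAs from the
# next column, so the first non-NA value in column order wins.
first_valid <- Reduce(function(x, y) ifelse(is.na(x), y, x), list(a1, a2, b1))

print(data.frame(by_pmax, first_valid))
```

The two results differ in the first row, where a1 and a2 both match but name different owners.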
Answer 1 (score: 0)
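# df is the data frame from the question
# Return the first non-NA owner in columns 2:5 (a1, a2, a3, then b1); no random choice
doLookup <- function(x){
  for(i in 2:5){
    if(!is.na(x[[i]]))
      return(as.character(x[[i]]))
  }
  return(NA_character_)
}
# loop through each record and make the assignment
for(j in seq_len(nrow(df)))
  df[j,6] <- doLookup(df[j,])
df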
  firmname ownermatch_a1 ownermatch_a2 ownermatch_a3 ownermatch_b1 ownerfinal
1   Firm A       Owner 1       Owner 1       Owner 1       Owner 1    Owner 1
2   Firm B          <NA>          <NA>          <NA>       Owner 2    Owner 2
3   Firm C          <NA>          <NA>       Owner 3       Owner 3    Owner 3
4   Firm D          <NA>       Owner 4       Owner 4       Owner 4    Owner 4
5   Firm E       Owner 5       Owner 5       Owner 5       Owner 5    Owner 5
6   Firm F       Owner 6       Owner 6       Owner 6       Owner 6    Owner 6
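The loop above always takes the first non-NA column. For the random choice asked for in rule 2 of the question, a minimal sketch might look like the following. The names firms, pick_owner and a_cols are hypothetical and introduced only for illustration, and the data is rebuilt from the question so the block runs on its own.

```r
# Data from the question, rebuilt so this block is self-contained.
firms <- data.frame(
  firmname      = c("Firm A", "Firm B", "Firm C", "Firm D", "Firm E", "Firm F"),
  ownermatch_a1 = c("Owner 1", NA, NA, NA, "Owner 5", "Owner 6"),
  ownermatch_a2 = c("Owner 1", NA, NA, "Owner 4", "Owner 5", "Owner 6"),
  ownermatch_a3 = c("Owner 1", NA, "Owner 3", "Owner 4", "Owner 5", "Owner 6"),
  ownermatch_b1 = c("Owner 1", "Owner 2", "Owner 3", "Owner 4", "Owner 5", "Owner 6"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a_vals holds one row's a candidates, b_val the fallback.
pick_owner <- function(a_vals, b_val) {
  candidates <- a_vals[!is.na(a_vals)]
  if (length(candidates) == 0) return(b_val)  # rule 3: all a are NA, use b1
  # Rules 1 and 2: draw one index at random among the non-NA a matches.
  candidates[sample.int(length(candidates), 1)]
}

a_cols <- c("ownermatch_a1", "ownermatch_a2", "ownermatch_a3")
firms$ownerfinal <- vapply(
  seq_len(nrow(firms)),
  function(j) pick_owner(unlist(firms[j, a_cols], use.names = FALSE),
                         firms$ownermatch_b1[j]),
  character(1)
)
print(firms)
```

Drawing an index with sample.int keeps the helper safe when only one candidate is left. For large tables the vectorised max.col approach is likely to be faster than this row-by-row version.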