This problem has been driving me crazy all morning.
So, I have two tables, "vessel" and "target":
v_registry<-c("","GBR000B11824","GBR000B10110","GBR000C17779","","GBR000C16255")
v_pln<-c("WH4","","BRD5","B291","LI8","UL78")
v_rss<-c("C19926","B11824","","C17779","A16190","C16255")
v_asset<- c(104892,104902,104905,104916,104919,104920)
vessel<-data.frame(v_registry,v_pln,v_rss,v_asset,stringsAsFactors=FALSE)
t_registry<-c("GBR000C19926","GBR000B11824","","","GBR000A16190","")
t_pln<-c("","","BRD5","B291","LI8","")
t_rss<-c("C19926","","","","","C16255")
target<-data.frame(t_registry,t_pln,t_rss,stringsAsFactors=FALSE)
target<-target[sample(nrow(target)),]
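The vessel table holds identification information about vessels. The target table is very wide, with a lot of other data that the example does not need. What I want to achieve is to add a "t_asset" column to the target table, filled from v_asset, the only ID field that is complete. The problem is that neither table is complete, so I need to match on three different fields.
Below are a couple of attempts at doing this. The sample() line is only there to shuffle the rows, because for some odd reason the code works when the rows are in order. The second attempt returns only logical values, and I have not managed to get the elements instead of the logical values.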
#Attempt 1
target$t_asset<-
vessel$v_asset[match(target$t_registry,vessel$v_registry,incomparables = "")|
match(target$t_pln,vessel$v_pln,incomparables = "")|
match(target$t_rss,vessel$v_rss,incomparables = "")]
#Attempt 2
target$t_asset<-
(vessel$v_asset[match(target$t_registry,vessel$v_registry,incomparables = "")]|
vessel$v_asset[match(target$t_pln,vessel$v_pln,incomparables = "")]|
vessel$v_asset[match(target$t_rss,vessel$v_rss,incomparables = "")])
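A likely explanation for both symptoms, sketched on made-up data (the names pos_a, pos_b and ids are illustrative only, not from the tables above): `|` is a logical operator, so the positions returned by match() are coerced to TRUE/NA before they are used. A logical index then keeps elements in their existing order, which would explain why the unshuffled data appears to work.

```r
# Positions such as match() would return (NA = no match); values are made up
pos_a <- c(3L, NA, NA)
pos_b <- c(NA, 1L, NA)

# `|` coerces its operands to logical, so the positions 3 and 1 are lost
combined <- pos_a | pos_b
print(combined)

# A logical index keeps the elements at the TRUE positions;
# it does not look up elements 3 and 1
ids <- c(101, 102, 103)
print(ids[combined])
```

Here the intended lookup would have been ids[c(3, 1, NA)], but the logical index returns the first two elements in place instead.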
The expected output is (rows may appear in a different order because of the shuffle):
> target
    t_registry t_pln  t_rss t_asset
1 GBR000C19926       C19926  104892
2 GBR000B11824               104902
3               BRD5         104905
4               B291         104916
5 GBR000A16190   LI8         104919
6                    C16255  104920
Any ideas on how to solve this?
Cheers
Answer 0 (score: 1)
# Find which rows of vessel match each row of target, one key column at a time
# (only the three key columns of vessel are passed, so the column counts line up)
x <- mapply( match , MoreArgs=list(incomparables="") , target , vessel[1:3] )
# Remove the NAs and, in case more than one piece of information is
# available (multiple matches), reduce to a single number
idx <- apply(x,1, function(x) unique( x[!is.na(x) ] ))
# Use the matches to get the id field from vessel
target$t_asset <- vessel$v_asset[idx]
target
#    t_registry t_pln  t_rss t_asset
#3               BRD5         104905
#2 GBR000B11824               104902
#4               B291         104916
#1 GBR000C19926       C19926  104892
#6                    C16255  104920
#5 GBR000A16190   LI8         104919
Answer 1 (score: 1)
Using merge:
target$t_asset <- merge(target, vessel, by.x=1:3, by.y=1:3, all.y = T, sort = F)$v_asset
> target
    t_registry t_pln  t_rss t_asset
6                    C16255  104892
1 GBR000C19926       C19926  104902
3               BRD5         104905
2 GBR000B11824               104916
5 GBR000A16190   LI8         104919
4               B291         104920
Answer 2 (score: 0)
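Both earlier answers solve the example given. However, for some reason both fail when applied to the real data set.
So, in the end I came up with some code that gives the correct answer on the real data set, and I have tested it. The code is not pretty, though, and I am sure it could be made more efficient.
# Create three new columns, each holding an independent match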
target$t_asset_registry<-vessel$v_asset[match(target$t_registry,vessel$v_registry,incomparables = "")]
target$t_asset_pln<-vessel$v_asset[match(target$t_pln,vessel$v_pln,incomparables = "")]
target$t_asset_rss<-vessel$v_asset[match(target$t_rss,vessel$v_rss,incomparables = "")]
# Nested ifelse() calls summarize the results: take the first non-NA match,
# in the order registry, pln, rss
target$asset<-ifelse(is.na(target$t_asset_registry),
                     ifelse(is.na(target$t_asset_pln),
                            ifelse(is.na(target$t_asset_rss),NA,target$t_asset_rss),
                            target$t_asset_pln),
                     target$t_asset_registry)
The output is:
> target
    t_registry t_pln  t_rss t_asset_registry t_asset_pln t_asset_rss  asset
4               B291                      NA      104916          NA 104916
3               BRD5                      NA      104905          NA 104905
6                    C16255               NA          NA      104920 104920
5 GBR000A16190   LI8                      NA      104919          NA 104919
1 GBR000C19926       C19926               NA          NA      104892 104892
2 GBR000B11824                        104902          NA          NA 104902
The output shows clearly what I wanted to achieve. If anyone has a smarter way to get the same result, please post it.
Thanks for all the help
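One possible way to get the same result more compactly, as a sketch only: match each key column separately with Map(), then keep the first non-NA position with Reduce(). It is checked here against the sample data from the question, not against the real data set. The names pos and first_hit are made up for this example, and the priority order (registry, then pln, then rss) is assumed from the nested ifelse() above.

```r
# Same sample data as in the question (without the shuffle)
vessel <- data.frame(
  v_registry = c("", "GBR000B11824", "GBR000B10110", "GBR000C17779", "", "GBR000C16255"),
  v_pln      = c("WH4", "", "BRD5", "B291", "LI8", "UL78"),
  v_rss      = c("C19926", "B11824", "", "C17779", "A16190", "C16255"),
  v_asset    = c(104892, 104902, 104905, 104916, 104919, 104920),
  stringsAsFactors = FALSE
)
target <- data.frame(
  t_registry = c("GBR000C19926", "GBR000B11824", "", "", "GBR000A16190", ""),
  t_pln      = c("", "", "BRD5", "B291", "LI8", ""),
  t_rss      = c("C19926", "", "", "", "", "C16255"),
  stringsAsFactors = FALSE
)

# Match each key column separately; "" is never treated as a match
pos <- Map(function(t_col, v_col) match(t_col, v_col, incomparables = ""),
           target[c("t_registry", "t_pln", "t_rss")],
           vessel[c("v_registry", "v_pln", "v_rss")])

# Keep the first non-NA position per row (registry first, then pln, then rss)
first_hit <- Reduce(function(a, b) ifelse(is.na(a), b, a), pos)

# Look up the asset ID; rows with no match in any column get NA
target$t_asset <- vessel$v_asset[first_hit]
print(target)
```

Selecting the key columns by name keeps the extra columns of a wide target table out of the matching, and adding a fourth key would only mean extending the two column lists.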