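I want to search a column of one data frame for strings taken from another data frame, and then merge the two together. For example:
I have this data frame: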
location
1 2 high street, ca
2 24 long street, ba,UK
3 1 first avenue, ab
4 15 nant peris , ac
5 1 high street
6 second avenue, ca, UK
and I want to match it against this dataset:
id block
1 1 ab
2 2 ac
3 3 ab
4 5 cb
5 4 ba
6 2 ca
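So I want to search "location" for any of the values in the "block" column, then merge the block and id columns onto the first dataset, so that the merged dataset looks like this: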
location id block
1 2 high street, ca 2 ca
2 24 long street, ba,UK 4 ba
3 1 first avenue, ab 1 ab
4 15 nant peris , ac 2 ac
5 1 high street NA NA
6 second avenue, ca,UK 2 ca
Reproducible code:
df1<-data.frame(id = factor(c(1,2,3,5,4,2)), block = c('ab','ac','ab','ca','ba','ca'))
df2<-data.frame(location = c('2 high street, ca','24 long street, ba, UK','1 first avenue, ab', '15 nant peris , ac','1 high street','second avenue, ca, UK'))
Answer 0: (score: 2)
Here is one way to do this using the sqldf package:
library(sqldf)
# In the question's reproducible code, df2 holds the locations and df1 the id/block lookup
sql <- "SELECT t1.location, t2.id, t2.block
FROM df2 t1
LEFT JOIN df1 t2
ON t1.location LIKE '%, ' || t2.block OR
t1.location LIKE '%, ' || t2.block || ',%';"
results <- sqldf(sql)
I believe the sqldf package runs on SQLite; here is a link to a SQLite demo run against your data:
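One thing to watch with the join above: in the question's reproducible code, some block values appear more than once in the lookup table, and a LEFT JOIN returns one output row per matching lookup row. The sketch below, in base R, shows one way to spot those duplicates before joining. The tie-break at the end (keep the first id seen for each block) is only an assumption for illustration; the question does not say which id should win.

```r
# Lookup table from the question's reproducible code
df1 <- data.frame(id = factor(c(1, 2, 3, 5, 4, 2)),
                  block = c("ab", "ac", "ab", "ca", "ba", "ca"))

# Flag every block value that occurs more than once
is_dup <- duplicated(df1$block) | duplicated(df1$block, fromLast = TRUE)
dup_blocks <- df1[is_dup, ]
print(dup_blocks)

# Hypothetical tie-break (an assumption): keep the first id seen for each block
df1_unique <- df1[!duplicated(df1$block), ]
print(df1_unique)
```

Joining against `df1_unique` instead of `df1` would give at most one row per location.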
Answer 1: (score: 0)
A solution using a lookup table ltbl (a named vector):
ltbl = 1:4 # lookup Table
names(ltbl) = c('ab','ac','ca','ba')
#ab ac ca ba
# 1 2 3 4
new<-
do.call(
rbind,
apply(df2, 1, function(x) {
ans <- names(ltbl)[stringr::str_detect(x, paste0("\\b", names(ltbl), "\\b"))]
cbind.data.frame( id = I(ltbl[ans]), block = I(ans) )[1,]
})
)
cbind(df2, new)
# location id block
#ca 2 high street, ca 3 ca
#ba 24 long street, ba, UK 4 ba
#ab 1 first avenue, ab 1 ab
#ac 15 nant peris , ac 2 ac
#NA 1 high street NA <NA>
#ca1 second avenue, ca, UK 3 ca
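The `\\b` markers in the pattern above are word boundaries. A minimal sketch of why they matter, using base R's `grepl` rather than stringr; the second address is a made-up example, not part of the question's data:

```r
# First address is from the question; the second is hypothetical
locations <- c("2 high street, ca", "1 castle road")

# Plain substring search: "ca" is also found inside "castle"
plain <- grepl("ca", locations)

# Word-boundary search: "ca" must stand alone as a word
bounded <- grepl("\\bca\\b", locations, perl = TRUE)

print(data.frame(locations, plain, bounded))
```

Without the boundaries, a short block code can match inside an unrelated street name.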
To convert your long block table into that lookup table:
Example: there can be only one block per id, an issue Tim has already addressed
myLongCrasyBlock <- data.frame(id = factor(c(1:3,1:3)), block = c('ab','ac','ab','ab','ac','ab'))
myLongCrasyBlock <- unique(myLongCrasyBlock)
ltbl <- `names<-`(myLongCrasyBlock$id, myLongCrasyBlock$block)
Answer 2: (score: 0)
I tried to find a solution that does not need any special packages, using the reproducible code.
# Creating dataframes
df1<-data.frame(id = factor(c(1,2,3,5,4,2)), block = c('ab','ac','ab','ca','ba','ca'))
df2<-data.frame(location = c('2 high street, ca','24 long street, ba, UK','1 first avenue, ab', '15 nant peris , ac','1 high street','second avenue, ca, UK'))
# Make sure the variables are character, not factor
df1$block <- as.character(df1$block)
df2$location <- as.character(df2$location)
# Create the new block variable; it stays missing until a match is found
df2$block <- NA_character_
# Loop over the block values
for (i in seq_along(df1$block)) {
  x <- grep(df1$block[i], df2$location, value = TRUE) # Find location values containing this block value
  y <- df2[df2$location %in% x,] # Create a new data frame with only the values found
  rowstokeep <- which(rownames(df2) %in% rownames(y)) # Get the rows of those values
  df2$block[rowstokeep] <- df1$block[i] # Write the block value into the corresponding location rows
}
# Merge by the "block" variable to get the id; all.x = TRUE keeps locations with no match
df3 <- merge(df2, df1, by = "block", all.x = TRUE)
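The same matching can also be sketched without a loop, still in base R. This is an alternative under stated assumptions, not the answer's method: it combines all block values into a single word-bounded pattern, takes the first block found in each location, and uses `match()` to pick the first id listed for that block (so `ca` maps to id 5 with the question's reproducible data).

```r
df1 <- data.frame(id = c(1, 2, 3, 5, 4, 2),
                  block = c("ab", "ac", "ab", "ca", "ba", "ca"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(location = c("2 high street, ca", "24 long street, ba, UK",
                               "1 first avenue, ab", "15 nant peris , ac",
                               "1 high street", "second avenue, ca, UK"),
                  stringsAsFactors = FALSE)

# One pattern that matches any block value as a whole word
pattern <- paste0("\\b(", paste(unique(df1$block), collapse = "|"), ")\\b")

# Position of the first match in each location (-1 when nothing matches)
hit <- regexpr(pattern, df2$location, perl = TRUE)
matched <- as.integer(hit) > 0

# regmatches() returns only the matched substrings, so fill matched rows only
df2$block <- NA_character_
df2$block[matched] <- regmatches(df2$location, hit)

# Assumption: when a block has several ids, take the first one listed in df1
df2$id <- df1$id[match(df2$block, df1$block)]
print(df2)
```

Locations with no matching block keep `NA` in both new columns, and the row order of `df2` is preserved.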
I hope this helps.