Search in a column to merge in R

Posted: 2018-11-22 14:55:18

Tags: r search merge substring

I want to search one column of a data frame for strings taken from another data frame, and then merge the two together. For example:

I have this data frame:

    location
1   2 high street, ca
2   24 long street, ba,UK
3   1 first avenue, ab
4   15 nant peris , ac
5   1 high street
6   second avenue, ca, UK

and I want to match it against this dataset:

   id      block
1  1        ab
2  2        ac
3  3        ab
4  5        cb
5  4        ba
6  2        ca

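So I want to search "location" for any value in the "block" column, and then merge the block and id columns onto the first dataset, so that the merged dataset looks like this: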

    location              id     block
1 2 high street, ca       2       ca
2 24 long street, ba,UK   4       ba
3 1 first avenue, ab      1       ab
4 15 nant peris , ac      2       ac
5 1 high street           NA      NA
6 second avenue, ca,UK    2       ca

Reproducible code:

df1<-data.frame(id = factor(c(1,2,3,5,4,2)), block = c('ab','ac','ab','ca','ba','ca'))
df2<-data.frame(location = c('2 high street, ca','24 long street, ba, UK','1 first avenue, ab', '15 nant peris , ac','1 high street','second avenue, ca, UK'))
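For reference, the matching described above can be sketched in base R without extra packages. This is a minimal sketch, not taken from the answers below. It assumes each location contains at most one block code as a whole word, and it uses the block table as printed above (with cb for id 5) rather than the reproducible code. Where a block is listed under more than one id (ab appears with ids 1 and 3), match() keeps the first id, which gives one row per location:

```r
# Tables as printed in the question (the printed block table has "cb" for id 5,
# while the reproducible code has "ca"); ids kept numeric for simplicity
lookup <- data.frame(id = c(1, 2, 3, 5, 4, 2),
                     block = c("ab", "ac", "ab", "cb", "ba", "ca"),
                     stringsAsFactors = FALSE)
locs <- data.frame(location = c("2 high street, ca", "24 long street, ba, UK",
                                "1 first avenue, ab", "15 nant peris , ac",
                                "1 high street", "second avenue, ca, UK"),
                   stringsAsFactors = FALSE)

# One regex alternation of all known blocks, matched as whole words
pattern <- paste0("\\b(", paste(unique(lookup$block), collapse = "|"), ")\\b")

# regexpr() returns the position of the first match in each location, or -1
m <- regexpr(pattern, locs$location, perl = TRUE)

# regmatches() returns the matched text only for locations that matched
locs$block <- NA_character_
locs$block[m != -1] <- regmatches(locs$location, m)

# match() picks the first id listed for each block, so each location stays one row
locs$id <- lookup$id[match(locs$block, lookup$block)]
locs
```

The "1 high street" row gets NA for both block and id, as in the expected output.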

3 Answers:

Answer 0 (score: 2)

Here is one way to do this using the sqldf package:

library(sqldf)
# In the reproducible code, df2 holds the locations and df1 the id/block pairs
sql <- "SELECT t1.location, t2.id, t2.block
        FROM df2 t1
        LEFT JOIN df1 t2
            ON t1.location LIKE '%, ' || t2.block OR
               t1.location LIKE '%, ' || t2.block || ',%';"
results <- sqldf(sql)

I believe sqldf runs on SQLite (its default backend); here is a link to a demo running in SQLite with your data:

Demo
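To see what the two LIKE conditions do, they can be mirrored in base R. This sketch is not part of the original answer: it cross-joins every location with every id/block row, keeps the pairs where the location ends with ", <block>" or contains ", <block>,", and appends unmatched locations with NA, which should give the same set of rows as the query (row order aside). It keeps id numeric instead of a factor for simplicity, and unlike SQLite's LIKE, endsWith() and grepl(fixed = TRUE) are case-sensitive. Because ab and ca each appear twice in df1, those locations come back once per id:

```r
# Data from the question's reproducible code (id kept numeric for simplicity)
df1 <- data.frame(id = c(1, 2, 3, 5, 4, 2),
                  block = c("ab", "ac", "ab", "ca", "ba", "ca"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(location = c("2 high street, ca", "24 long street, ba, UK",
                               "1 first avenue, ab", "15 nant peris , ac",
                               "1 high street", "second avenue, ca, UK"),
                  stringsAsFactors = FALSE)

# Cross join: every location paired with every id/block row
crossed <- merge(df2, df1, by = NULL)

# LIKE '%, ' || block          -> location ends with ", block"
# LIKE '%, ' || block || ',%'  -> location contains ", block,"
ends_with_block <- endsWith(crossed$location, paste0(", ", crossed$block))
contains_block  <- mapply(grepl, paste0(", ", crossed$block, ","), crossed$location,
                          MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
matched <- crossed[ends_with_block | contains_block, c("location", "id", "block")]

# LEFT JOIN: locations with no matching block are kept with NA
unmatched <- df2[!df2$location %in% matched$location, , drop = FALSE]
unmatched$id <- rep(NA, nrow(unmatched))
unmatched$block <- rep(NA, nrow(unmatched))

# Combine and restore the original location order
result <- rbind(matched, unmatched)
result <- result[order(match(result$location, df2$location)), ]
rownames(result) <- NULL
result
```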

Answer 1 (score: 0)

A solution using a lookup table ltbl (a named vector):

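ltbl <- 1:4  # lookup table: names are blocks, values are ids
names(ltbl) <- c('ab','ac','ca','ba')

#ab ac ca ba 
# 1  2  3  4

new <-
do.call(
    rbind,
    apply(df2, 1, function(x) {
        # blocks that occur as whole words ("\\b" = word boundary) in this location
        ans <- names(ltbl)[stringr::str_detect(x, paste0("\\b", names(ltbl), "\\b"))]
        # keep the first match; with no match, [1,] gives a row of NAs
        cbind.data.frame( id = I(ltbl[ans]), block = I(ans) )[1,]
    })
)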


cbind(df2, new)

#                  location id block
#ca       2 high street, ca  3    ca
#ba  24 long street, ba, UK  4    ba
#ab      1 first avenue, ab  1    ab
#ac      15 nant peris , ac  2    ac
#NA           1 high street NA  <NA>
#ca1  second avenue, ca, UK  3    ca
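The \\b word boundaries in the pattern are what keep a short block code from matching inside a longer word. A small illustration with base R's grepl(); the second address is made up for the example and is not from the question:

```r
# "1 cable road, ab" is a hypothetical address used only for illustration
locations <- c("2 high street, ca", "1 cable road, ab")

plain <- grepl("ca", locations)                     # plain substring match
whole <- grepl("\\bca\\b", locations, perl = TRUE)  # whole-word match

plain  # TRUE TRUE  ("ca" is found inside "cable")
whole  # TRUE FALSE
```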

To convert your long block table into this lookup table:

Example: each id can have only one block (Tim has already addressed this).

myLongCrasyBlock <- data.frame(id = factor(c(1:3,1:3)), block = c('ab','ac','ab','ab','ac','ab'))

myLongCrasyBlock <- unique(myLongCrasyBlock)
ltbl             <- `names<-`(myLongCrasyBlock$id, myLongCrasyBlock$block)
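One caveat with a named-vector lookup: if a block name occurs more than once, indexing by name returns only the first element with that name. In the example above, block ab is paired with ids 1 and 3, so only id 1 is reachable. A short check, with stringsAsFactors = FALSE added so block stays character on older R versions:

```r
myLongCrasyBlock <- data.frame(id = factor(c(1:3, 1:3)),
                               block = c('ab','ac','ab','ab','ac','ab'),
                               stringsAsFactors = FALSE)
myLongCrasyBlock <- unique(myLongCrasyBlock)
ltbl <- `names<-`(myLongCrasyBlock$id, myLongCrasyBlock$block)

names(ltbl)                       # "ab" "ac" "ab"
unname(as.character(ltbl["ab"]))  # "1": the second "ab" (id 3) is never returned
```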

Answer 2 (score: 0)

I tried to find a solution that needs no special packages, using your reproducible code.

# Creating data frames
df1 <- data.frame(id = factor(c(1,2,3,5,4,2)), block = c('ab','ac','ab','ca','ba','ca'))
df2 <- data.frame(location = c('2 high street, ca','24 long street, ba, UK','1 first avenue, ab', '15 nant peris , ac','1 high street','second avenue, ca, UK'))

# Make some variables character (data.frame() created factors by default before R 4.0)
df1$block <- as.character(df1$block)
df2$location <- as.character(df2$location)

# Create a new block variable, missing until a match is found
df2$block <- NA_character_

# Loop over the blocks
for (i in seq_along(df1$block)) {

  rowstokeep <- grep(df1$block[i], df2$location) # Rows whose location contains this block (plain substring match)

  df2$block[rowstokeep] <- df1$block[i] # Input the block value in the corresponding location rows
}

# Merge by "block" to get the id; all.y = TRUE keeps locations with no match (block NA)
df3 <- merge(df1, df2, by = "block", all.y = TRUE)
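merge() performs an inner join unless told otherwise, so a location whose block was never found does not appear in the result, while the expected output in the question keeps such rows with NA. A minimal sketch of the difference, using two illustrative rows shaped like the question's data. Here the locations are the first argument, so the option is all.x = TRUE; when they are the second argument, as in merge(df1, df2, ...), it is all.y = TRUE:

```r
left  <- data.frame(location = c("2 high street, ca", "1 high street"),
                    block = c("ca", NA), stringsAsFactors = FALSE)
right <- data.frame(id = 2, block = "ca", stringsAsFactors = FALSE)

inner <- merge(left, right, by = "block")                # default: inner join
outer <- merge(left, right, by = "block", all.x = TRUE)  # left join

nrow(inner)  # 1: "1 high street" is dropped
nrow(outer)  # 2: "1 high street" is kept with id NA
```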

I hope this helps.