Match two columns by searching along a column; both columns are in the same data frame

Time: 2019-03-01 21:04:02

Tags: r

I am using RStudio. I have the following toy example:

df <- data.frame(
  "Name1" = c("JPMorgan", "BMO", "Citibank", "Barclays", "Deutsche", "Chase", "HSBC",
              ".", ".", ".", ".", ".", ".", ".", ".", ".", ".", ".", ".", "."),
  "Name2" = c("JPMorgan and Chase", "SEFCU Union", "Wells Fargo Commercial Bank",
              "Bank of America", "Citibank LLC", "Charles Schwab", "Barclays",
              "HSBC Holdings PLc", "Wall Bank Holdings", "Chase Manhattan Bank",
              "TD Bank", "Ally Bank", "Goldman Sachs", "M&T Bank", "Key Bank",
              "Royal Bank of Canada", "Bank of Montreal BMO", "US Bancorp",
              "Capital One", "BNY Mellon"),
  stringsAsFactors = FALSE
)

I would like to create a third column in which each entry of Name1 is searched for across the entire Name2 column: the output is 1 if the string occurs as part of any entry in Name2, and 0 if it does not.

My current approach, a row-wise mutate, only produces 1:1 matches, that is, each Name1 entry is compared only with the Name2 entry in the same row.

My desired output would be a new column containing: 1 1 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0

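2 Answers:

Answer 0 (score: 3)

Based on your comment, I assume you want to flag every string in Name1 that is contained, in full, in some entry of Name2. You can then use pmatch() for partial string matching and as.logical() to convert the result to logical. If you prefer 0 and 1 instead of FALSE and TRUE, just wrap the result in another as.numeric():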

df$matched <- as.numeric(as.logical(pmatch(df$Name1, df$Name2, nomatch = 0, duplicates.ok = TRUE)))
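As a small illustration of what this one-liner relies on: pmatch() compares each value against the beginning of the candidate strings, so it behaves as prefix matching rather than a search anywhere in the string. The sketch below uses a two-element toy vector; the names needles and haystack are made up for this example and are not part of the question's data.

```r
# Hypothetical toy vectors, only for illustrating pmatch() behaviour.
needles  <- c("JPMorgan", "BMO")
haystack <- c("JPMorgan and Chase", "Bank of Montreal BMO")

# pmatch() returns the index of the matching element, or `nomatch` (0 here).
# "JPMorgan" is a prefix of the first element; "BMO" is not a prefix of any.
idx <- pmatch(needles, haystack, nomatch = 0, duplicates.ok = TRUE)

# Any non-zero index becomes TRUE, then 1; a zero becomes FALSE, then 0.
flags <- as.numeric(as.logical(idx))
print(idx)
print(flags)
```

Note also that pmatch() falls back to nomatch when a value is a prefix of several candidates and none of them is an exact match, so ambiguous prefixes would be reported as 0 with this approach.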

Answer 1 (score: 2)

A solution using stringr::str_detect. Compared with the pmatch solution, it gives a different answer for BMO.

library("dplyr")
library("stringr")

has_match <- function(name, candidates) {
  if (name == ".")
    FALSE
  else
    any(str_detect(candidates, name))
}

df <- df %>% # Add the new columns. Although first you should probably decide on
             # which partial matching algorithm you want to use.
  mutate(match = sapply(Name1, has_match, Name2)) %>%
  mutate(match2 = pmatch(Name1, Name2, nomatch = 0, duplicates.ok = TRUE) > 0)
df
#       Name1                       Name2 match match2
# 1  JPMorgan          JPMorgan and Chase  TRUE   TRUE
# 2       BMO                 SEFCU Union  TRUE  FALSE
# 3  Citibank Wells Fargo Commercial Bank  TRUE   TRUE
# 4  Barclays             Bank of America  TRUE   TRUE
# 5  Deutsche                Citibank LLC FALSE  FALSE
# 6     Chase              Charles Schwab  TRUE   TRUE
# 7      HSBC                    Barclays  TRUE   TRUE

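The difference for BMO is that it appears in "Bank of Montreal BMO", which is not at the beginning of the full name string. In all the other cases the match is at the start of the string.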
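For readers without stringr, the same "match anywhere in the string" idea can be sketched in base R with grepl(). This is an assumption-laden illustration rather than part of either answer: the helper name contains_any and the reduced four-row data frame are made up here. Passing fixed = TRUE makes grepl() treat the name as a literal substring, so the "." placeholder is not interpreted as a regular-expression wildcard and needs no special case unless some Name2 entry actually contains a dot.

```r
# Reduced, hypothetical version of the question's data frame.
df <- data.frame(
  Name1 = c("JPMorgan", "BMO", "Deutsche", "."),
  Name2 = c("JPMorgan and Chase", "SEFCU Union",
            "Bank of Montreal BMO", "Citibank LLC"),
  stringsAsFactors = FALSE
)

# TRUE if `name` occurs as a literal substring of at least one candidate.
contains_any <- function(name, candidates) {
  any(grepl(name, candidates, fixed = TRUE))
}

# Check every Name1 entry against the whole Name2 column, then map to 0/1.
df$matched <- as.integer(
  vapply(df$Name1, contains_any, logical(1), candidates = df$Name2)
)
print(df)
```

Here BMO is flagged because the literal substring is found in the middle of "Bank of Montreal BMO", which is the behaviour the str_detect column shows and the pmatch column does not.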