How do I find the word where two strings overlap?

Asked: 2018-12-06 16:45:14

Tags: python arrays string list overlap

I have two strings:

My name is Bogdan
Bogdan and I am from Russia

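I need to get the word Bogdan from these strings. I always know that the end of the first sentence == the start of the second sentence.

How can I find the overlap?

My solution returns the characters the two strings have in common: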

res = list(set('My name is Bogdan').intersection(set('Bogdan and i am from Russia')))
print(res)

This returns:

['i', 'n', 'g', 'm', ' ', 's', 'B', 'a', 'd', 'o']

3 Answers:

Answer 0: (score: 3)

Start with the largest possible overlap between the two strings, then iterate, shrinking the overlap each time:

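def find_overlap(s1, s2):
    # Try the longest candidate first: the suffix of s1 starting at i
    # against the prefix of s2 of the same length.
    for i in range(len(s1)):
        test1, test2 = s1[i:], s2[:len(s1) - i]
        if test1 == test2:
            return test1
    return None  # no overlap found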

s1, s2 = "My name is Bogdan", "Bogdan and I am from Russia"
find_overlap(s1, s2)
# 'Bogdan'
s1, s2 = "mynameisbogdan", "bogdanand"
find_overlap(s1, s2)
# 'bogdan'

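As you can see, this also works if the two strings contain no spaces.

The loop runs O(n) times, where n is the length of s1 (each pass also compares two slices). If you first determine which of the two strings is shorter, you can reduce that to O(min(n, m)) passes.

If you expect the overlap to be much shorter than the shorter of the two strings, you can get this down to O(k) passes, where k is the length of the overlap, by starting from the smallest overlap instead: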
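A minimal sketch of that O(min(n, m)) idea, assuming the only change needed is to skip suffixes of s1 that are longer than s2. The name find_overlap_bounded is hypothetical and not part of the original answer:

```python
def find_overlap_bounded(s1, s2):
    # An overlap can never be longer than the shorter string, so skip
    # every suffix of s1 that is longer than s2.
    start = max(0, len(s1) - len(s2))
    for i in range(start, len(s1)):
        suffix = s1[i:]
        if s2.startswith(suffix):
            return suffix  # longest overlap, since suffixes shrink as i grows
    return None  # no overlap found


print(find_overlap_bounded("My name is Bogdan", "Bogdan and I am from Russia"))
# Bogdan
print(find_overlap_bounded("a very long first sentence ending in xy", "xyz"))
# xy
```

With a long s1 and a short s2, the loop now starts near the end of s1 instead of at its beginning.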

def find_overlap(s1, s2):
    # Grow the candidate overlap one character at a time.
    for i in range(1, len(s1) + 1):
        if i > len(s2):
            # The overlap cannot be longer than s2.
            return None
        test1, test2 = s1[-i:], s2[:i]
        if test1 == test2:
            return test1
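
One thing worth keeping in mind: searching from the smallest overlap upward returns the shortest match rather than the longest, which matters when the overlapping text repeats itself. A small sketch of the difference (the helper names shortest_overlap and longest_overlap are made up for this illustration):

```python
def shortest_overlap(s1, s2):
    # Grow the candidate from 1 character upward; the first match wins.
    for k in range(1, min(len(s1), len(s2)) + 1):
        if s1[-k:] == s2[:k]:
            return s1[-k:]
    return None


def longest_overlap(s1, s2):
    # Shrink the candidate from the largest possible size downward.
    for k in range(min(len(s1), len(s2)), 0, -1):
        if s1[-k:] == s2[:k]:
            return s1[-k:]
    return None


print(shortest_overlap("xabab", "ababy"))  # ab
print(longest_overlap("xabab", "ababy"))   # abab
```

For the strings in the question both directions give 'Bogdan', because no shorter suffix of the first string is also a prefix of the second.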

Answer 1: (score: 0)

You can use set intersection:

l1="My name is Bogdan"
l2="Bogdan and I am from Russia"
print(set(l1.split()) & set(l2.split()))  # {'Bogdan'}

Or a list comprehension:

l1="My name is Bogdan"
l2="Bogdan and I am from Russia"
print([i for i in l1.split() if i in l2.split()])  # ['Bogdan']
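
Both snippets above return every word the two strings share, wherever it occurs, so they can also report words that are not at the end of the first string and the start of the second. If that positional constraint from the question matters, a minimal sketch could compare whole words instead. The function name word_overlap is hypothetical, and the sketch assumes words are separated by whitespace:

```python
def word_overlap(s1, s2):
    w1, w2 = s1.split(), s2.split()
    # Try the longest possible run of words first, then shorter ones.
    for k in range(min(len(w1), len(w2)), 0, -1):
        if w1[-k:] == w2[:k]:
            return " ".join(w1[-k:])
    return None  # the end of s1 does not match the start of s2


print(word_overlap("My name is Bogdan", "Bogdan and I am from Russia"))
# Bogdan
print(word_overlap("I am here", "I am from Russia"))
# None
```

In the second call the set intersection would contain 'I' and 'am', but neither sits at the end of the first string, so the positional check finds nothing.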

Answer 2: (score: 0)

Another option, using a for loop:

def shared_words(s1, s2):
  res = []
  l_s1, l_s2 = set(s1.split()), set(s2.split())
  for ss1 in l_s1:
    if ss1 in l_s2: res.append(ss1)
  return res

Applied to the strings:

s1 = "My name is Bogdan"
s2 = "Bogdan and I am from Russia"
print(shared_words(s1, s2)) #=> ['Bogdan']

Alternatively, use a regular expression to extract only the words:

import re

def shared_words(s1, s2):
  res = []
  l_s1, l_s2 = set(re.findall(r'\w+',s1)), set(re.findall(r'\w+',s2))
  for ss1 in l_s1:
    if ss1 in l_s2: res.append(ss1)
  return res

This gives:

s1 = "My name is Bogdan, I am here"
s2 = "Bogdan and I am from Russia."
print(shared_words(s1, s2)) #=> ['Bogdan', 'I', 'am']
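
Because shared_words iterates over a set, the order of the returned words is not guaranteed. If a stable order is wanted, one possible variant walks the words of the first string in order and skips duplicates. This is only a sketch, and shared_words_ordered is a hypothetical name:

```python
import re


def shared_words_ordered(s1, s2):
    words2 = set(re.findall(r"\w+", s2))  # fast membership checks
    seen = set()                          # words already reported
    res = []
    for word in re.findall(r"\w+", s1):
        if word in words2 and word not in seen:
            seen.add(word)
            res.append(word)
    return res


s1 = "My name is Bogdan, I am here"
s2 = "Bogdan and I am from Russia."
print(shared_words_ordered(s1, s2))  # ['Bogdan', 'I', 'am']
```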