Question

How do I match a complete string/word when searching a list? I tried, but it doesn't work correctly. Below are a sample list, my code, and the test results.

list = ['Hi, hello', 'hi mr 12345', 'welcome sir']

My code:

for str in list:
    if s in str:
        print(str)

Test results:

s = "hello" ~ expected output: 'Hi, hello' ~ output I get: 'Hi, hello'
s = "123" ~ expected output: *nothing* ~ output I get: 'hi mr 12345'
s = "12345" ~ expected output: 'hi mr 12345' ~ output I get: 'hi mr 12345'
s = "come" ~ expected output: *nothing* ~ output I get: 'welcome sir'
s = "welcome" ~ expected output: 'welcome sir' ~ output I get: 'welcome sir'
s = "welcome sir" ~ expected output: 'welcome sir' ~ output I get: 'welcome sir'

My list contains more than 200K strings.

Answer 1

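It looks like you will need to run this search more than once, so I suggest converting the list into a dictionary that maps each word to the strings containing it: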

>>> l = ['Hi, hello', 'hi mr 12345', 'welcome sir']
>>> d = dict()
>>> for item in l:
...     for word in item.split():
...         d.setdefault(word, []).append(item)
...

So now you can easily do:

>>> d.get('hi')
['hi mr 12345']
>>> d.get('come')    # nothing
>>> d.get('welcome')
['welcome sir']

P.S. You will probably need to improve item.split() to handle commas, periods and other separators, perhaps by using a regular expression with \w.
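Following that P.S., here is a minimal sketch of the same index built with re.findall(r'\w+', ...) instead of str.split(), so that punctuation such as the comma in 'Hi, hello' no longer sticks to the word. The function name build_index is hypothetical, and lookups stay case-sensitive as in the answer.

```python
import re
from collections import defaultdict

def build_index(strings):
    """Map each word (a run of \\w characters) to the strings containing it."""
    index = defaultdict(list)
    for item in strings:
        # set() so a string that repeats a word is listed only once for it
        for word in set(re.findall(r'\w+', item)):
            index[word].append(item)
    return index

l = ['Hi, hello', 'hi mr 12345', 'welcome sir']
index = build_index(l)
print(index.get('Hi'))    # ['Hi, hello']  (plain split() would store 'Hi,')
print(index.get('come'))  # None
```

Using .get() on the defaultdict does not create empty entries for missing words, so failed lookups leave the index unchanged.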

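P.P.S. As has been pointed out, this does not match "welcome sir". If you want to match the whole string, that is just one more line in the suggested solution. But if you have to match part of a string delimited by spaces and punctuation, a regex should be your choice.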
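A sketch of that "one more line", assuming the complete string is simply used as an extra dictionary key. The name build_phrase_index and the guard for single-word strings are illustrative additions, not part of the answer.

```python
def build_phrase_index(strings):
    """Map every word, and every complete multi-word string, to its strings."""
    index = {}
    for item in strings:
        words = item.split()
        for word in set(words):  # set(): list each string once per word
            index.setdefault(word, []).append(item)
        # The extra line: index the complete string as well.
        if len(words) > 1:  # a one-word string is already indexed above
            index.setdefault(item, []).append(item)
    return index

index = build_phrase_index(['Hi, hello', 'hi mr 12345', 'welcome sir'])
print(index.get('welcome sir'))  # ['welcome sir']
print(index.get('mr 12345'))     # None
```

The query must equal a stored string exactly (including spacing), so a partial phrase such as 'mr 12345' is not found; that is the case where the regex approach fits better.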

Answer 2

>>> l = ['Hi, hello', 'hi mr 12345', 'welcome sir']
>>> search = lambda word: [x for x in l if word in x.split()]
>>> search('123')
[]
>>> search('12345')
['hi mr 12345']
>>> search('hello')
['Hi, hello']

Answer 3

If you are looking for exact (whole-word) matches:

for str in list:
    if set(s.split()) & set(str.split()):
        print(str)
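Note that the set intersection is non-empty as soon as any single word is shared, so a query like 'welcome sir' would also return a string that only contains 'sir'. If every query word must be present, a subset test (<=) is one option. The sketch below compares both; the extra 'yes sir' item is made up for illustration. Neither version checks word order or adjacency.

```python
def any_word(query, strings):
    """Strings sharing at least one whole word with `query`."""
    q = set(query.split())
    return [item for item in strings if q & set(item.split())]

def all_words(query, strings):
    """Strings containing every word of `query`, in any order."""
    q = set(query.split())
    return [item for item in strings if q <= set(item.split())]

# 'yes sir' is a hypothetical extra item, not from the question
data = ['Hi, hello', 'hi mr 12345', 'welcome sir', 'yes sir']
print(any_word('welcome sir', data))   # ['welcome sir', 'yes sir']
print(all_words('welcome sir', data))  # ['welcome sir']
```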

Answer 4

As long as s contains only a few words, you can do:

s = s.split()
n = len(s)
for x in my_list:
    words = x.split()
    if s in (words[i:i+n] for i in range(len(words) - n + 1)):
        print(x)

If s consists of many words, there are more efficient, but also more complex, algorithms.
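One example of such an algorithm is Knuth-Morris-Pratt, applied to lists of words instead of characters. It checks whether the query's words occur as a contiguous run in O(len(words) + n) comparisons, instead of the up to O(len(words) * n) of the slice comparison above. This is a sketch of that standard technique, not code from the answer; contains_phrase is a hypothetical name.

```python
def contains_phrase(words, phrase):
    """True if list `phrase` occurs as a contiguous run inside list `words`."""
    if not phrase:
        return True
    # failure[i]: length of the longest proper prefix of phrase[:i+1]
    # that is also a suffix of it
    failure = [0] * len(phrase)
    k = 0
    for i in range(1, len(phrase)):
        while k and phrase[i] != phrase[k]:
            k = failure[k - 1]
        if phrase[i] == phrase[k]:
            k += 1
        failure[i] = k
    k = 0  # number of phrase words matched so far
    for w in words:
        while k and w != phrase[k]:
            k = failure[k - 1]  # fall back without re-reading earlier words
        if w == phrase[k]:
            k += 1
            if k == len(phrase):
                return True
    return False

my_list = ['Hi, hello', 'hi mr 12345', 'welcome sir']
query = 'welcome sir'.split()
print([x for x in my_list if contains_phrase(x.split(), query)])  # ['welcome sir']
```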

Answer 5

Use a regular expression here, matching the exact word between word boundaries (\b):

import re
# .....
for str in list:
    # both \b must be raw strings: a plain '\b' is a backspace character
    if re.search(r'\b' + re.escape(wordToLook) + r'\b', str):
        print(str)

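\b matches only at a word boundary, i.e. where a word character meets a non-word character such as a space or newline, or at the start or end of the string. So "come" does not match inside "welcome", and "123" does not match inside "12345".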
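To check this against the question's test cases, here is a small sketch. re.escape is an addition so that terms containing regex metacharacters are matched literally (if a term begins or ends with a non-word character, \b may not behave as expected). find_exact is a hypothetical name.

```python
import re

def find_exact(term, strings):
    """Return the strings that contain `term` delimited by word boundaries."""
    # re.escape makes characters such as '.' or '+' in the term literal;
    # r'' keeps \b a regex word boundary rather than a backspace character.
    pattern = re.compile(r'\b' + re.escape(term) + r'\b')
    return [item for item in strings if pattern.search(item)]

strings = ['Hi, hello', 'hi mr 12345', 'welcome sir']
for term in ['hello', '123', '12345', 'come', 'welcome', 'welcome sir']:
    print(repr(term), find_exact(term, strings))
```

Each term produces the output the question lists as expected.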

Or do something like the following, so you don't have to type the search terms again and again:

import re
list = ['Hi, hello', 'hi mr 12345', 'welcome sir']
listOfWords = ['hello', 'Mr', '123']
reg = re.compile(r'(?i)\b(?:%s)\b' % '|'.join(map(re.escape, listOfWords)))
for str in list:
    if reg.search(str):
        print(str)

(?i) makes the search case-insensitive; if you want a case-sensitive search, remove it.
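The same compiled pattern can be written with the re.IGNORECASE flag instead of the inline (?i), which turns case sensitivity into a parameter. A sketch; compile_words and ignore_case are hypothetical names, and re.escape is added so each term is matched literally. Compiling once and reusing the pattern is what pays off when scanning 200K strings.

```python
import re

def compile_words(words, ignore_case=True):
    """Compile one pattern matching any of `words` as a whole word."""
    alternation = '|'.join(map(re.escape, words))
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(r'\b(?:%s)\b' % alternation, flags)

strings = ['Hi, hello', 'hi mr 12345', 'welcome sir']
words = ['hello', 'Mr', '123']
ci = compile_words(words)                     # case-insensitive
cs = compile_words(words, ignore_case=False)  # case-sensitive
print([s for s in strings if ci.search(s)])  # ['Hi, hello', 'hi mr 12345']
print([s for s in strings if cs.search(s)])  # ['Hi, hello']
```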

Search list: match only exact words/strings
