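Python string: whole-word matching does not work as expected

Time: 2020-05-12 10:35:23

Tags: python string full-text-search

My goal is to check whether certain (whole) words occur in a string. The code is below. I don't understand why I get a match for the search term "odin", since it is not a whole word in the string. Can someone explain? I expected no match in this case.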

import re
#search words
hero = ['catwoman', 'hellboy', 'eternals', 'elektra', 'hydra', 'iron-man', 'iron man', 'green arrow', 'nightwing', 'flash gordon', 'lanterne verte', 'lantern',
        'kryptonite', 'asgard', 'spider-man', 'spiderman', 'superheroes', 'super heroes', 'super hero', 'hancock', 'daredevil', 'avengers', 'metropolis',
        'gotham', 'batman', 'captain america', 'wolverine', 'magneto', 'dark knight', 'aquaman', 'shazam', 'wolverine', 'punisher', 'batmobile', 
        'daredevil', 'superwoman', 'supergirl', 'wonderwoman', 'batgirl', 'catgirl', 'starfire', 'sandman', 'superman', 'thor', 'x-men', 'x men',
        'marvel', 'spidey', 'superheroine', 'supervillain', 'supervillains', 'odin', 'loki', 'spiderman', 'ragnarok', 'asgardian', 'supergirl', 'spiderman', 
        'teen titans', 'stan lee', 'doctor strange', 'groot', 'ant man', 'ant-man', 'deadpool', 'professor x', 'wasp', 'phoenix', 'star wars',
        'eternals', 'morbius', 'shang-chi', 'shang', 'rocketeer']

#string
s = "Hoping to escape from his troubled past, former DEA agent Phil Broker (Jason Statham) moves to a seemingly quiet backwater town in the bayou with his daughter. However, he finds anything but quiet there, for the town is riddled with drugs and violence. When Gator Bodine (James Franco), a sociopathic druglord, puts the newcomer and his young daughter in harm's way, Broker is forced back into action to save her and their home. Based on a novel by Chuck Logan.^A former DEA agent (Jason Statham) returns to action to save his daughter and his new town from a drug dealing sociopath (James Franco).^A former DEA agent (Jason Statham) encounters trouble when he moves to a small town"

match = re.search(r'\b{}\b'.format('|'.join(hero)),s )

print(match)

Output:

<re.Match object; span=(265, 269), match='odin'>

2 answers:

Answer 0 (score: 0)

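With this pattern, re.search is not restricted to whole words. It matches odin because the sentence contains: "When Gator B>odin<e (James F".
How about a simple approach without regular expressions?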

import re
#search words
hero = ['catwoman', 'hellboy', 'eternals', 'elektra', 'hydra', 'iron-man', 
'iron man', 'green arrow', 'nightwing', 'flash gordon', 'lanterne verte', 
'lantern',
    'kryptonite', 'asgard', 'spider-man', 'spiderman', 'superheroes', 'super heroes', 'super hero', 'hancock', 'daredevil', 'avengers', 'metropolis',
    'gotham', 'batman', 'captain america', 'wolverine', 'magneto', 'dark knight', 'aquaman', 'shazam', 'wolverine', 'punisher', 'batmobile', 
    'daredevil', 'superwoman', 'supergirl', 'wonderwoman', 'batgirl', 'catgirl', 'starfire', 'sandman', 'superman', 'thor', 'x-men', 'x men',
    'marvel', 'spidey', 'superheroine', 'supervillain', 'supervillains', 'odin', 'loki', 'spiderman', 'ragnarok', 'asgardian', 'supergirl', 'spiderman', 
    'teen titans', 'stan lee', 'doctor strange', 'groot', 'ant man', 'ant-man', 'deadpool', 'professor x', 'wasp', 'phoenix', 'star wars',
    'eternals', 'morbius', 'shang-chi', 'shang', 'rocketeer']

#string
s = "Hoping to escape from his troubled past, former DEA agent Phil Broker (Jason Statham) moves to a seemingly quiet backwater town in the bayou with his daughter. However, he finds anything but quiet there, for the town is riddled with drugs and violence. When Gator Bodine (James Franco), a sociopathic druglord, puts the newcomer and his young daughter in harm's way, Broker is forced back into action to save her and their home. Based on a novel by Chuck Logan.^A former DEA agent (Jason Statham) returns to action to save his daughter and his new town from a drug dealing sociopath (James Franco).^A former DEA agent (Jason Statham) encounters trouble when he moves to a small town"

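# Split on single spaces and compare each token with the list.
# Note: tokens keep attached punctuation, and multi-word entries
# such as 'iron man' can never equal a single token.
split_sentence = s.split(" ")
for word in split_sentence:
    if word in hero:
        print("{} is in hero list!".format(word))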
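The loop above compares raw space-separated tokens, so a token such as "(Jason" or "town." keeps its punctuation, and two-word entries can never be found. A minimal sketch of one way around this, still without regular expressions, might look like the following. The helper name find_heroes, the shortened sample sentence, and the choice to strip string.punctuation are assumptions made for illustration; they are not part of the original answer.

```python
import string

def find_heroes(text, heroes):
    """Return the entries of `heroes` that occur in `text` as whole words.

    Hypothetical helper: tokens are lowercased and stripped of leading and
    trailing punctuation, then compared as one-word or multi-word runs.
    """
    # Normalise tokens, e.g. "Bodine" -> "bodine", "(James" -> "james"
    tokens = [t.strip(string.punctuation).lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    found = []
    for hero in heroes:
        parts = hero.lower().split()
        n = len(parts)
        # Slide a window of n tokens over the text and compare exactly
        if any(tokens[i:i + n] == parts for i in range(len(tokens) - n + 1)):
            found.append(hero)
    return found

# Made-up sentence for illustration only
sample = "When Gator Bodine (James Franco) met Iron Man, Odin was away."
print(find_heroes(sample, ["odin", "iron man", "thor"]))
print(find_heroes("When Gator Bodine (James Franco)", ["odin"]))
```

Because whole tokens are compared, "Bodine" is never confused with "odin". The trade-off is that matching depends on how the text is tokenised, so hyphenated or otherwise unusual spellings only match if the list contains that exact form.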

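Answer 1 (score: 0)

I realized what was wrong. The search pattern did not put a word boundary around each word in hero; the \b anchors applied only to the first and last alternatives. I changed the search pattern from r'\b{}\b'.format('|'.join(hero)) to r'\b{}\b'.format(r'\b|\b'.join(hero)), and now it works as expected. Here is the complete code: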

import re
#search words
hero = ['catwoman', 'hellboy', 'eternals', 'elektra', 'hydra', 'iron-man', 'iron man', 'green arrow', 'nightwing', 'flash gordon', 'lanterne verte', 'lantern',
        'kryptonite', 'asgard', 'spider-man', 'spiderman', 'superheroes', 'super heroes', 'super hero', 'hancock', 'daredevil', 'avengers', 'metropolis',
        'gotham', 'batman', 'captain america', 'wolverine', 'magneto', 'dark knight', 'aquaman', 'shazam', 'wolverine', 'punisher', 'batmobile', 
        'daredevil', 'superwoman', 'supergirl', 'wonderwoman', 'batgirl', 'catgirl', 'starfire', 'sandman', 'superman', 'thor', 'x-men', 'x men',
        'marvel', 'spidey', 'superheroine', 'supervillain', 'supervillains', 'odin', 'loki', 'spiderman', 'ragnarok', 'asgardian', 'supergirl', 'spiderman', 
        'teen titans', 'stan lee', 'doctor strange', 'groot', 'ant man', 'ant-man', 'deadpool', 'professor x', 'wasp', 'phoenix', 'star wars',
        'eternals', 'morbius', 'shang-chi', 'shang', 'rocketeer']

#string
s = "Hoping to escape from his troubled past, former DEA agent Phil Broker (Jason Statham) moves to a seemingly quiet backwater town in the bayou with his daughter. However, he finds anything but quiet there, for the town is riddled with drugs and violence. When Gator Bodine (James Franco), a sociopathic druglord, puts the newcomer and his young daughter in harm's way, Broker is forced back into action to save her and their home. Based on a novel by Chuck Logan.^A former DEA agent (Jason Statham) returns to action to save his daughter and his new town from a drug dealing sociopath (James Franco).^A former DEA agent (Jason Statham) encounters trouble when he moves to a small town"

match = re.search(r'\b{}\b'.format(r'\b|\b'.join(hero)),s )

print(match)

Output:

None
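
The behaviour follows from regex precedence: alternation (|) binds more loosely than \b, so in r'\b{}\b'.format('|'.join(hero)) the leading \b belongs only to the first word and the trailing \b only to the last one. An alternative to repeating \b around every word is to wrap the alternation in a non-capturing group. The sketch below assumes a shortened word list and sample text, and it also passes each word through re.escape, which the original code did not do; the names build_pattern, heroes, and text are illustrative.

```python
import re

def build_pattern(words):
    """Compile a pattern that matches any of `words` as a whole word."""
    # Escape each word so characters like '-' or '.' are taken literally
    alternation = "|".join(re.escape(w) for w in words)
    # The non-capturing group makes both \b anchors apply to every alternative
    return re.compile(r"\b(?:{})\b".format(alternation))

heroes = ["catwoman", "odin", "iron man", "rocketeer"]  # shortened list
text = "When Gator Bodine (James Franco), a sociopathic druglord"

# Original form: \b applies only to 'catwoman' and 'rocketeer'
loose = re.search(r"\b{}\b".format("|".join(heroes)), text)
print(loose.group() if loose else None)

# Grouped form: 'odin' inside 'Bodine' is no longer accepted
strict = build_pattern(heroes).search(text)
print(strict)

# A genuine whole-word occurrence is still found
print(build_pattern(heroes).search("odin met iron man").group())
```

With the full list and string from the question, build_pattern(hero).search(s) should likewise return None, since the grouped pattern is equivalent to placing \b on both sides of every word.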