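I'm trying to match a string in Python. For example, if my phrase is "long string", I want to match "long string", "Long StrInG", and "long!!!string", but not "Long strings" or "stringlong". That is, I want to match every instance of my phrase in any text, ignoring case and without matching it as part of a longer word.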
i.e., when I do
string = "hello"
strings = "hellos"
then string in strings evaluates to True, but I don't want this to be true.
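I also want the phrase to match any instance inside a longer sentence where its words are separated by whitespace or punctuation:
i.e., string = "long string" should match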
"hello ~~~!!!!! long !@#!@#!@ string"
Whitespace also matters - I don't want to match
string = "longstringlongstring" or "longs trying"
Here is what I have tried so far:
import string

def contains_phrase(text, phrase):
    # text: the text we are checking for an instance of phrase
    # phrase: the string to look for in text
    cleaned_text = ""
    for char in text:
        if char in string.punctuation:
            # replace punctuation with a space
            cleaned_text += " "
        else:
            cleaned_text += char.lower()
    # collapse runs of whitespace into single spaces
    cleaned_string = " ".join(cleaned_text.split())
    counter = 0
    for char in cleaned_string:
        # count every character that also occurs in phrase
        for char2 in phrase:
            if char == char2:
                counter += 1
        if counter == len(phrase):
            return True
    return False
I realize I can't use a list, because order doesn't matter. Any suggestions would be much appreciated!
Answer 0 (score: 1)
Use a regular expression:
import re
import string
# given phrase
phrase = "long string"
# this says what can go between two words of the phrase above
between = "[" + r"\s" + re.escape(string.punctuation) + "]+"
# the pattern
pat = r"\b" + between.join(phrase.split()) + r"\b"
reg = re.compile(pat, flags=re.I)
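Here between consists of whitespace (\s) and all the punctuation characters from string.punctuation, at least one of which must appear (because of the []+ around it). We re.escape the punctuation because it contains regex metacharacters that need to be matched literally (e.g., $). The pattern is then formed by joining the words of the phrase with this between, and finally a word boundary (\b) is placed at each end to ensure an exact match, e.g., to prevent long stringS from matching. re.I tells the regex to ignore case when it is compiled.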
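To see what the word boundaries and re.I contribute, here is a minimal sketch that compares a pattern with and without \b. The names loose and strict are made up for this illustration, and a single literal space stands in for the separator class to keep it short.

```python
import re

# Without \b, the phrase is also found inside longer words.
loose = re.compile(r"long string", flags=re.I)
# With \b at both ends, the phrase must start and end on a word boundary.
strict = re.compile(r"\blong string\b", flags=re.I)

print(loose.search("Long strings") is not None)   # True: "Long string" is found inside "Long strings"
print(strict.search("Long strings") is not None)  # False: "string" is followed by "s", so no boundary
print(strict.search("LONG STRING!") is not None)  # True: re.I ignores case and "!" is not a word character
```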
For this phrase, pat looks like
\blong[\s!"\#\$%\&\'\(\)\*\+,\-\./:;<=>\?@\[\\\]\^_`\{\|\}\~]+string\b
If phrase is a single word, e.g., phrase = "this", then pat is
\bthis\b
i.e., there is no punctuation or whitespace class in between, because there is only one word.
Finally, for a 3-word phrase, e.g., phrase = "no escape needed", pat is
\bno[\s!"\#\$%\&\'\(\)\*\+,\-\./:;<=>\?@\[\\\]\^_`\{\|\}\~]+escape[\s!"\#\$%\&\'\(\)\*\+,\-\./:;<=>\?@\[\\\]\^_`\{\|\}\~]+needed\b
i.e., the regex is formed dynamically from the phrase.
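Since the pattern is built the same way for any phrase, the construction could be wrapped in a small helper. This is only a sketch: build_phrase_regex and contains_phrase are hypothetical names, and it assumes the phrase consists of plain words without regex metacharacters.

```python
import re
import string

def build_phrase_regex(phrase):
    """Compile a case-insensitive, whole-word pattern for phrase (hypothetical helper)."""
    # one or more whitespace or punctuation characters between consecutive words
    between = "[" + r"\s" + re.escape(string.punctuation) + "]+"
    # assumption: the words themselves are plain text, so they are not escaped
    pattern = r"\b" + between.join(phrase.split()) + r"\b"
    return re.compile(pattern, flags=re.I)

def contains_phrase(text, phrase):
    """Return True if phrase occurs in text as whole words (hypothetical helper)."""
    return build_phrase_regex(phrase).search(text) is not None

print(contains_phrase("long!!!string", "long string"))
print(contains_phrase("Long strings", "long string"))
```

If the same phrase is checked against many texts, it would probably be better to call build_phrase_regex once and reuse the compiled object.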
Sample runs for testing (it is a match if the result is not None):
>>> re.search(reg, "long string") is not None
True
>>> re.search(reg, "Long StrInG") is not None
True
>>> re.search(reg, "long!!!string") is not None
True
>>> re.search(reg, "Long strings") is not None
False
>>> re.search(reg, "stringlong") is not None
False
>>> re.search(reg, "hello ~~~!!!!! long !@#!@#!@ string") is not None
True
>>> re.search(reg, "longstring") is not None
False
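The question asks for all instances in a text, and re.search only reports the first one. A possible sketch using finditer with the same pattern follows; the sample text is made up for illustration.

```python
import re
import string

phrase = "long string"
between = "[" + r"\s" + re.escape(string.punctuation) + "]+"
reg = re.compile(r"\b" + between.join(phrase.split()) + r"\b", flags=re.I)

# made-up sample text with two valid instances and two near misses
text = "A long string, a LONG...string, but not longstring or long strings."
matches = [m.group(0) for m in reg.finditer(text)]
print(matches)       # ['long string', 'LONG...string']
print(len(matches))  # 2
```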
You can refer to the regex details here.