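I'm writing a program in which I need to test whether a phrase, e.g. "red car", occurs in various sentences: "I bought a new red car", "RED! CAR!", "red#$%^car".
I can't find a way to separate the words from the symbols in the last example.
My code so far is: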
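import string

def contains_phrase(text, phrase):  # hypothetical wrapper; phrase is a list of lowercase words
    exclude = set(string.punctuation)
    text = text.lower()  # lower is a method and must be called
    # Deleting punctuation outright is what glues "red#$%^car" into "redcar"
    text = ''.join(ch for ch in text if ch not in exclude)
    text = text.split()
    for word in phrase:
        found = False
        for e2 in text:
            if word == e2:
                found = True
                break
        if not found:
            return False
    return True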
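This turns the last example into "redcar", so the words are not separated.
All the questions I found here discuss delimiters, not splitting apart two words that are joined by a run of symbols.
Should I just call text.split for every symbol?
What I had in mind was:
for ch in exclude:
    text = text.split(ch)
But I'm hoping there is a cleaner way.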
Answer 0 (score: 4)
This problem is almost tailor-made for regular expressions, for example:
import re
red_car = re.compile(r"\bred\W{1,5}car\b", re.I)
if red_car.search("I bought a red#$%^car yesterday"):
    print("found a red car")
The important parts of the regex are:
\b matches a word boundary at start and end so as not to match "tired carrot"
\W matches any non-word character between "red" and "car"
{1,5} matches from one to five occurrences of \W between "red" and "car"
re.I makes the regex ignore case (match "RED car" etc.)
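The same pattern can be built for an arbitrary phrase rather than hard-coding "red" and "car". The following is only a minimal sketch, assuming the phrase arrives as a space-separated string; the names phrase_pattern and max_gap are hypothetical and not part of the original answer.

```python
import re

def phrase_pattern(phrase, max_gap=5):
    """Hypothetical helper: compile a regex like the one above for any phrase."""
    # Escape each word so regex metacharacters in the phrase match literally.
    words = [re.escape(w) for w in phrase.split()]
    # Allow between 1 and max_gap non-word characters between consecutive words.
    gap = r"\W{1,%d}" % max_gap
    # Word boundaries at both ends keep "tired carrot" from matching.
    return re.compile(r"\b" + gap.join(words) + r"\b", re.I)

red_car = phrase_pattern("red car")
for s in ["I bought a new red car", "RED! CAR!", "red#$%^car", "tired carrot"]:
    print(s, "->", bool(red_car.search(s)))
```

Raising max_gap tolerates longer runs of symbols between the words, at the cost of matching words that are further apart.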
Answer 1 (score: 0)
You can iterate over the sentences and keep those that contain both red and car:
sentences = ["I bought a new red car", "RED! CAR!", "red#$%^car"]
final_sentences = [sentence for sentence in sentences if "red" in sentence.lower() and "car" in sentence.lower()]
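Note that the `in` test above works on substrings, so it would also accept a sentence such as "tired carrot". A word-level variant of the same comprehension might look like the sketch below; contains_words is a hypothetical helper, and, like the substring version, it checks neither word order nor adjacency.

```python
import re

def contains_words(sentence, words):
    """Hypothetical helper: True if every word occurs as a whole token."""
    # \w+ extracts runs of word characters, so "red#$%^car" yields "red" and "car".
    tokens = set(re.findall(r"\w+", sentence.lower()))
    return all(w in tokens for w in words)

sentences = ["I bought a new red car", "RED! CAR!", "red#$%^car", "tired carrot"]
final_sentences = [s for s in sentences if contains_words(s, ["red", "car"])]
print(final_sentences)
```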