Question

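I'm writing a program and I need to test whether a phrase such as "red car" exists in various sentences: "I bought a new red car", "RED! CAR!", "red#$%^car".

I can't find a way to separate the words from the symbols in the last example.

My code so far is: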

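import string

exclude = set(string.punctuation)

# The function name is illustrative; the original snippet was a bare function body.
def phrase_in_text(text, phrase):
    # phrase is a list of lowercase words, e.g. ["red", "car"]
    text = text.lower()  # lower is a method and must be called
    # Deleting punctuation outright is what glues "red#$%^car" into "redcar"
    text = ''.join(ch for ch in text if ch not in exclude)
    text = text.split()

    for word in phrase:
        found = False
        for e2 in text:
            if word == e2:
                found = True
                break
        if not found:
            return False
    return True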

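This turns the last example into 'redcar', so the words are not separated.

All the questions I've found here discuss delimiters, not splitting apart two words that are joined by a bunch of symbols.

Should I just call text.split for each symbol?

What I had in mind was: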

for ch in exclude:
    text = text.split(ch)

But I'm hoping there is a cleaner way.

Answer 1

This problem is practically tailor-made for regular expressions, for example:

import re
red_car = re.compile(r"\bred\W{1,5}car\b", re.I)

if red_car.search("I bought a red#$%^car yesterday"):
    print("found a red car")

The important parts of the regex are:

\b     matches a word boundary at start and end so as not to match "tired carrot"
\W     matches any non-word character between "red" and "car"
{1,5}  matches from one to five occurrences of \W between "red" and "car"
re.I   makes the regex ignore case (match "RED car" etc.)
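
The same pattern can be built for an arbitrary phrase rather than hard-coding "red" and "car". The following is only a sketch: the helper name `phrase_pattern` and its `max_gap` parameter are made up for illustration, and it assumes the phrase's words are separated by whitespace.

```python
import re

def phrase_pattern(phrase, max_gap=5):
    """Compile a regex matching the words of `phrase`, in order,
    separated by 1..max_gap non-word characters (hypothetical helper)."""
    # Escape each word so regex metacharacters in the phrase are taken literally
    words = [re.escape(w) for w in phrase.split()]
    gap = r"\W{1,%d}" % max_gap
    return re.compile(r"\b" + gap.join(words) + r"\b", re.IGNORECASE)

red_car = phrase_pattern("red car")
for sentence in ["I bought a new red car", "RED! CAR!", "red#$%^car", "tired carrot"]:
    print(sentence, "->", bool(red_car.search(sentence)))
```

Raising `max_gap` tolerates longer runs of symbols between the words; replacing the bounded gap with `\W+` would accept any number of them.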

Answer 2

You can iterate over the sentences and check whether both red and car appear in each one:

sentences = ["I bought a new red car", "RED! CAR!", "red#$%^car"]
final_sentences = [sentence for sentence in sentences if "red" in sentence.lower() and "car" in sentence.lower()]
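
Note that `in` tests substrings, so the check above would also accept "tired carrot". A possible variation, sketched here under assumptions, splits each sentence into words on any run of whitespace or symbols and then looks for the phrase's words side by side. This is stricter than the snippet above, which only requires both words to appear somewhere. The function name `contains_phrase` is illustrative, and `\w` also treats digits and underscores as word characters.

```python
import re

def contains_phrase(sentence, phrase):
    """Return True if the words of `phrase` occur consecutively in `sentence`
    (hypothetical helper, case-insensitive)."""
    # \w+ grabs runs of word characters, so spaces and symbols both act as separators
    words = re.findall(r"\w+", sentence.lower())
    target = phrase.lower().split()
    n = len(target)
    # Slide a window of len(target) words across the sentence
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

sentences = ["I bought a new red car", "RED! CAR!", "red#$%^car", "tired carrot"]
final_sentences = [s for s in sentences if contains_phrase(s, "red car")]
print(final_sentences)
```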

Python: split a phrase into words on whitespace and symbols
