Python: split a phrase into words on spaces and symbols

Asked: 2017-12-05 22:29:21

Tags: python string split

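I'm writing a program in which I need to test whether a phrase such as "red car" occurs in various sentences: "I bought a new red car", "RED! CAR!", "red#$%^car".

I can't find a way to separate the words from the symbols in the last example.

My code so far is: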

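import string

exclude = set(string.punctuation)

# Wrapped in a function so the return statements are valid; the name
# contains_phrase is illustrative. phrase is assumed to be a list of
# lowercase words, e.g. ["red", "car"].
def contains_phrase(text, phrase):
    text = text.lower()  # lower must be called, not just referenced
    # Deleting punctuation glues "red#$%^car" together into "redcar"
    text = ''.join(ch for ch in text if ch not in exclude)
    text = text.split()

    for word in phrase:
        found = False
        for e2 in text:
            if word == e2:
                found = True
                break
        if not found:
            return False
    return True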

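For the last example this produces "redcar", so the words are not separated.

All the questions I have found here discuss delimiters, not splitting apart two words that are joined by a run of symbols.

Should I just call text.split for every symbol?

What I had in mind was: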

for ch in exclude:
    text = text.split(ch)

But I'm hoping there is a cleaner way.
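A note on the loop above: str.split returns a list, so on the second iteration text.split(ch) would raise AttributeError. One possible cleaner route, sketched here under the assumption that only ASCII punctuation (string.punctuation) matters, is to replace each punctuation character with a space rather than deleting it, and then split on whitespace. The helper names split_words and contains_words are made up for this illustration.

```python
import string

# Map every ASCII punctuation character to a space instead of deleting it,
# so "red#$%^car" becomes "red    car" rather than "redcar".
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def split_words(text):
    """Lowercase text, turn punctuation into spaces, split on whitespace."""
    return text.lower().translate(PUNCT_TO_SPACE).split()

def contains_words(text, phrase):
    """Return True if every word of phrase occurs as a whole word in text."""
    words = set(split_words(text))
    return all(word in words for word in split_words(phrase))

print(split_words("red#$%^car"))                 # ['red', 'car']
print(contains_words("RED! CAR!", "red car"))    # True
```

This only checks that each word is present; it does not check that the words are adjacent or in order.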

2 Answers:

Answer 0: (score: 4)

This problem is almost tailor-made for regular expressions, for example:

import re
red_car = re.compile(r"\bred\W{1,5}car\b", re.I)

if red_car.search("I bought a red#$%^car yesterday"):
    print("found a red car")

The important parts of the regex are:

\b     matches a word boundary at start and end so as not to match "tired carrot"
\W     matches any non-word character between "red" and "car"
{1,5}  matches from one to five occurrences of \W between "red" and "car"
re.I   makes the regex ignore case (match "RED car" etc.)
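The same pattern can be built for an arbitrary phrase instead of being hard-coded for "red car". The following is a minimal sketch, not part of the original answer: phrase_pattern and its max_gap parameter are hypothetical names, and the words of the phrase are assumed to start and end with word characters so that \b behaves as described above.

```python
import re

def phrase_pattern(phrase, max_gap=5):
    """Compile a regex matching the words of phrase in order, separated
    by 1 to max_gap non-word characters, ignoring case."""
    # re.escape guards against regex metacharacters inside the words
    words = [re.escape(word) for word in phrase.split()]
    gap = r"\W{1,%d}" % max_gap
    return re.compile(r"\b" + gap.join(words) + r"\b", re.IGNORECASE)

red_car = phrase_pattern("red car")
for sentence in ["I bought a new red car", "RED! CAR!", "red#$%^car", "tired carrot"]:
    print(sentence, "->", bool(red_car.search(sentence)))
```

Note that \W does not match an underscore, so a string such as "red_car" would not be found by this pattern.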

Answer 1: (score: 0)

You can iterate over the sentences and keep those in which both "red" and "car" occur as substrings:

sentences = ["I bought a new red car", "RED! CAR!", "red#$%^car"]
final_sentences = [sentence for sentence in sentences if "red" in sentence.lower() and "car" in sentence.lower()]
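Because the substring test above uses the in operator on the whole sentence, it also accepts "tired carrot" ("red" is inside "tired" and "car" is inside "carrot"). A whole-word variant could look like the sketch below; has_all_words is a hypothetical helper, and words are assumed to be runs of \w characters.

```python
import re

def has_all_words(sentence, words):
    """Return True if every item of words occurs as a whole word in sentence."""
    tokens = set(re.findall(r"\w+", sentence.lower()))
    return all(word in tokens for word in words)

sentences = ["I bought a new red car", "RED! CAR!", "red#$%^car", "tired carrot"]
final_sentences = [s for s in sentences if has_all_words(s, ["red", "car"])]
print(final_sentences)  # ['I bought a new red car', 'RED! CAR!', 'red#$%^car']
```

Like the original list comprehension, this ignores word order and adjacency, so "car, red" would also be accepted.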