Python: extracting sentences that contain words from a special list

Asked: 2019-06-18 11:19:39

Tags: regex python-3.x text-segmentation

I am trying to extract the sentences that contain words from a given list out of a large collection of text.

For example, searching for "noodl", "vege" and "meat".

import re

str1 = "My new noodles are great\n vegetables. Not \nthis noodle sentence though.\n Nor this vege sentences."
results = re.findall(regex, str1)

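This should return "My new noodles are great\n vegetables." as the only match, since it is the only sentence that contains at least two of the search terms.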
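To make the expected behaviour concrete, here is a minimal non-regex sketch of it. It assumes that a sentence ends at a '.', and that "at least two occurrences of any search term" is the criterion (repeated occurrences of the same term count too). The helper name `sentences_with_min_hits` is hypothetical and only used for illustration.

```python
import re

# Hypothetical helper, only meant to spell out the expected result.
def sentences_with_min_hits(text, stems, min_hits=2):
    """Return the '.'-terminated sentences with at least min_hits stem occurrences."""
    # One alternation of all stems, escaped so they are matched literally.
    stem_re = re.compile("|".join(re.escape(s) for s in stems), re.IGNORECASE)
    found = []
    for chunk in text.split("."):
        # Count every occurrence of any stem inside this sentence.
        if len(stem_re.findall(chunk)) >= min_hits:
            found.append(chunk.strip() + ".")
    return found

str1 = "My new noodles are great\n vegetables. Not \nthis noodle sentence though.\n Nor this vege sentences."
expected = sentences_with_min_hits(str1, ["noodl", "vege", "meat"])
print(expected)  # ['My new noodles are great\n vegetables.']
```

Splitting on '.' is a simplification: it ignores '!' and '?', and it appends a '.' even to trailing text that had none.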

Starting from the question "Python extracting sentence containing 2 words", I came up with the following regex:

regex = re.compile(
    r"""
    (                          # Group 1: the whole sentence
        [^.]*?                 # Start with anything but '.'
        (                      # Group 2 start
            (noodl|vege|meat)  # Group 3: contains one of these words
            [^.]*              # followed by anything but '.'
        ){2,}                  # at least 2 times
        [^.]*\.                # Then anything but '.', followed by '.'
    )
    """,
    re.MULTILINE | re.IGNORECASE | re.VERBOSE)

But this results in

for x in results:
    print(x)
#My new noodles are great\n vegetables.
#vegetables
#vege

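This is unexpected. How should I change my regex so that it matches only the whole sentence? The sentences that are found will be processed further. The natural language being processed is not English, but the current results are the same as with the demo sentence.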
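One possible direction, offered as a sketch rather than a verified answer: when a pattern contains more than one capturing group, `re.findall` returns one tuple of group contents per match, which would explain the extra "vegetables" and "vege" values. Turning the groups into non-capturing ones (`(?:...)`) makes `findall` return the whole match as a plain string. The pattern below is otherwise the same as the one in the question and still assumes that sentences end with '.'.

```python
import re

# Same logic as the pattern in the question, but without capturing groups,
# so re.findall returns one string per matched sentence.
regex = re.compile(
    r"""
    [^.]*?                     # anything but '.', as little as possible
    (?:                        # non-capturing group
        (?:noodl|vege|meat)    # one of the search terms
        [^.]*                  # followed by anything but '.'
    ){2,}                      # at least 2 times
    [^.]*\.                    # rest of the sentence, up to and including '.'
    """,
    re.IGNORECASE | re.VERBOSE,
)

str1 = "My new noodles are great\n vegetables. Not \nthis noodle sentence though.\n Nor this vege sentences."
# strip() removes the whitespace left over from the previous sentence boundary.
results = [match.strip() for match in regex.findall(str1)]
print(results)  # ['My new noodles are great\n vegetables.']
```

An alternative that keeps the original pattern is to iterate with `regex.finditer(str1)` and read `m.group(1)` (or `m.group(0)`) from each match object. Note that this counts occurrences, so a sentence that mentions the same term twice also matches.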

0 answers:

No answers yet.