Using a regular expression to find all sentences with two or fewer words

Date: 2015-08-13 22:28:11

Tags: python regex

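I have tried many regular expressions to find all sentences that contain only two or fewer words. Such a sentence could look like: Hi! or this or (My name) or (! see), or any combination of English letters plus symbols such as ?:!#,@ or digits.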

I tried:

(\n|\r)\s*\w+[^\w]*\w*[^\w]*\w*[^\w]*(\n|$)+

\n\s*\w+ 

and ^(\S+\s?) doesn't work either.

I tried many others as well, but I can't get the correct result: http://prntscr.com/84db2a

2 Answers:

Answer 0 (score: 0)

If you are using this version of the regex module, the following code will work.

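The overlapped=True option is essential for the regex code below. The pattern also matches the first sentence if it has only two words. Again, you must use the regex library linked above; it offers almost all of the functionality of the built-in re module.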

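import regex  # third-party module (pip install regex), not the built-in re


data = ("This sentence has a few words. This too. Hello world. This has four "
        "words. This doesn't. This one has five words.")
# overlapped=True lets matches overlap, so the period that ends one
# two-word sentence can also start the match for the next one
found = regex.findall(r"^([^\s]+\s*[^\s]+)\s*\.|\.\s*([^\s]+\s+[^\s]+)\s*\.",
                      data, overlapped=True)

# Each match is a (group1, group2) tuple; only the group that matched is non-empty
for group in found:
    for sentence in filter(None, group):
        print(sentence)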

The code above also works with Python's built-in re module, but if two adjacent sentences both consist of two words, only one of them will be matched.
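One possible workaround that needs only the built-in re module, offered as a sketch and not as part of the answer: check the closing period with a lookahead (?=\.) instead of consuming it, so the next match can start at that same period. This swaps a lookahead in for the regex module's overlapped=True. The helper name short_sentences is hypothetical, and the sketch assumes sentences end with "." and that a word never contains a period; it also accepts one-word sentences.

```python
import re

# Hypothetical helper built on the standard re module only.
# Assumptions: sentences end with "."; a "word" is any run of characters
# that is neither whitespace nor a period.
SHORT_SENTENCE = re.compile(
    r"(?:^|\.)"                   # start of the text, or the period ending the previous sentence
    r"\s*"                        # optional whitespace before the sentence
    r"([^\s.]+(?:\s+[^\s.]+)?)"   # one word, optionally followed by a second word
    r"\s*"                        # optional whitespace before the closing period
    r"(?=\.)"                     # closing period is checked but NOT consumed
)


def short_sentences(text):
    """Return sentences of one or two words, including adjacent ones."""
    return SHORT_SENTENCE.findall(text)


data = ("This sentence has a few words. This too. Hello world. This has four "
        "words. This doesn't. This one has five words.")
print(short_sentences(data))  # ['This too', 'Hello world', "This doesn't"]
```

Because the closing period stays unconsumed, "Hello world" is found even though it directly follows "This too.". Excluding "." from words also stops a match from spanning two sentences, at the cost of splitting tokens such as "3.14" or "e.g.".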

Here is the breakdown of the pattern from regex101.com:

1st Alternative: ^([^\s]+\s*[^\s]+)\s*\.
    ^ assert position at start of the string
    1st Capturing group ([^\s]+\s*[^\s]+)
        [^\s]+ match a single character not present in the list below
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
            \s match any white space character [\r\n\t\f ]
        \s* match any white space character [\r\n\t\f ]
            Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
        [^\s]+ match a single character not present in the list below
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
            \s match any white space character [\r\n\t\f ]
    \s* match any white space character [\r\n\t\f ]
        Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
    \. matches the character . literally
2nd Alternative: \.\s*([^\s]+\s+[^\s]+)\s*\.
    \. matches the character . literally
    \s* match any white space character [\r\n\t\f ]
        Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
    2nd Capturing group ([^\s]+\s+[^\s]+)
        [^\s]+ match a single character not present in the list below
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
            \s match any white space character [\r\n\t\f ]
        \s+ match any white space character [\r\n\t\f ]
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
        [^\s]+ match a single character not present in the list below
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
            \s match any white space character [\r\n\t\f ]
    \s* match any white space character [\r\n\t\f ]
        Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
    \. matches the character . literally
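To connect the breakdown back to code, here is the same pattern written with Python's re.VERBOSE flag so each comment sits next to the piece it describes. The name PATTERN is only for illustration. Run with the built-in re module (which has no overlapped option), it skips "Hello world", because the period in front of it was already consumed by the "This too." match.

```python
import re

# The answer's pattern spaced out with re.VERBOSE; whitespace and comments
# are ignored, so it matches exactly what the one-line version matches.
PATTERN = re.compile(r"""
      ^                      # 1st alternative: start of the string
      ([^\s]+ \s* [^\s]+)    # group 1: non-space run, optional whitespace, non-space run
      \s* \.                 # optional whitespace, then a literal period
    |                        # or
      \.                     # 2nd alternative: the period ending the previous sentence
      \s*                    # optional whitespace
      ([^\s]+ \s+ [^\s]+)    # group 2: two non-space runs separated by whitespace
      \s* \.                 # optional whitespace, then a literal period
    """, re.VERBOSE)

data = ("This sentence has a few words. This too. Hello world. This has four "
        "words. This doesn't. This one has five words.")

# Each findall result is a (group1, group2) tuple; keep whichever group matched
found = [first or second for first, second in PATTERN.findall(data)]
print(found)  # ['This too', "This doesn't"]
```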

Answer 1 (score: -1)

Do you have to use a regular expression? I think you can achieve this with:
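sentence = 'This is a sentence'
words = sentence.split()  # split on any run of whitespace
if len(words) > 2:
    pass  # Do something: the sentence has more than two words
else:
    pass  # Do something else: the sentence has two or fewer words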

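If you have a paragraph and want to find the sentences in it, you can split it into sentences with sentences = paragraph.split('.'), then loop over them and check which ones have more than two words.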
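A runnable version of that suggestion might look like the sketch below. The function name few_word_sentences and its max_words parameter are hypothetical; it keeps the sentences with two or fewer words (the question's requirement) and, like the answer, treats only "." as a sentence end.

```python
def few_word_sentences(paragraph, max_words=2):
    """Split on '.' and keep sentences that have 1 to max_words words."""
    result = []
    for sentence in paragraph.split('.'):
        words = sentence.split()  # split on any run of whitespace
        # 0 < ... skips empty pieces, such as the one after the final period
        if 0 < len(words) <= max_words:
            result.append(sentence.strip())
    return result


data = ("This sentence has a few words. This too. Hello world. This has four "
        "words. This doesn't. This one has five words.")
print(few_word_sentences(data))  # ['This too', 'Hello world', "This doesn't"]
```

To also end sentences at "!" or "?", re.split(r"[.!?]", paragraph) from the built-in re module could replace paragraph.split('.').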