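I have tried many regular expressions to find all sentences that contain two words or fewer. A word can look like this: Hi! or this or (my name) or (!see), or any combination of English characters plus symbols such as ?:!#,@ or digits.
I tried:
(\n|\r)\s*\w+[^\w]*\w*[^\w]*\w*[^\w]*(\n|$)+
and
\n\s*\w+
and ^(\S+\s?) does not work either.
I tried many others as well, but I cannot get the correct result: http://prntscr.com/84db2a
Answer 0 (score: 0)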
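The overlapped=True feature is crucial for the regular expression code below. The pattern also matches the first sentence (if it has only two words). You have to use the third-party regex library again; it offers nearly the same functionality as the built-in re module.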
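import regex  # third-party library, not the built-in re module

data = ("This sentence has a few words. This too. Hello world. This has four "
        "words. This doesn't. This one has five words.")
# overlapped=True lets the closing period of one match open the next match
found = regex.findall(r"^([^\s]+\s*[^\s]+)\s*\.|\.\s*([^\s]+\s+[^\s]+)\s*\.",
                      data, overlapped=True)
for group in found:
    # each result is a (group 1, group 2) tuple; skip the empty group
    for sentence in filter(None, group):
        print(sentence)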
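The code above also works with Python's built-in re module (without the overlapped argument, which re.findall does not accept), but if two adjacent sentences both happen to consist of two words, only one of them is matched.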
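If installing the regex library is not an option, one possible workaround with the built-in re module is to check the closing period with a lookahead instead of consuming it. The following is only a sketch under that assumption; the pattern is a variant written for illustration, not the one from this answer, and it also accepts one-word sentences.

```python
import re

data = ("This sentence has a few words. This too. Hello world. This has four "
        "words. This doesn't. This one has five words.")

# Assumed variant of the pattern (not the answer's original):
# - (?:^|\.)            start of the string or the period that ends the previous sentence
# - [^\s.]+             one word, with periods excluded from words
# - (?:\s+[^\s.]+)?     an optional second word
# - (?=\.)              the closing period is only looked at, not consumed,
#                       so it can open the next match
pattern = re.compile(r"(?:^|\.)\s*([^\s.]+(?:\s+[^\s.]+)?)\s*(?=\.)")

short_sentences = pattern.findall(data)
for sentence in short_sentences:
    print(sentence)
```

Because periods are excluded from words, abbreviations or decimal numbers inside a sentence would be treated as sentence boundaries, so this only suits simple text like the sample above.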
Here is the regex101.com breakdown of the regular expression passed to regex.findall:
1st Alternative: ^([^\s]+\s*[^\s]+)\s*\.
  ^ assert position at start of the string
  1st Capturing group ([^\s]+\s*[^\s]+)
    [^\s]+ match a single character not present in the list below
      Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
      \s match any white space character [\r\n\t\f ]
    \s* match any white space character [\r\n\t\f ]
      Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
    [^\s]+ match a single character not present in the list below
      Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
      \s match any white space character [\r\n\t\f ]
  \s* match any white space character [\r\n\t\f ]
    Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
  \. matches the character . literally
2nd Alternative: \.\s*([^\s]+\s+[^\s]+)\s*\.
  \. matches the character . literally
  \s* match any white space character [\r\n\t\f ]
    Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
  2nd Capturing group ([^\s]+\s+[^\s]+)
    [^\s]+ match a single character not present in the list below
      Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
      \s match any white space character [\r\n\t\f ]
    \s+ match any white space character [\r\n\t\f ]
      Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
    [^\s]+ match a single character not present in the list below
      Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
      \s match any white space character [\r\n\t\f ]
  \s* match any white space character [\r\n\t\f ]
    Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
  \. matches the character . literally
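The same two alternatives can also be written as a commented pattern, which some readers may find easier to follow than the breakdown. This is a minimal sketch using re.VERBOSE from the built-in re module; the name TWO_WORDS and the sample string are made up for illustration.

```python
import re

# Same two alternatives as in the answer's pattern, spread out with comments.
# TWO_WORDS is a hypothetical name chosen for this sketch.
TWO_WORDS = re.compile(r"""
    ^ ( [^\s]+ \s* [^\s]+ ) \s* \.        # 1st alternative: start of string, tokens, closing period
  |
    \. \s* ( [^\s]+ \s+ [^\s]+ ) \s* \.   # 2nd alternative: opening period, two tokens, closing period
""", re.VERBOSE)

match = TWO_WORDS.search("Hello world. This one is longer.")
first_sentence = match.group(1)  # group 1 is filled by the 1st alternative
print(first_sentence)
```

In verbose mode, whitespace in the pattern is ignored and everything after # on a line is a comment, so the layout does not change what is matched.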
Answer 1 (score: -1)
Do you have to use a regular expression? I think you can achieve this with:
sentence = 'This is a sentence'
words = sentence.split()  # split on whitespace
if len(words) > 2:
    pass  # Do something
else:
    pass  # Do something else
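If you have a paragraph and want to find sentences in it, you can split it into sentences with sentences = paragraph.split('.') and then loop over them, looking for the sentences with more than 2 words.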
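A minimal sketch of that split-and-loop idea, turned around to keep the sentences with at most two words as the question asks. The function name short_sentences, the max_words parameter, and the sample paragraph are hypothetical, chosen only for illustration.

```python
def short_sentences(paragraph, max_words=2):
    """Return the sentences of `paragraph` that have at most `max_words` words."""
    result = []
    for raw in paragraph.split('.'):      # naive sentence split on periods
        words = raw.split()               # split on any whitespace
        if 0 < len(words) <= max_words:   # skip empty pieces, keep short sentences
            result.append(' '.join(words))
    return result


paragraph = "This sentence has a few words. This too. Hello world. This has four words."
print(short_sentences(paragraph))
```

Splitting on '.' treats every period as a sentence end, so abbreviations and decimal numbers would need extra handling.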