Finding question phrases with Python regex

Time: 2018-10-10 22:24:14

Tags: regex python-3.x

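I'm trying to find every question phrase using a Python regex. Basically, I need to find an initial punctuation mark and capture everything after it up to the question mark, without any other punctuation in between.

So I came up with this code: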

import re

questionRegex = re.compile(r'[?.!][A-Za-z\s]*\?')

Then I use this regex to find the questions in a text:

text = '''
Maybe the barista’s looking at me because she thinks I’m attractive. I am in my blue shirt. So she has stringy hair? Who am I to complain about stringy hair? Who do I think I am? Cary Grant?

And now John was doing temp work at the law firm of Fleurstein and Kaplowitz to get himself righted again. He had a strong six-month plan: he would save some money to pay Rebecca’s parents back for the house and be able to take some time off to focus on his writing—on his painting. In a few months, he would be back on his feet, probably even engaged to someone new. Maybe even that barista. Yes, almost paradoxically, temp work provided John with the stability he craved.

This is shit. It is utter shit. What are you talking about? Are you serious about this?
'''

Like this:

process = questionRegex.findall(text)

But the result I get is:

  • '. So she has stringy hair?'
  • '? Who do I think I am?'
  • '. What are you talking about?'

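The problem is that there are 5 questions in this text, which means this regex fails to capture these questions:

  • Who am I to complain about stringy hair?
  • Are you serious about this?

What is wrong with my code, and why doesn't it catch these two questions like the others?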

2 answers:

Answer 0 (score: 1)

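I figured out why your regex pattern doesn't return all the results.

The following strings:

  • Who am I to complain about stringy hair?
  • Are you serious about this?

each come right after another question. findall returns non-overlapping matches, so the ? that ends the previous match has already been consumed and cannot also serve as the leading [?.!] of the next one. What every question in your text does have in front of it is a whitespace character.

So instead of requiring the set [?.!], you can just use \s.

The pattern becomes: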

In [20]: pattern = re.compile(r'\s[A-Za-z\s]*\?')

In [21]: pattern.findall(text)
Out[21]:
[' So she has stringy hair?',
 ' Who am I to complain about stringy hair?',
 ' Who do I think I am?',
 ' Cary Grant?',
 ' What are you talking about?',
 ' Are you serious about this?']
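
As a side note, here is a minimal sketch of why the original pattern skips some questions. The `sample` string and the variable names are made up for illustration and are not from the question. `re.findall` returns non-overlapping matches, so once a `?` has ended one match it cannot also start the next one. A zero-width lookbehind is one possible alternative to the `\s` approach above: it checks the preceding punctuation without consuming it.

```python
import re

# Hypothetical sample for illustration (not the text from the question).
sample = ("I am in my blue shirt. So she has stringy hair? "
          "Who am I to complain? Who do I think I am?")

# Original pattern: the leading [?.!] is part of the match, so it is consumed.
original = re.compile(r'[?.!][A-Za-z\s]*\?')
missed = original.findall(sample)
print(missed)

# Variant with a lookbehind: the preceding punctuation is checked, not consumed.
lookbehind = re.compile(r'(?<=[?.!])[A-Za-z\s]*\?')
found = lookbehind.findall(sample)
print(found)
```

With this sample, the first call skips "Who am I to complain?" because the `?` in front of it already belongs to the previous match, while the lookbehind variant returns all three questions (each with its leading space).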

Answer 1 (score: 0)

You can try this:

(?<=[\?\.\!]\s)[^\?\n\.]+?\?

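Matches:

  • So she has stringy hair?
  • Who am I to complain about stringy hair?
  • Who do I think I am?
  • Cary Grant?
  • What are you talking about?
  • Are you serious about this?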
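
For completeness, a minimal sketch of how that pattern could be run in Python. The name `question_pattern` is just an illustrative choice, and the text is an excerpt of the one in the question: the middle paragraph, which contains no questions, is left out to keep the example short.

```python
import re

# Excerpt of the text from the question (middle paragraph omitted).
text = '''
Maybe the barista’s looking at me because she thinks I’m attractive. I am in my blue shirt. So she has stringy hair? Who am I to complain about stringy hair? Who do I think I am? Cary Grant?

This is shit. It is utter shit. What are you talking about? Are you serious about this?
'''

# The lookbehind requires sentence punctuation plus one whitespace character
# right before the match; the lazy class then runs up to the next '?'.
question_pattern = re.compile(r'(?<=[\?\.\!]\s)[^\?\n\.]+?\?')

questions = question_pattern.findall(text)
for q in questions:
    print(q)
```

Because the lookbehind is zero-width, the matches come back without the leading punctuation or space. One caveat: a question that opens a paragraph or the whole text has no "punctuation plus whitespace" directly in front of it, so this pattern would not pick it up.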