Regex for a multi-line string?

Time: 2013-05-03 17:31:54

Tags: python regex

I have the following input:

str = """

    Q: What is a good way of achieving this?

    A: I am not sure. Try the following:

    1. Take this first step. Execute everything.

    2. Then, do the second step

    3. And finally, do the last one



    Q: What is another way of achieving this?

    A: I am not sure. Try the following alternatives:

    1. Take this first step from before. Execute everything.

    2. Then, don't do the second step

    3. Do the last one and then execute the above step

"""

I want to capture the Q/A pairs in this input, but I can't come up with a good regex to do it. This is what I have managed so far:

(?ms)^[\s#\-\*]*(?:Q)\s*:\s*(\S.*?\?)[\s#\-\*]+(?:A)\s*:\s*(\S.*)$

However, it captures the input as follows:

('Q', 'What is a good way of achieving this?')
('A', "I am not sure. Try the following:\n    1. Take this first step. Execute everything.\n    2. Then, do the second step\n    3. And finally, do the last one\n\n    Q: What is another way of achieving this?\n    A: I am not sure. Try the following alternatives:\n    1. Take this first step from before. Execute everything.\n    2. Then, don't do the second step\n    3. Do the last one and then execute the above step\n")

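Notice how the first answer also swallows the second Q/A pair. If I add a non-greedy ? at the end of the answer part of the regex, it no longer captures the enumerations. Any suggestions on how to solve this?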

4 answers:

Answer 0: (score: 1)

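A lazy, though not the best, way to solve it is to explode (split) the string on "Q:" and then parse each part with a simple /Q:(.+)A:(.+)/msU (in generic regex notation).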
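A minimal Python sketch of this split-then-parse idea, under a few assumptions: the sample text is shortened, the names `PAIR` and `split_pairs` are hypothetical, and because splitting on "Q:" removes the marker from each chunk, the per-chunk pattern only looks for "A:". In Python's re module, re.U means Unicode rather than ungreedy, so laziness is written with an explicit `?`.

```python
import re

text = """
    Q: What is a good way of achieving this?

    A: I am not sure. Try the following:

    1. Take this first step. Execute everything.

    2. Then, do the second step


    Q: What is another way of achieving this?

    A: I am not sure. Try the following alternatives:

    1. Take this first step from before. Execute everything.
"""

# Hypothetical per-chunk pattern: question text (lazy), then "A:", then the
# rest of the chunk. DOTALL lets "." cross newlines so enumerations are kept.
PAIR = re.compile(r"\s*(.+?)\s*A:\s*(.+)", re.DOTALL)

def split_pairs(s):
    """Split on "Q:" and pull a (question, answer) tuple out of each chunk."""
    pairs = []
    for chunk in s.split("Q:"):
        m = PAIR.match(chunk)
        if m:  # chunks without an "A:" (e.g. the leading whitespace) are skipped
            pairs.append((m.group(1), m.group(2).strip()))
    return pairs

pairs = split_pairs(text)
for question, answer in pairs:
    print(repr(question), repr(answer))
```

This will misfire if "Q:" or "A:" appears inside a question or answer body, so it only suits input as regular as the sample.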

Answer 1: (score: 1)

Using this works fine for me. It just needs some whitespace trimmed.

(?s)(Q):((?:(?!A:).)*)(A):((?:(?!Q:).)*)

Usage example:

>>> import re
>>> str = """
...
...     Q: What is a good way of achieving this?
...
...     A: I am not sure. Try the following:
...
...     1. Take this first step. Execute everything.
...
...     2. Then, do the second step
...
...     3. And finally, do the last one
...
...
...
...     Q: What is another way of achieving this?
...
...     A: I am not sure. Try the following alternatives:
...
...     1. Take this first step from before. Execute everything.
...
...     2. Then, don't do the second step
...
...     3. Do the last one and then execute the above step
...
... """
>>> regex = r"(?s)(Q):((?:(?!A:).)*)(A):((?:(?!Q:).)*)"
>>> match = re.findall(regex, str)
>>> [[part.strip().replace('\n', '') for part in x] for x in match]
[['Q', 'What is a good way of achieving this?', 'A', 'I am not sure. Try the following:    1. Take this first step. Execute everything.    2. Then, do the second step    3. And finally, do the last one'], ['Q', 'What is another way of achieving this?', 'A', "I am not sure. Try the following alternatives:    1. Take this first step from before. Execute everything.    2. Then, don't do the second step    3. Do the last one and then execute the above step"]]

I even added something to help you clean up the whitespace in there.

Answer 2: (score: 0)

I'm not clever enough to write huge regexes, so here is my non-regex solution:

>>> str = """

    Q: What is a good way of achieving this?

    A: I am not sure. Try the following:

    1. Take this first step. Execute everything.

    2. Then, do the second step

    3. And finally, do the last one



    Q: What is another way of achieving this?

    A: I am not sure. Try the following alternatives:

    1. Take this first step from before. Execute everything.

    2. Then, don't do the second step

    3. Do the last one and then execute the above step

"""
>>> qas = str.strip().split('Q:')
>>> clean_qas = [x.strip().split('A:') for x in qas if x]
>>> print(clean_qas)
[['What is a good way of achieving this?\n\n    ', ' I am not sure. Try the following:\n\n    1. Take this first step. Execute everything.\n\n    2. Then, do the second step\n\n    3. And finally, do the last one'], ['What is another way of achieving this?\n\n    ', " I am not sure. Try the following alternatives:\n\n    1. Take this first step from before. Execute everything.\n\n    2. Then, don't do the second step\n\n    3. Do the last one and then execute the above step"]]

You should clean up the whitespace. Or you could do what Puciek said.

Just for fun:

>>> clean_qas = [[s.strip() for s in x.strip().split('A:')] for x in qas if x]
>>> print(clean_qas)
[['What is a good way of achieving this?', 'I am not sure. Try the following:\n\n    1. Take this first step. Execute everything.\n\n    2. Then, do the second step\n\n    3. And finally, do the last one'], ['What is another way of achieving this?', "I am not sure. Try the following alternatives:\n\n    1. Take this first step from before. Execute everything.\n\n    2. Then, don't do the second step\n\n    3. Do the last one and then execute the above step"]]

It looks ugly, though.

Answer 3: (score: 0)

A slight modification of the original solution:

(?ms)^[\s#\-\*]*(?:Q)\s*:\s+(\S[^\n\r]*\?)[\s#\-\*]+(?:A)\s*:\s+(\S.*?)\s*(?=$|Q\s*:\s+)
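  • Both the question and the answer must have at least one whitespace character after the ":".
  • Instead of matching the question non-greedily (which disallows more than one ? in a question), disallow newlines in the question.
  • Instead of matching to the end of the string, match non-greedily until the match is followed by the end of the string or by another question.

Use re.findall to get all question/answer matches.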
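
A hedged sketch of how this could be run with re.findall in Python. It is a variant of the pattern, not the pattern exactly as posted, and makes two assumptions. First, under (?m) Python's `$` also matches before every newline, so a lazy answer guarded by `(?=$|...)` can stop at the end of its first line; the sketch uses `\Z` (end of string only) instead. Second, the trailing `\s*` is moved inside the lookahead so the indentation before the next "Q:" is not consumed and the `^` anchor can still match at the start of that line. The sample text is shortened and the name `QA` is hypothetical.

```python
import re

text = """
    Q: What is a good way of achieving this?

    A: I am not sure. Try the following:

    1. Take this first step.

    2. Then, do the second step


    Q: What is another way of achieving this?

    A: Try the alternatives:

    1. Skip the second step
"""

# Variant pattern (assumption): \Z instead of $, and the trailing \s* kept
# inside the lookahead so whitespace before the next "Q:" stays unconsumed.
QA = re.compile(
    r"(?ms)^[\s#\-\*]*(?:Q)\s*:\s+(\S[^\n\r]*\?)"
    r"[\s#\-\*]+(?:A)\s*:\s+(\S.*?)(?=\s*(?:\Z|Q\s*:\s+))"
)

# findall returns one (question, answer) tuple per match, because the
# pattern has exactly two capturing groups.
pairs = QA.findall(text)
for question, answer in pairs:
    print(repr(question))
    print(repr(answer))
```

An answer body that itself contains something like "Q: " would still end the match early, so this remains a sketch for input shaped like the sample.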