Split text by paragraph

Time: 2017-12-16 15:02:09

Tags: python regex

I have sample text like this:

 ## Paragraph 1\n\nThe [`sys`](https://docs.python.org/3.6/library/sys.html#module-sys) module also has attributes for *stdin*, *stdout*, and *stderr*. \n\nThe latter is useful for emitting warnings and error messages to make them visible even when *stdout* has been redirected:\n\n## Paragraph 2\n\nThe [`re`](https://docs.python.org/3.6/library/re.html#module-re) module provides regular expression tools for advanced string processing. For complex matching and manipulation, regular expressions offer succinct, optimized solutions:\n\nWhen only simple capabilities are needed, string methods are preferred because they are easier to read and debug.

I want to split it into two paragraphs using a regular expression rather than str.split, so I tried this:

In [18]: para = re.findall(r'## .+', content)
In [19]: para
Out[19]: ['## Paragraph 1', '## Paragraph 2']

The output I want is the complete paragraphs, separated:

['## Paragraph 1\n\nThe [`sys`](https://docs.python.org/3.6/library/sys.html#module-sys) module also has attributes for *stdin*, *stdout*, and *stderr*. \n\nThe latter is useful for emitting warnings and error messages to make them visible even when *stdout* has been redirected:\n\n',
'## Paragraph 2\n\nThe [`re`](https://docs.python.org/3.6/library/re.html#module-re) module provides regular expression tools for advanced string processing. For complex matching and manipulation, regular expressions offer succinct, optimized solutions:\n\nWhen only simple capabilities are needed, string methods are preferred because they are easier to read and debug.']

How can I accomplish this?

2 answers:

Answer 0 (score: 1)

You can try this:

import re
s = " ## Paragraph 1\n\nThe [`sys`](https://docs.python.org/3.6/library/sys.html#module-sys) module also has attributes for *stdin*, *stdout*, and *stderr*. \n\nThe latter is useful for emitting warnings and error messages to make them visible even when *stdout* has been redirected:\n\n## Paragraph 2\n\nThe [`re`](https://docs.python.org/3.6/library/re.html#module-re) module provides regular expression tools for advanced string processing. For complex matching and manipulation, regular expressions offer succinct, optimized solutions:\n\nWhen only simple capabilities are needed, string methods are preferred because they are easier to read and debug."
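# Split on the newline that directly precedes each heading; the lookahead keeps the heading in the result.
paragraphs = re.split(r'\n(?=## Paragraph \d+)', s)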

Output:

 [' ## Paragraph 1\n\nThe [`sys`](https://docs.python.org/3.6/library/sys.html#module-sys) module also has attributes for *stdin*, *stdout*, and *stderr*. \n\nThe latter is useful for emitting warnings and error messages to make them visible even when *stdout* has been redirected:\n', 
 '## Paragraph 2\n\nThe [`re`](https://docs.python.org/3.6/library/re.html#module-re) module provides regular expression tools for advanced string processing. For complex matching and manipulation, regular expressions offer succinct, optimized solutions:\n\nWhen only simple capabilities are needed, string methods are preferred because they are easier to read and debug.']
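
If the goal is the exact output from the question, with each paragraph keeping its trailing blank lines, a `re.findall` variant may be closer. The question's attempt returned only the headings because `.` does not match a newline unless `re.S` (DOTALL) is set. The sketch below is one possible approach, not the answerer's method; the shortened `content` string is a stand-in for the real text, and it assumes every paragraph starts with `## ` at the beginning of a line.

```python
import re

# Shortened stand-in for the sample text (assumption: no leading space before the first heading).
content = (
    "## Paragraph 1\n\nFirst body.\n\nMore text:\n\n"
    "## Paragraph 2\n\nSecond body.\n\nLast line."
)

# re.S lets '.' cross newlines; re.M lets '^' match at each line start.
# The lazy '.+?' stops right before the next heading line or at the end of the string.
paragraphs = re.findall(r'^## .+?(?=^## |\Z)', content, flags=re.S | re.M)

for p in paragraphs:
    print(repr(p))
```

Under that assumption nothing is consumed between matches, so joining the pieces gives back the original string.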

Answer 1 (score: 0)

You can try the built-in split function:

string = '''I am new to python # please help me '''
data = string.split('#')
print(data)

Output:

['I am new to python ', ' please help me ']
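
Note that splitting on a bare `#` would also cut the question's sample text at the `#module-sys` fragments in its URLs. A minimal sketch, assuming headings always appear as `## ` at the start of a line, splits on the newline plus marker and then puts the marker back; the shortened `content` below is illustrative only.

```python
# Shortened stand-in for the question's text; the URL contains a '#' fragment.
content = (
    "## Paragraph 1\n\nSee [`sys`](https://docs.python.org/3.6/library/sys.html#module-sys).\n\n"
    "## Paragraph 2\n\nSecond body."
)

# Splitting on '#' alone also breaks the text inside the URL.
naive = content.split('#')

# Split on the newline followed by the heading marker, then restore the marker
# on every piece except the first, which still carries its own.
parts = content.split("\n## ")
paragraphs = [parts[0]] + ["## " + p for p in parts[1:]]

print(len(naive), len(paragraphs))
print(paragraphs)
```

Unlike the regex with a lookahead, this drops the newline that was part of the delimiter, so the first paragraph ends with a single `\n`.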