我有这样的示例文本,
## Paragraph 1\n\nThe [`sys`](https://docs.python.org/3.6/library/sys.html#module-sys) module also has attributes for *stdin*, *stdout*, and *stderr*. \n\nThe latter is useful for emitting warnings and error messages to make them visible even when *stdout* has been redirected:\n\n## Paragraph 2\n\nThe [`re`](https://docs.python.org/3.6/library/re.html#module-re) module provides regular expression tools for advanced string processing. For complex matching and manipulation, regular expressions offer succinct, optimized solutions:\n\nWhen only simple capabilities are needed, string methods are preferred because they are easier to read and debug.
我想使用正则表达式而不是str.split
将其拆分为两个段落,所以我尝试了。
In [18]: para = re.findall(r'## .+', content)
In [19]: para
Out[19]: ['## Paragraph 1', '## Paragraph 2']
我意图的输出是分开的完整段落。
['## Paragraph 1\n\nThe [`sys`](https://docs.python.org/3.6/library/sys.html#module-sys) module also has attributes for *stdin*, *stdout*, and *stderr*. \n\nThe latter is useful for emitting warnings and error messages to make them visible even when *stdout* has been redirected:\n\n',
'## Paragraph 2\n\nThe [`re`](https://docs.python.org/3.6/library/re.html#module-re) module provides regular expression tools for advanced string processing. For complex matching and manipulation, regular expressions offer succinct, optimized solutions:\n\nWhen only simple capabilities are needed, string methods are preferred because they are easier to read and debug.']
如何完成它?
答案 0 :(得分:1)
你可以试试这个:
import re
s = " ## Paragraph 1\n\nThe [`sys`](https://docs.python.org/3.6/library/sys.html#module-sys) module also has attributes for *stdin*, *stdout*, and *stderr*. \n\nThe latter is useful for emitting warnings and error messages to make them visible even when *stdout* has been redirected:\n\n## Paragraph 2\n\nThe [`re`](https://docs.python.org/3.6/library/re.html#module-re) module provides regular expression tools for advanced string processing. For complex matching and manipulation, regular expressions offer succinct, optimized solutions:\n\nWhen only simple capabilities are needed, string methods are preferred because they are easier to read and debug."
paragraphs = re.split('\n(?=## Paragraph \d+)', s)
输出:
[' ## Paragraph 1\n\nThe [`sys`](https://docs.python.org/3.6/library/sys.html#module-sys) module also has attributes for *stdin*, *stdout*, and *stderr*. \n\nThe latter is useful for emitting warnings and error messages to make them visible even when *stdout* has been redirected:\n',
'## Paragraph 2\n\nThe [`re`](https://docs.python.org/3.6/library/re.html#module-re) module provides regular expression tools for advanced string processing. For complex matching and manipulation, regular expressions offer succinct, optimized solutions:\n\nWhen only simple capabilities are needed, string methods are preferred because they are easier to read and debug.']
答案 1 :(得分:0)
您可以尝试内置拆分功能
string = '''I am new to python # please help me '''
data = string.split('#')
print(data)
<强>输出强>
[&#39;我是python&#39;的新手,&#39;请帮帮我&#39;]