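I am trying to split on the following delimiters: period, semicolon, *, +, ? and -. However, I only want to split on "-" when it appears at the start of a sentence (so that words such as "non-functional" are not split).
I tried the following but did not make any progress; any help would be appreciated: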
sentences = re.split("[.-;]*[\+]*[\?]*[\*]*", txt)
Here is the sample text I have been testing with:
- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support
* Improved
stability
- New icon
* See this case mis-alignment
The expected output after splitting is a list of items:
TextEditor: Now you can edit plain text files with airport tools, Updated Dropbox support, Improved stability, New icon, See this case mis-alignment
Answer 0 (score: 1)
Try enumerating the delimiters like this:
re.split(r"[.;*+?]", txt)
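As a rough check of how this character class behaves on part of the sample text, here is a small sketch (the variable names `txt` and `parts` are only illustrative). Because "-" is not in the class, hyphenated words survive, but a leading "- " bullet is left in place:

```python
import re

txt = "- New icon\n* See this case mis-alignment"

# Split on any single character from the class. '-' is deliberately absent,
# so "mis-alignment" stays whole, but the leading "- " is not stripped.
parts = [p.strip() for p in re.split(r"[.;*+?]", txt) if p.strip()]
print(parts)  # ['- New icon', 'See this case mis-alignment']
```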
Answer 1 (score: 1)
If you want to split the string on a defined set of delimiters, do it like this:
>>> txt = '- Text Editor: Now you can edit plain text files with airport tools'
>>> r = re.split(r'([.;*+?-]+)',txt)
>>> r
['', '-', ' Text Editor: Now you can edit plain text files with airport tools']
If you do not want the delimiters to appear in the resulting list:
>>> r = re.split(r'[.;*+?-]+',txt)
>>> r
['', ' Text Editor: Now you can edit plain text files with airport tools']
Edit: In response to your comment below, use \s to match whitespace:
>>> txt = '''- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support
* Improved
stability
- New icon'''
>>> r = re.split(r'(^|\s)+[.;*+?-]+($|\s)+',txt)
>>> [i for i in r if len(i) > 1]
['Text Editor: Now you can edit plain text files with airport tools', 'Updated Dropbox support', 'Improved\n stability', 'New icon']
Answer 2 (score: 1)
You can use re.split like this:
>>> import re
>>> s = '''- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support
* Improved
stability
- New icon'''
>>> [i for i in re.split(r'(?m)\s*^[-*+?]+\s*', s) if i]
['Text Editor: Now you can edit plain text files with airport tools', 'Updated Dropbox support', 'Improved\nstability', 'New icon']
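To get closer to the expected output in the question, the line break inside an item can also be collapsed. Below is a minimal sketch along the same lines, not a definitive solution: `split_items` is a hypothetical helper name, and it assumes that only `-`, `*`, `+` and `?` at the start of a line count as bullet markers.

```python
import re

def split_items(text):
    """Split text into items on bullet markers at line starts (hypothetical helper)."""
    # With re.MULTILINE, ^ matches at the start of every line, so a '-'
    # inside a word such as "mis-alignment" is never treated as a delimiter.
    chunks = re.split(r"^\s*[-*+?]+\s*", text, flags=re.MULTILINE)
    # Collapse runs of whitespace (including newlines) and drop empty pieces.
    return [" ".join(c.split()) for c in chunks if c.strip()]

s = """- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support
* Improved
stability
- New icon
* See this case mis-alignment"""

items = split_items(s)
print(items)
```

With the sample text, "Improved" and "stability" end up in a single item, and "mis-alignment" is left unsplit because its hyphen is not at the start of a line.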