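I'm using a regular expression in Python and trying to split a piece of text on periods and exclamation marks. But when I split it, I get None in the result.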
a = "This is my text...I want it to split by periods. I also want it to split \
by exclamation marks! Is that so much to ask?"
Here is my code:
re.split(r'((?<=\w)\.(?!\..))|(!)', a)
Note that I use (?<=\w)\.(?!\..) because I want it to skip ellipses. However, the code above spits out:
['This is my text...I want it to split by periods', '.', None, ' \
I also want it to split by exclamation marks', None, '!', \
' Is that so much to ask?']
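As you can see, wherever there is a period or an exclamation mark, it adds a spurious None to my list. Why is that, and how do I get rid of it?
Answer 0 (score: 9)
Try the following: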
re.split(r'((?<=\w)\.(?!\..)|!)', a)
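You get None because you have two capturing groups, and all groups are included in the re.split() result.
So whenever you match '.', the second capturing group is None, and whenever you match '!', the first capturing group is None.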
The result is as follows:
['This is my text...I want it to split by periods',
'.',
' I also want it to split by exclamation marks',
'!',
' Is that so much to ask?']
If you don't want '.' and '!' included in the result, just remove the parentheses around the whole expression: r'(?<=\w)\.(?!\..)|!'
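To make the grouping behavior concrete, here is a minimal sketch that runs the three variants discussed above side by side on the text from the question. The variable names (text, two_groups, one_group, no_group) are only illustrative.

```python
import re

text = ("This is my text...I want it to split by periods. I also want it to split "
        "by exclamation marks! Is that so much to ask?")

# Two capturing groups: each match fills one group and leaves the other as None,
# and re.split() inserts both groups into the result.
two_groups = re.split(r'((?<=\w)\.(?!\..))|(!)', text)

# One capturing group around the whole alternation: delimiters are kept, no None.
one_group = re.split(r'((?<=\w)\.(?!\..)|!)', text)

# No capturing group at all: the delimiters are dropped from the result.
no_group = re.split(r'(?<=\w)\.(?!\..)|!', text)

print(two_groups)
print(one_group)
print(no_group)
```

Only the first call should print a list containing None. The second keeps the '.' and '!' delimiters, and the third returns just the three text pieces.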
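Answer 1 (score: 2)
Here is a simpler expression (any period that is neither preceded nor followed by another period), with the outer capturing group wrapped around the whole | alternation, rather than just the first part, to avoid None: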
re.split(r'((?<!\.)\.(?!\.)|!)', a)
# Result:
# ['This is my text...I want it to split by periods',
# '.',
# ' I also want it to split by exclamation marks',
# '!',
# ' Is that so much to ask?']
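The simpler pattern is not a drop-in equivalent of the one in the question, so it may be worth checking edge cases before switching. The sketch below compares the two on a couple of made-up inputs (the sample strings are assumptions chosen for illustration, not taken from the question).

```python
import re

original = r'((?<=\w)\.(?!\..)|!)'   # period must follow a word character
simpler = r'((?<!\.)\.(?!\.)|!)'     # period must not touch another period

# A period preceded by a space: only the simpler pattern splits here.
spaced = "Stop . Go"
spaced_original = re.split(original, spaced)
spaced_simpler = re.split(simpler, spaced)

# Two trailing periods: the original pattern splits on the first one,
# because its lookahead needs a second character after the next period.
trailing = "end.."
trailing_original = re.split(original, trailing)
trailing_simpler = re.split(simpler, trailing)

print(spaced_original, spaced_simpler)
print(trailing_original, trailing_simpler)
```

On the text from the question both patterns produce the same split. The differences only show up on inputs like these.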
Answer 2 (score: 1)
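This happens because the pattern has two capturing groups: for each match, the group that did not take part in the match is returned as None.
You can use filter to remove these None values.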
>>> import re
>>> a = "This is my text...I want it to split by periods. I also want it to split \
by exclamation marks! Is that so much to ask?"
>>> # list() is needed on Python 3, where filter() returns an iterator
>>> list(filter(lambda x: x is not None, re.split(r'((?<=\w)\.(?!\..))|(!)', a)))
['This is my text...I want it to split by periods', '.', ' I also want it to split by exclamation marks', '!', ' Is that so much to ask?']
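One detail to watch when filtering: testing for None explicitly is not the same as filtering on truthiness. A small sketch, using a made-up input that ends with a delimiter (the string "Hi! Bye!" is an assumption for illustration):

```python
import re

# The input ends with '!', so re.split() returns a trailing empty string.
parts = re.split(r'((?<=\w)\.(?!\..))|(!)', "Hi! Bye!")

# Drops only the None placeholders; the trailing '' survives.
not_none = [p for p in parts if p is not None]

# filter(None, ...) drops every falsy item, so '' is removed as well.
truthy = list(filter(None, parts))

print(parts)
print(not_none)
print(truthy)
```

Which one is appropriate depends on whether empty strings in the result carry meaning for the caller.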