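re.split with multiple alternatives using | (or) returns None in Python

Time: 2012-07-03 22:41:53

Tags: python

I am using a regular expression in Python and trying to split a piece of text on periods and exclamation marks. But when I split it, I get None in the result.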

a = "This is my text...I want it to split by periods. I also want it to split \
by exclamation marks! Is that so much to ask?"

Here is my code:

re.split(r'((?<=\w)\.(?!\..))|(!)', a)

Note that I use (?<=\w)\.(?!\..) because I want it to skip the ellipsis. However, the code above spits out:

['This is my text...I want it to split by periods', '.', None,
 ' I also want it to split by exclamation marks', None, '!',
 ' Is that so much to ask?']

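As you can see, wherever there is a period or an exclamation mark, it adds a stray None to my list. Why is that, and how do I get rid of it?

3 answers:

Answer 0 (score: 9)

Try the following: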

re.split(r'((?<=\w)\.(?!\..)|!)', a)

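You get None because you have two capturing groups, and every group is included in the re.split() result.

So whenever you match ., the second capturing group is None, and whenever you match !, the first capturing group is None.

Here is the result: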

['This is my text...I want it to split by periods',
 '.',
 ' I also want it to split by exclamation marks',
 '!',
 ' Is that so much to ask?']

If you do not want '.' and '!' in your result, just remove the parentheses around the whole expression: r'(?<=\w)\.(?!\..)|!'
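The three grouping choices discussed above can be compared side by side. This is a minimal sketch that reuses the text and pattern from the question; the variable names are only illustrative.

```python
import re

a = ("This is my text...I want it to split by periods. I also want it to split "
     "by exclamation marks! Is that so much to ask?")

# Two capturing groups: every match adds two entries, and the group
# that did not take part in the match shows up as None.
two_groups = re.split(r'((?<=\w)\.(?!\..))|(!)', a)

# One capturing group around the whole alternation: delimiters are kept,
# and there is no second group to report as None.
one_group = re.split(r'((?<=\w)\.(?!\..)|!)', a)

# No capturing group: the delimiters are dropped from the result.
no_group = re.split(r'(?<=\w)\.(?!\..)|!', a)

print(two_groups)
print(one_group)
print(no_group)
```

If parentheses are needed only for grouping, a non-capturing group (?:...) behaves like the last variant, because re.split() only returns the text of capturing groups.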

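Answer 1 (score: 2)

Here is a simpler expression (any period that is neither preceded nor followed by another period), with the outer capturing group wrapped around the whole | alternation rather than only its first part, which avoids the None: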

re.split(r'((?<!\.)\.(?!\.)|!)', a)

# Result:
# ['This is my text...I want it to split by periods', 
#  '.', 
#  ' I also want it to split by exclamation marks', 
#  '!', 
#  ' Is that so much to ask?']
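The two lookbehinds give the same result on the question's text, but they are not equivalent in general. A small sketch of the difference, using a made-up sample string in which the period follows a closing parenthesis instead of a word character:

```python
import re

# Hypothetical sample: the period follows ")" rather than a word character.
s = "Wait (really). Yes"

# Lookbehind from the question: the period must follow a word character.
after_word = re.split(r'(?<=\w)\.(?!\..)|!', s)

# Lookbehind from this answer: the period must simply not touch another period.
lone_period = re.split(r'(?<!\.)\.(?!\.)|!', s)

print(after_word)   # ['Wait (really). Yes']
print(lone_period)  # ['Wait (really)', ' Yes']
```

Which one fits depends on whether periods after quotes or brackets should count as sentence ends.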

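Answer 2 (score: 1)

It happens because the pattern contains two capturing groups, and re.split() returns None for whichever group did not take part in a match.

You can use filter to remove these None values: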

>>> import re
>>> a = "This is my text...I want it to split by periods. I also want it to split \
by exclamation marks! Is that so much to ask?"

>>> list(filter(lambda x: x is not None, re.split(r'((?<=\w)\.(?!\..))|(!)', a)))

['This is my text...I want it to split by periods', '.', ' I also want it to split by exclamation marks', '!', ' Is that so much to ask?']
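The same filtering step can also be written as a list comprehension, which returns a list on both Python 2 and Python 3 (on Python 3, a bare filter() call returns a lazy iterator instead). A minimal sketch, with illustrative variable names:

```python
import re

a = ("This is my text...I want it to split by periods. I also want it to split "
     "by exclamation marks! Is that so much to ask?")

parts = re.split(r'((?<=\w)\.(?!\..))|(!)', a)

# Keep everything except the None placeholders left by the unused group.
cleaned = [part for part in parts if part is not None]
print(cleaned)
```

Testing with "is not None" keeps any empty strings that re.split() may produce, so only the placeholders are removed.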