Python regex: split on a comma that follows a digit

Asked: 2016-01-08 01:46:01

Tags: python regex string

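I have a large file that I need to load into a list of strings. Each element should contain the text up to the ',' that immediately follows a number.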

For example:

this is some text, value 45789, followed by, 1245, and more text 78965, more random text 5252,

This should become:

["this is some text, value 45789", "followed by, 1245", "and more text 78965", "more random text 5252"]

I'm currently doing re.sub(r'([0-9]+),','~', <input-string>) and then splitting on '~' (since my file doesn't contain ~), but this throws away the digits before the comma. Any ideas?

2 Answers:

Answer 0: (score: 2)

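You can use re.split with a positive look-behind assertion. The look-behind (?<=\d) only checks that a digit precedes the comma without consuming it, so the number stays in the result: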

>>> import re
>>> 
>>> text = 'this is some text, value 45789, followed by, 1245, and more text 78965, more random text 5252,'
>>> re.split(r'(?<=\d),', text)
['this is some text, value 45789',
 ' followed by, 1245',
 ' and more text 78965',
 ' more random text 5252',
 '']
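
The output above still carries the leading spaces and an empty string produced by the trailing comma. A minimal sketch, assuming you want exactly the list shown in the question: let the pattern also consume whitespace after the comma, then filter out empty pieces. The variable name parts is only illustrative.

```python
import re

text = 'this is some text, value 45789, followed by, 1245, and more text 78965, more random text 5252,'

# Split on a comma preceded by a digit; \s* also swallows the spaces after it.
parts = re.split(r'(?<=\d),\s*', text)

# The trailing comma leaves an empty last element, so drop empty strings.
parts = [p for p in parts if p]
print(parts)
```

If you would rather keep the re.sub approach from the question, a backreference puts the digits back: re.sub(r'([0-9]+),', r'\1~', text).split('~').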

Answer 1: (score: 0)

If you want it to handle whitespace as well, do the following:

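import re

string = "  blah, lots  ,  of ,  spaces, here "
# Match leading whitespace, a comma with any surrounding whitespace, or trailing whitespace
pattern = re.compile(r"^\s+|\s*,\s*|\s+$")
# Splitting leaves empty strings at the edges; filter them out
result = [x for x in pattern.split(string) if x]
print(result)
# ['blah', 'lots', 'of', 'spaces', 'here']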
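
Note that this pattern splits on every comma, not only on commas that follow a digit. A rough sketch, assuming you want to combine the whitespace handling with the digit rule from the question; the sample string and names here are made up for illustration.

```python
import re

sample = "  a, b 12 ,  c, d 34,  e 5 "

# Same idea as above, but the comma alternative now requires a digit
# right before the optional whitespace and the comma.
digit_comma = re.compile(r"^\s+|(?<=\d)\s*,\s*|\s+$")

# Drop the empty strings left at the start and end of the split.
fields = [x for x in digit_comma.split(sample) if x]
print(fields)
# ['a, b 12', 'c, d 34', 'e 5']
```

Commas that do not follow a number, such as the one in "a, b", stay inside their element.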