Regex negative lookbehind in Python

Asked: 2017-11-27 17:20:24

Tags: python regex split

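I am trying to split a string on the commas that are not inside brackets. The string contains comma-separated items, but it also contains commas inside brackets, and I don't want to split on those. Like this: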

A='[1, "A"], [2, "B"], [3, "C"], [4, "D"], [5, "E"], [6, "F"], [7, "G"], [8, "H"], [9, "I"], [10, "J"], [100, "JJ"]'

which should result in:

['[1, "A"]', ' [2, "B"]', ' [3, "C"]', ' [4, "D"]', ' [5, "E"]', ' [6, "F"]', ' [7, "G"]', ' [8, "H"]', ' [9, "I"]', ' [10, "J"]', '[100, "JJ"]']

I tried using a negative lookbehind:

B=re.split(r'(?<![[][\d]),',A)

However, this does not work when the number inside the brackets has more than one digit, as in [10, "J"]. Any help would be appreciated!

5 answers:

Answer 0 (score: 1)

It looks like "split on any comma that is preceded by ]" would work fine. For good measure, I added \s* to consume the whitespace before the next item.

import re

A = '[1, "A"], [2, "B"], [3, "C"], [4, "D"], [5, "E"], [6, "F"], [7, "G"], [8, "H"], [9, "I"], [10, "J"], [100, "JJ"]'

re.split(r"(?<=]),\s*", A)

gives

['[1, "A"]', '[2, "B"]', '[3, "C"]', '[4, "D"]', '[5, "E"]', '[6, "F"]', '[7, "G"]', '[8, "H"]', '[9, "I"]', '[10, "J"]', '[100, "JJ"]']
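
For reference, here is a minimal sketch of why the lookbehind from the question breaks on multi-digit numbers while this one does not. The shortened sample string is made up for illustration, and `(?<!\[\d),` is just the question's pattern written with an escaped bracket:

```python
import re

# Shortened sample input (an assumption for illustration, not the full string).
sample = '[1, "A"], [10, "J"], [100, "JJ"]'

# The question's negative lookbehind only protects a comma that directly
# follows "[" plus exactly one digit, so "[10," and "[100," are still split.
broken = re.split(r'(?<!\[\d),', sample)
print(broken)  # ['[1, "A"]', ' [10', ' "J"]', ' [100', ' "JJ"]']

# The positive lookbehind splits only on commas that directly follow "]".
fixed = re.split(r'(?<=]),\s*', sample)
print(fixed)  # ['[1, "A"]', '[10, "J"]', '[100, "JJ"]']
```

Python's re module requires lookbehinds to have a fixed width, so the original pattern cannot simply be widened to "[ followed by any number of digits".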

Answer 1 (score: 1)

You can try this:

import re
A='[1, "A"], [2, "B"], [3, "C"], [4, "D"], [5, "E"], [6, "F"], [7, "G"], [8, "H"], [9, "I"], [10, "J"], [100, "JJ"]'
data = re.split(r'(?<=\]),\s', A)  # raw string avoids invalid-escape warnings for \] and \s

Output:

['[1, "A"]', '[2, "B"]', '[3, "C"]', '[4, "D"]', '[5, "E"]', '[6, "F"]', '[7, "G"]', '[8, "H"]', '[9, "I"]', '[10, "J"]', '[100, "JJ"]']
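
One difference from the previous answer is worth a quick sketch: `\s` requires exactly one whitespace character after the comma, while `\s*` accepts none. The compact input below is hypothetical and only serves to show the difference:

```python
import re

# Hypothetical input with no space after the separating comma.
compact = '[1, "A"],[2, "B"]'

# With \s the separating comma is not followed by whitespace, so nothing splits.
strict = re.split(r'(?<=\]),\s', compact)
print(strict)   # ['[1, "A"],[2, "B"]']

# With \s* the whitespace is optional, so the split still happens.
lenient = re.split(r'(?<=\]),\s*', compact)
print(lenient)  # ['[1, "A"]', '[2, "B"]']
```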

Answer 2 (score: 0)

If using split is not a requirement, findall also works with a very simple expression:

In [27]: re.findall(r'\[.+?\]', A)
Out[27]:
['[1, "A"]', '[2, "B"]', '[3, "C"]', '[4, "D"]', '[5, "E"]', '[6, "F"]', '[7, "G"]', '[8, "H"]', '[9, "I"]', '[10, "J"]', '[100, "JJ"]']

Answer 3 (score: 0)

Try this regex and take each item from group 1:

(\[\d+,\s*\"\w+\"\])

You can see the result at this link:

https://regex101.com/r/K5XV6F/1
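
In Python, this pattern could be applied roughly as follows. This is only a sketch: the shortened sample string and the second, differently shaped input are assumptions for illustration. Because the pattern has a single capturing group, `re.findall` returns the group 1 text of every match:

```python
import re

# Same regex as above; the quotes need no escaping inside a raw string.
pattern = r'(\[\d+,\s*"\w+"\])'

# Shortened sample input (assumed for illustration).
sample = '[1, "A"], [10, "J"], [100, "JJ"]'
items = re.findall(pattern, sample)
print(items)  # ['[1, "A"]', '[10, "J"]', '[100, "JJ"]']

# The pattern is strict about the item shape: "B C" contains a space,
# which \w+ does not match, so that item is silently skipped.
skipped = re.findall(pattern, '[1, "A"], [2, "B C"]')
print(skipped)  # ['[1, "A"]']
```

The strictness cuts both ways: it validates the shape of each item, but anything that does not look like [number, "word"] is dropped without an error.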

Answer 4 (score: 0)

With the newer regex module, you can use

\[[^][]*\](*SKIP)(*FAIL) # discard anything in square brackets
|                        # or
,\s*                     # match a comma and any whitespace that follows

In Python, this looks like

import regex as re

A='[1, "A"], [2, "B"], [3, "C"], [4, "D"], [5, "E"], [6, "F"], [7, "G"], [8, "H"], [9, "I"], [10, "J"], [100, "JJ"]'

rx = re.compile(r'\[[^][]*\](*SKIP)(*FAIL)|,\s*')

print(rx.split(A))
# ['[1, "A"]', '[2, "B"]', '[3, "C"]', '[4, "D"]', '[5, "E"]', '[6, "F"]', '[7, "G"]', '[8, "H"]', '[9, "I"]', '[10, "J"]', '[100, "JJ"]']

See a demo on regex101.com
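
If installing the third-party regex module is not an option, roughly the same skip-then-split idea can be sketched with the standard re module. The helper name `split_outside_brackets` is hypothetical, and like the pattern above it assumes the brackets are not nested:

```python
import re

# Either a whole bracketed chunk (matched and ignored) or a separator
# comma with trailing whitespace (captured in group 1).
TOKEN = re.compile(r'\[[^\[\]]*\]|(,\s*)')

def split_outside_brackets(text):
    """Split text on commas that are not inside [...] (no nesting)."""
    parts = []
    start = 0
    for match in TOKEN.finditer(text):
        # Group 1 is set only when the comma alternative matched,
        # i.e. the comma was not swallowed by a bracketed chunk.
        if match.group(1) is not None:
            parts.append(text[start:match.start()])
            start = match.end()
    parts.append(text[start:])
    return parts

print(split_outside_brackets('[1, "A"], [10, "J"], [100, "JJ"]'))
# ['[1, "A"]', '[10, "J"]', '[100, "JJ"]']
```

The bracketed alternative comes first so that it consumes any commas inside brackets before the second alternative gets a chance to see them.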