Question

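I'm just starting out with regular expressions. I want to split strings that have a special format: a letter outside parentheses should become its own element of the output list, while letters inside parentheses should be kept together as one element.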

Examples:

my string => wanted list

"ab(hpl)x" => ['a', 'b', 'hpl', 'x']
"(pck)(kx)(sd)" => ['pck', 'kx', 'sd']
"(kx)kxx(kd)" => ['kx', 'k', 'x', 'x', 'kd']
"fghk" => ['f', 'g', 'h', 'k']

How can I do this with regular expressions and re.split? Thanks in advance for your help.

Answer 1

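You can't do this with re.split, because it would require splitting on zero-length matches (the boundaries between adjacent letters), which re.split did not do before Python 3.7.

From http://docs.python.org/library/re.html#re.split:

Note that split will never split a string on an empty pattern match.

Here's an alternative: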

re.findall(r'(\w+(?=\))|\w)', your_string)

An example:

>>> import re
>>> for s in ("ab(hpl)x", "(pck)(kx)(sd)", "(kx)kxx(kd)", "fghk"):
...     print(s, " => ", re.findall(r'(\w+(?=\))|\w)', s))
... 
ab(hpl)x  =>  ['a', 'b', 'hpl', 'x']
(pck)(kx)(sd)  =>  ['pck', 'kx', 'sd']
(kx)kxx(kd)  =>  ['kx', 'k', 'x', 'x', 'kd']
fghk  =>  ['f', 'g', 'h', 'k']
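
For anyone who still wants re.split in the solution, here is a minimal sketch of one possible workaround. It is not what this answer proposes: it relies on the documented behaviour that re.split keeps the text captured by a group in the pattern, and then post-processes the pieces. The function name split_with_re_split is made up for illustration, and the sketch assumes the parentheses are balanced, non-empty, and not nested.

```python
import re

def split_with_re_split(text):
    """Keep bracketed groups whole; emit every other character on its own."""
    # The pattern contains a capturing group, so re.split returns the captured
    # text too: even indices hold the text outside brackets, odd indices hold
    # the contents of each bracketed group.
    pieces = re.split(r'\((\w+)\)', text)
    result = []
    for index, piece in enumerate(pieces):
        if index % 2:
            result.append(piece)   # inside brackets: one element
        else:
            result.extend(piece)   # outside brackets: one element per character
    return result

for s in ("ab(hpl)x", "(pck)(kx)(sd)", "(kx)kxx(kd)", "fghk"):
    print(s, "=>", split_with_re_split(s))
```

The empty strings that re.split produces between adjacent groups disappear on their own, because extending a list with an empty string adds nothing.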

Answer 2

You want findall, not split. Use this regex: r'(?<=\()[a-z]+(?=\))|[a-z]'. It works for all of your test cases.

>>> import re
>>> test_cases = ["ab(hpl)x", "(pck)(kx)(sd)", "(kx)kxx(kd)", "fghk"]
>>> pat = re.compile(r'(?<=\()[a-z]+(?=\))|[a-z]')
>>> for test_case in test_cases:
...     print("%-13s  =>  %s" % (test_case, pat.findall(test_case)))
...
ab(hpl)x       =>  ['a', 'b', 'hpl', 'x']
(pck)(kx)(sd)  =>  ['pck', 'kx', 'sd']
(kx)kxx(kd)    =>  ['kx', 'k', 'x', 'x', 'kd']
fghk           =>  ['f', 'g', 'h', 'k']
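
To make the pieces of that pattern easier to read, the same regex can be written in verbose mode with a comment on each part. This is only an annotated restatement of the pattern above; the names GROUP_OR_LETTER and split_groups are made up for illustration.

```python
import re

# The pattern from this answer, spelled out with re.VERBOSE so that each
# part can carry a comment (whitespace in the pattern is ignored).
GROUP_OR_LETTER = re.compile(r"""
    (?<=\()     # lookbehind: the previous character is an opening parenthesis
    [a-z]+      # one or more lowercase letters (the bracketed group)
    (?=\))      # lookahead: the next character is a closing parenthesis
    |           # otherwise
    [a-z]       # a single lowercase letter outside parentheses
""", re.VERBOSE)

def split_groups(text):
    """Return bracketed groups whole and all other letters one by one."""
    return GROUP_OR_LETTER.findall(text)

print(split_groups("ab(hpl)x"))  # ['a', 'b', 'hpl', 'x']
```

Because the lookbehind and lookahead consume no characters, the parentheses themselves never appear in the result.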

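Edit

If you want to match uppercase and lowercase letters, digits, and underscores, replace [a-z] with \w. If your parentheses are never unbalanced (as in "abc(def"), you can drop the lookbehind assertion (?<=\().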
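
As a rough illustration of those two variations, the sketch below compares the original pattern with a version that drops the lookbehind and a version that uses \w. The variable names and the inputs "abc)d" and "aB(x_1)9" are hypothetical, chosen only to show where the patterns differ.

```python
import re

strict = re.compile(r'(?<=\()[a-z]+(?=\))|[a-z]')   # pattern from this answer
no_lookbehind = re.compile(r'[a-z]+(?=\))|[a-z]')   # lookbehind dropped
word_chars = re.compile(r'(?<=\()\w+(?=\))|\w')     # [a-z] replaced by \w

# Balanced input: both lowercase patterns give the same result.
print(strict.findall("(kx)kxx(kd)"))         # ['kx', 'k', 'x', 'x', 'kd']
print(no_lookbehind.findall("(kx)kxx(kd)"))  # ['kx', 'k', 'x', 'x', 'kd']

# A stray closing parenthesis: without the lookbehind, "abc" is grouped.
print(strict.findall("abc)d"))               # ['a', 'b', 'c', 'd']
print(no_lookbehind.findall("abc)d"))        # ['abc', 'd']

# \w also accepts uppercase letters, digits, and underscores.
print(word_chars.findall("aB(x_1)9"))        # ['a', 'B', 'x_1', '9']
```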
