Question

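Say I have a string like this, where items are separated by commas, but there may also be commas inside items that have parenthesized content:

(EDIT: Sorry, forgot to mention that some items may not have parenthesized content)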

"Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"

How can I split the string on the commas that are not within parentheses? i.e.:

["Water", "Titanium Dioxide (CI 77897)", "Black 2 (CI 77266)", "Iron Oxides (CI 77491, 77492, 77499)", "Ultramarines (CI 77007)"]

I think I'd have to use a regex, perhaps something like this:

([(]?)(.*?)([)]?)(,|$)

But I'm still trying to get it to work.

Answer 1

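Use a negative lookahead to match all the commas that are not inside parentheses. Splitting the input string on the matched commas gives you the desired output.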

,\s*(?![^()]*\))

DEMO

>>> import re
>>> s = "Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
>>> re.split(r',\s*(?![^()]*\))', s)
['Water', 'Titanium Dioxide (CI 77897)', 'Black 2 (CI 77266)', 'Iron Oxides (CI 77491, 77492, 77499)', 'Ultramarines (CI 77007)']

Answer 2

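You can do this using str.replace and str.split. Replace ), with ) followed by any separator that does not otherwise occur in the string, then split on that separator.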

a = "Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
a = a.replace('),', ')//').split('//')
print(a)

Output:

['Titanium Dioxide (CI 77897)', ' Black 2 (CI 77266)', ' Iron Oxides (CI 77491, 77492, 77499)', ' Ultramarines (CI 77007)']
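A small sketch of the same idea with the leading spaces stripped. It also shows what seems to be a limitation: an item without parentheses (such as "Water" in the question) has no ), after it, so it stays glued to the next item. The separator "//" and the names s and parts are only illustrative, and the code assumes "//" never occurs in the data.

```python
# Sketch only: assumes "//" does not appear anywhere in the input string.
s = "Water, Titanium Dioxide (CI 77897), Iron Oxides (CI 77491, 77492, 77499)"

# Replace each "),"  with ")//", split on "//", then strip surrounding whitespace.
parts = [p.strip() for p in s.replace('),', ')//').split('//')]

# "Water" has no closing parenthesis, so it is not separated from the next item.
print(parts)
```

So this approach looks suitable only when every item ends with a closing parenthesis.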

Answer 3

Try this regex:

[^()]*\([^()]*\),?

Code:

>>> import re
>>> x = "Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
>>> re.findall(r"[^()]*\([^()]*\),?", x)
['Titanium Dioxide (CI 77897),', ' Black 2 (CI 77266),', ' Iron Oxides (CI 77491, 77492, 77499),', ' Ultramarines (CI 77007)']

See how the regex works: http://regex101.com/r/pS9oV3/1
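The matches above keep the trailing comma and the leading space. One possible cleanup, as a minimal sketch (the names matches and cleaned are just for illustration), is to strip spaces and commas from both ends of each match:

```python
import re

x = "Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"

# Same pattern as above: text, a parenthesized group, and an optional trailing comma.
matches = re.findall(r"[^()]*\([^()]*\),?", x)

# Remove the leading space and trailing comma left on each match.
cleaned = [m.strip(" ,") for m in matches]
print(cleaned)
```

Note that this pattern requires a parenthesized group in every match, so an item without parentheses would not come out as its own entry.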

Answer 4

Using regex, this can be done easily with the findall function.

import re
s = "Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
re.findall(r"\w.*?\(.*?\)", s) # returns what you want

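If you want to understand regex better, use http://www.regexr.com/, and here is the link to the Python documentation: https://docs.python.org/2/library/re.html

EDIT: I modified the regex string to accept content without parentheses: \w[^,(]*(?:\(.*?\))?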
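As a quick check of the modified pattern, here is a short sketch that runs it on the full string from the question, including the "Water" item that has no parentheses. The variable names are illustrative only.

```python
import re

s = "Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"

# \w          : an item starts with a word character
# [^,(]*      : then anything up to a comma or an opening parenthesis
# (?:\(.*?\))?: optionally followed by one parenthesized group (non-nested)
items = re.findall(r"\w[^,(]*(?:\(.*?\))?", s)
print(items)
```

Because .*? stops at the first closing parenthesis, this sketch assumes the parentheses are not nested.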

Answer 5

I believe I have a simpler regex:

import re

rx_comma = re.compile(r",(?![^(]*\))")
result = rx_comma.split(string_to_split)  # string_to_split is the input string

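Explanation of the regex:

Match a , that:
is NOT followed by:
- a list of characters ending with ), where:
- the list of characters between , and ) does not contain (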
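The explanation above can also be written into the pattern itself. The following is a sketch using re.VERBOSE, which lets whitespace and comments sit inside the regex; the names rx_comma_verbose, sample and pieces are made up for this example, and the final strip() is an addition to drop the space left after each comma.

```python
import re

# Same pattern as rx_comma, spelled out with comments via re.VERBOSE.
rx_comma_verbose = re.compile(r"""
    ,           # a literal comma
    (?!         # that is NOT followed by:
        [^(]*   #   a run of characters with no opening parenthesis
        \)      #   ending in a closing parenthesis
    )
""", re.VERBOSE)

sample = "Water, Titanium Dioxide (CI 77897), Iron Oxides (CI 77491, 77492, 77499)"

# Split on the top-level commas and trim the space left after each one.
pieces = [p.strip() for p in rx_comma_verbose.split(sample)]
print(pieces)
```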

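It will not work in the case of nested parentheses, e.g. a,b(c,d(e,f)). If you need that, one possible solution is to go through the split result and, whenever a string has an opening parenthesis that is not yet closed, merge it with the following one :), e.g.:

"a"
"b(c" <- no closing, merge this
"d(e" <- no closing, merge this
"f))"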

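One way the split-then-merge idea might look, as a rough sketch: split on every comma, then glue a piece back onto the previous one while there are still unclosed parentheses. The function name split_and_merge is hypothetical, and the code assumes the parentheses in the input are balanced.

```python
def split_and_merge(text):
    """Split on every comma, then re-join pieces that fall inside parentheses."""
    merged = []
    depth = 0  # number of parentheses currently open
    for piece in text.split(","):
        if depth > 0:
            # Still inside parentheses: this comma was not a real separator.
            merged[-1] += "," + piece
        else:
            merged.append(piece)
        depth += piece.count("(") - piece.count(")")
    return [m.strip() for m in merged]


print(split_and_merge("a,b(c,d(e,f))"))
```

For the nested example this gives the pieces "a" and "b(c,d(e,f))", with the inner commas restored during the merge.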
Answer 6

This version seems to work with nested parentheses, brackets ([] or <>), and braces:

def split_top(string, splitter, openers="([{<", closers=")]}>", whitespace=" \n\t"):
    ''' Splits strings at occurrence of 'splitter' but only if not enclosed by brackets.
        Removes all whitespace immediately after each splitter.
        This assumes brackets, braces, and parens are properly matched - may fail otherwise '''

    outlist = []
    outstring = []

    depth = 0

    for c in string:
        if c in openers:
            depth += 1
        elif c in closers:
            depth -= 1

            if depth < 0:
                raise SyntaxError()

        if not depth and c == splitter:
            outlist.append("".join(outstring))
            outstring = []
        else:
            if len(outstring):
                outstring.append(c)
            elif c not in whitespace:
                outstring.append(c)

    outlist.append("".join(outstring))

    return outlist

Use it like this:

s = "Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"

split = split_top(s, ",") # splits on commas

I know, it's probably not the fastest.
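The docstring notes that the function assumes the brackets are properly matched. If that cannot be guaranteed, a variant could keep a stack of expected closers instead of a plain depth counter. This is only a sketch, not the function above: split_top_checked and its pairs argument are hypothetical, and unlike split_top it strips whitespace on both sides of each piece.

```python
def split_top_checked(text, splitter=",", pairs=None):
    """Split at top-level splitters, checking that bracket types match."""
    pairs = pairs or {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = set(pairs.values())
    stack = []              # closers we still expect, innermost last
    parts, current = [], []
    for ch in text:
        if ch in pairs:
            stack.append(pairs[ch])
        elif ch in closers:
            if not stack or stack.pop() != ch:
                raise ValueError("unbalanced %r" % ch)
        if ch == splitter and not stack:
            # Top-level splitter: close off the current piece.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if stack:
        raise ValueError("unclosed bracket")
    parts.append("".join(current).strip())
    return parts


print(split_top_checked("a, b(c, d[e, f]), g"))
```

With the stack, an input such as "a(b]" raises ValueError instead of being split silently.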

How to split by commas that are not within parentheses?
