Question

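If I have a messy string like '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]' and want to split it into a list where each part inside any pair of brackets becomes one item, like ['Carrots', 'Broccoli', 'cucumber', 'tomato', 'spinach'], how can I do it? I can't think of a way to make the .split() method work.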

Answer 1

You can use a regular expression:

import re

s = '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]'

lst = [x[0] or x[1] for x in re.findall(r'\[(.*?)\]|\((.*?)\)', s)]
print(lst)

Output

['Carrots', 'Broccoli', 'cucumber', 'tomato', 'spinach']

Explanation

The regex pattern to match:

r'\[(.*?)\]|\((.*?)\)'

Subpattern 1: matches items in square brackets, i.e. [...]

\[(.*?)\]  # Use \[ and \] since [ and ] are special characters;
           # we have to escape them so they are matched literally
 (.*?)     # is a lazy (non-greedy) match of any characters,
           # captured as group 1
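To see why subpattern 1 uses the lazy .*? rather than the greedy .*, here is a small comparison on the question's string. It is a sketch that uses only the standard re module.

```python
import re

s = '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]'

# Greedy: .* grabs as much as possible, so the single match runs from the
# first '[' all the way to the LAST ']' in the string.
greedy = re.findall(r'\[(.*)\]', s)

# Lazy: .*? stops at the first ']' after each '[', giving one match per pair.
lazy = re.findall(r'\[(.*?)\]', s)

print(greedy)  # ['Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach']
print(lazy)    # ['Carrots', 'Broccoli', 'spinach']
```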

Subpattern 2: matches items in parentheses, i.e. (...)

\((.*?)\)   # Use \( and \) since ( and ) are special characters;
            # we have to escape them so they are matched literally
 (.*?)      # the same lazy match as above, captured as group 2

Since we are looking for either of the two patterns, we join them with:

'|'         # which means "or" between the two subpatterns,
            # to match Subpattern 1 or Subpattern 2
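The alternation changes the shape of what re.findall returns. With exactly one capturing group it returns a flat list of strings. With two groups it returns one tuple per match, and the group belonging to the branch that did not match is an empty string. This short sketch (standard re module only) runs each subpattern on its own and then the combined pattern:

```python
import re

s = '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]'

# One capturing group: findall returns a flat list of strings.
square = re.findall(r'\[(.*?)\]', s)
round_ = re.findall(r'\((.*?)\)', s)

# Two capturing groups joined by '|': findall returns one (group1, group2)
# tuple per match, with '' for the group whose branch did not match.
both = re.findall(r'\[(.*?)\]|\((.*?)\)', s)

print(square)  # ['Carrots', 'Broccoli', 'spinach']
print(round_)  # ['cucumber', 'tomato']
print(both)
```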

The expression

re.findall(r'\[(.*?)\]|\((.*?)\)', s)

returns

[('Carrots', ''), ('Broccoli', ''), ('', 'cucumber'), ('', 'tomato'), ('spinach', '')]

Each result is in either the first or the second element of its tuple, so we use:

[x[0] or x[1] for x in re.findall(r'\[(.*?)\]|\((.*?)\)', s)]

to take whichever element is non-empty and collect the results in a list.
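An alternative that avoids the tuples altogether is to iterate over match objects with re.finditer and read the group that actually participated. In an alternation like this, only one branch matches, so Match.lastindex is the number of that branch's group (1 or 2). This is a sketch of that variant, not part of the original answer. For this pattern it produces the same list as x[0] or x[1].

```python
import re

s = '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]'
pattern = re.compile(r'\[(.*?)\]|\((.*?)\)')

# m.lastindex is the index of the last group that matched; with the two
# flat alternatives above that is 1 for [...] items and 2 for (...) items.
lst = [m.group(m.lastindex) for m in pattern.finditer(s)]
print(lst)  # ['Carrots', 'Broccoli', 'cucumber', 'tomato', 'spinach']
```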

Answer 2

Without any error handling (e.g. checking for nested or unbalanced brackets):

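def parse(expr):
    opening = "(["
    closing = ")]"
    result = []
    current_item = ""
    for char in expr:
        if char in opening:
            # every opening bracket starts a new item
            current_item = ""
            continue
        if char in closing:
            # a closing bracket ends the current item
            result.append(current_item)
            continue
        # text outside brackets also accumulates here, but it is
        # thrown away by the reset at the next opening bracket
        current_item += char
    return result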

print(parse("(a)(b) stuff (c) [d] more stuff - (xxx)."))

Output:

['a', 'b', 'c', 'd', 'xxx']

Depending on your needs, this may be enough...
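If you do need the error handling mentioned above, one possible extension is to track which bracket is currently open. The function then raises on nested, unbalanced, or mismatched pairs such as "(a]". The name parse_strict and the choice of ValueError are illustrative assumptions, not part of the original answer. Unlike parse(), this version never collects text that lies outside the brackets.

```python
def parse_strict(expr):
    """Hypothetical stricter parse(): raises ValueError on nested,
    unbalanced or mismatched brackets."""
    pairs = {")": "(", "]": "["}  # closing bracket -> expected opener
    result = []
    current_item = ""
    open_char = None  # the bracket we are currently inside, or None
    for i, char in enumerate(expr):
        if char in "([":
            if open_char is not None:
                raise ValueError(f"nested bracket {char!r} at index {i}")
            open_char = char
            current_item = ""
        elif char in ")]":
            if open_char != pairs[char]:
                raise ValueError(f"unmatched {char!r} at index {i}")
            result.append(current_item)
            open_char = None
        elif open_char is not None:
            # only characters inside a bracket pair are collected
            current_item += char
    if open_char is not None:
        raise ValueError(f"unclosed {open_char!r} at end of input")
    return result

print(parse_strict("(a)(b) stuff (c) [d] more stuff - (xxx)."))
# ['a', 'b', 'c', 'd', 'xxx']
```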

Answer 3

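Assuming the string uses no brackets or separators (such as '-') other than those in the example, and that no bracketed item contains a space, try: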

s = '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]'

words = []
for elem in s.replace('-', ' ').split():
    if '[' in elem or '(' in elem:
        words.append(elem.strip('[]()'))

Or with a list comprehension:

words = [elem.strip('[]()') for elem in s.replace('-', ' ').split() if '[' in elem or '(' in elem]
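The whitespace-splitting assumption matters in practice. This sketch uses a made-up input, s2, in which one bracketed item contains a space. Answer 3's comprehension splits '[green beans]' into '[green' and 'beans]'. It then drops 'beans]' because that piece contains no opening bracket. The regex from Answer 1 keeps the whole item.

```python
import re

s2 = '[green beans] (cucumber)-(tomato)'  # hypothetical input with a space

# Answer 3's approach: 'beans]' has no '[' or '(' and is filtered out.
words = [elem.strip('[]()') for elem in s2.replace('-', ' ').split()
         if '[' in elem or '(' in elem]
print(words)  # ['green', 'cucumber', 'tomato']

# Answer 1's regex matches each bracket pair as a unit.
lst = [x[0] or x[1] for x in re.findall(r'\[(.*?)\]|\((.*?)\)', s2)]
print(lst)    # ['green beans', 'cucumber', 'tomato']
```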

How to split a string using brackets as delimiters
