How to split a string using brackets as delimiters

Time: 2020-04-05 11:11:23

Tags: python list split

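If I have a messy string like '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]' and want to split it into a list so that each part inside any kind of bracket becomes an item, like ['Carrots', 'Broccoli', 'cucumber', 'tomato', 'spinach'], how would I do that? I can't figure out a way to make the .split() method work.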

3 answers:

Answer 0: (score: 1)

You can use a regular expression

import re

s = '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]'

lst = [x[0] or x[1] for x in re.findall(r'\[(.*?)\]|\((.*?)\)', s)]
print(lst)

Output

['Carrots', 'Broccoli', 'cucumber', 'tomato', 'spinach']

Explanation

The regex pattern to match

r'\[(.*?)\]|\((.*?)\)'

Subpattern 1: matches items in square brackets, i.e. [...]

\[(.*?)\]  # Use \[ and \] because [ and ] are special characters,
           # so they must be escaped to be matched literally
 (.*?)     # Lazy match: captures as few characters as possible

Subpattern 2: matches items in parentheses, i.e. (...)

\((.*?)\)   # Use \( and \) because ( and ) are special characters,
            # so they must be escaped to be matched literally

Since we are looking for either of the two patterns:

'|'         # "or" between the two subpatterns:
            # matches Subpattern 1 or Subpattern 2

The expression

re.findall(r'\[(.*?)\]|\((.*?)\)', s)

returns

[('Carrots', ''), ('Broccoli', ''), ('', 'cucumber'), ('', 'tomato'), ('spinach', '')]

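Each tuple holds the result in its first or second element; the group that did not match is an empty string. So we use:

[x[0] or x[1] for x in re.findall(r'\[(.*?)\]|\((.*?)\)', s)]

to pick whichever element of each tuple is non-empty and put it in a list.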
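As a minimal sketch of the same idea wrapped in a reusable function: the names PATTERN and extract_bracketed are hypothetical and not part of the answer above. Instead of `x[0] or x[1]`, it uses the match object's lastindex attribute, which gives the number of the capturing group that matched.

```python
import re

# Same pattern as above, compiled once (PATTERN is a hypothetical name).
PATTERN = re.compile(r'\[(.*?)\]|\((.*?)\)')

def extract_bracketed(text):
    """Return the contents of every [...] or (...) in text, in order."""
    # Exactly one of the two groups participates in each match;
    # m.lastindex is the index (1 or 2) of that group.
    return [m.group(m.lastindex) for m in PATTERN.finditer(text)]

s = '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]'
print(extract_bracketed(s))
# ['Carrots', 'Broccoli', 'cucumber', 'tomato', 'spinach']
```

Like the original pattern, this does not handle nested brackets, and an empty pair such as [] yields an empty string.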

Answer 1: (score: 0)

Without any error handling (such as checking for nested or unbalanced brackets):

def parse(expr):
    opening = "(["
    closing = ")]"
    result = []
    current_item = ""
    for char in expr:
        if char in opening:
            current_item = ""
            continue
        if char in closing:
            result.append(current_item)
            continue
        current_item += char
    return result

print(parse("(a)(b) stuff (c) [d] more stuff - (xxx)."))

>>> ['a', 'b', 'c', 'd', 'xxx']

Depending on your needs, this may already be sufficient…
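If the missing error handling matters, one possible sketch is shown below. The name parse_checked and the choice to raise ValueError are assumptions for illustration, not part of the answer above. It rejects nested, mismatched, and unclosed brackets rather than trying to recover from them.

```python
def parse_checked(expr):
    """Like parse(), but raises ValueError on nested or unbalanced brackets."""
    pairs = {"(": ")", "[": "]"}
    closers = set(pairs.values())
    result = []
    expected_close = None  # closing char we are waiting for; None when outside brackets
    current_item = ""
    for char in expr:
        if expected_close is None:
            if char in pairs:
                # Entering a bracketed item
                expected_close = pairs[char]
                current_item = ""
            elif char in closers:
                raise ValueError(f"unexpected closing {char!r}")
            # Any other character outside brackets is ignored
        elif char == expected_close:
            result.append(current_item)
            expected_close = None
        elif char in pairs or char in closers:
            raise ValueError(f"nested or mismatched bracket {char!r}")
        else:
            current_item += char
    if expected_close is not None:
        raise ValueError(f"missing closing {expected_close!r}")
    return result

print(parse_checked("(a)(b) stuff (c) [d] more stuff - (xxx)."))
# ['a', 'b', 'c', 'd', 'xxx']
```

Whether to raise, skip the bad item, or support nesting depends on the input you expect; this version simply fails fast.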

Answer 2: (score: 0)

Assuming no brackets or operators (such as '-') other than those in the example string are used, try

s = '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]'

words = []
for elem in s.replace('-', ' ').split():
    if '[' in elem or '(' in elem:
        words.append(elem.strip('[]()'))

Or with a list comprehension

words = [elem.strip('[]()') for elem in s.replace('-', ' ').split() if '[' in elem or '(' in elem]
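One more assumption worth noting is that this approach splits on whitespace, so it only works when the bracketed items themselves contain no spaces. The sketch below illustrates the difference on a made-up input; the function names split_strip and regex_extract and the sample string are hypothetical.

```python
import re

def split_strip(s):
    # The split-and-strip approach from this answer
    return [elem.strip('[]()') for elem in s.replace('-', ' ').split()
            if '[' in elem or '(' in elem]

def regex_extract(s):
    # The regex approach from Answer 0
    return [a or b for a, b in re.findall(r'\[(.*?)\]|\((.*?)\)', s)]

sample = '[red pepper] (tomato)'  # hypothetical input with a space inside brackets
print(split_strip(sample))    # ['red', 'tomato']
print(regex_extract(sample))  # ['red pepper', 'tomato']
```

For inputs like the one in the question, where every bracketed item is a single word, both approaches give the same result.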