Question

Suppose I am given strings of the following kind:

"(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla"

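I want to extract the substrings enclosed by the top-level (outermost) parentheses. That is, I want to get the strings "this is (haha) a string(()and it's sneaky)" and "lorem".

Is there a nice Pythonic way to do this? Regular expressions are not obviously up to this task, but maybe there is a way to get an XML parser to do the job? For my application, I can assume the parentheses are well formed, i.e. not something like (()(().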

Answer 1

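This is a standard use case for a stack: you read the string character by character, and whenever you encounter an opening parenthesis, you push a symbol onto the stack; when you encounter a closing parenthesis, you pop a symbol off the stack.

Since you only have a single type of parenthesis, you don't actually need a stack; it is enough to remember how many parentheses are currently open.

In addition, to extract the text, we also remember where a part starts when a first-level parenthesis opens, and we collect the resulting substring when we encounter its matching closing parenthesis.

This could look like this: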

string = "(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla"

stack = 0
startIndex = None
results = []

for i, c in enumerate(string):
    if c == '(':
        if stack == 0:
            startIndex = i + 1 # string to extract starts one index later

        # push to stack
        stack += 1
    elif c == ')':
        # pop stack
        stack -= 1

        if stack == 0:
            results.append(string[startIndex:i])

print(results)
# ["this is (haha) a string(()and it's sneaky)", 'lorem']
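
For comparison, here is a minimal sketch of the explicit-stack variant described above, for the case where more than one bracket type can appear. The function name `top_level_groups` and the default `pairs` mapping are assumptions made for this illustration, not part of the question; unlike the counter version, it also raises an error on malformed input instead of assuming the brackets are well formed.

```python
def top_level_groups(text, pairs=None):
    """Return the substrings enclosed by top-level brackets in text.

    `pairs` maps each opening bracket to its closing bracket
    (hypothetical default: round, square and curly brackets).
    """
    if pairs is None:
        pairs = {"(": ")", "[": "]", "{": "}"}
    closers = set(pairs.values())
    stack = []      # closing brackets we still expect, innermost last
    start = None    # index just after the current top-level opener
    results = []
    for i, c in enumerate(text):
        if c in pairs:
            if not stack:
                start = i + 1
            stack.append(pairs[c])
        elif c in closers:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != c:
                raise ValueError(f"unbalanced bracket at index {i}")
            if not stack:
                results.append(text[start:i])
    if stack:
        raise ValueError("unclosed bracket at end of input")
    return results


print(top_level_groups("(a [b] c) d {e (f)} g"))
# ['a [b] c', 'e (f)']
```

With a single bracket type the stack only ever holds copies of the same symbol, which is why a plain counter is sufficient in the code above.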

Answer 2

Are you sure a regex isn't good enough?

>>> import re
>>> x=re.compile(r'\((?:(?:\(.*?\))|(?:[^\(\)]*?))\)')
>>> x.findall("(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla")
['(haha)', "(()and it's sneaky))", '(lorem)']
>>> x.findall("((((this is (haha) a string((a(s)d)and ((it's sneaky))))))) ipsom (lorem) bla")
["((((this is (haha) a string((a(s)d)and ((it's sneaky))", '(lorem)']
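
If the maximum nesting depth is known in advance, a regex for the standard `re` module can be built up level by level. The following is only a sketch under that assumption; the helper name `top_level_pattern` and its `max_depth` parameter are made up for this illustration and do not come from the answer above.

```python
import re


def top_level_pattern(max_depth):
    """Compile a regex matching a parenthesized group whose content
    contains at most `max_depth` further levels of nested parentheses."""
    # Depth 0: the content may not contain any parentheses at all.
    body = r"[^()]*"
    for _ in range(max_depth):
        # Each extra level allows plain characters or one nested
        # group of the previous (shallower) kind.
        body = r"(?:[^()]|\(" + body + r"\))*"
    # The capturing group holds the content without the outer parentheses.
    return re.compile(r"\((" + body + r")\)")


text = "(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla"
print(top_level_pattern(2).findall(text))
# ["this is (haha) a string(()and it's sneaky)", 'lorem']
```

If `max_depth` is smaller than the real nesting depth, the pattern does not fail loudly; it matches inner groups instead, so this approach is only reasonable when the depth bound is guaranteed.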

Answer 3

This isn't very "Pythonic"... but

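def find_strings_inside(what_open, what_close, s):
    stack = []  # one entry per currently open delimiter
    msg = []    # characters collected for the current top-level group
    for c in s:
        if c == what_open:
            stack.append(c)
            if len(stack) == 1:
                continue  # skip the top-level opening delimiter itself
        elif c == what_close and stack:
            stack.pop()
            if not stack:
                # the top-level group just closed: emit it and start over
                yield "".join(msg)
                msg[:] = []
        if stack:
            msg.append(c)

x = list(find_strings_inside("(", ")", "(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla"))

print(x)
# ["this is (haha) a string(()and it's sneaky)", 'lorem']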

Answer 4

This more or less repeats what has already been said, but it may be a bit easier to read:

def extract(string):
    flag = 0
    result, accum = [], []
    for c in string:
        if c == ')':
            flag -= 1
        if flag:
            accum.append(c)
        if c == '(':
            flag += 1
        if not flag and accum:
            result.append(''.join(accum))
            accum = []
    return result

>>> test = "(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla"
>>> print(extract(test))
["this is (haha) a string(()and it's sneaky)", 'lorem']

How can I get an expression between balanced parentheses?
