Python regex to find nested parentheses outside double quotes

Date: 2019-07-17 05:34:45

Tags: python regex

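I have an input string that contains parentheses both inside and outside double quotes, and these parentheses can be nested. I want to strip the parenthesized parts of the string only where they occur outside double quotes.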

I tried the regex r'\((?:[^)(]|\((?:[^)(]|\([^)(]*\))*\))*\)'. It matches everything enclosed in round brackets, regardless of whether it is inside or outside double quotes.

    import re
    input_string = '''"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes))'''
    result = re.sub(r'\((?:[^)(]|\((?:[^)(]|\([^)(]*\))*\))*\)','', input_string)
    print(result)

The actual output I get is:

'"Hello World "  anything outside round brackets should remain as is'

The output I expect is:

'"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is'

2 answers:

Answer 0 (score: 1)

If your parentheses are balanced (with the help of this):

import re
input_string = '''"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes) xxx) Also remain this (String this)'''

def strip_parentheses(g):
    n = 1  # run at least once
    while n:
        g, n = re.subn(r'\([^()]*\)', '', g)  # remove non-nested/flat balanced parts
    return g

s = re.sub(r'".*?"|([^"]*)', lambda g: strip_parentheses(g.group(1)) if g.group(1) else g.group(), input_string)

print(s)

Prints:

"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is Also remain this 

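To see why the loop in strip_parentheses handles nesting, it may help to watch the intermediate strings. Each pass of re.subn removes only the innermost groups (those with no parentheses inside them), so every pass peels off one level. A minimal sketch; peel_steps is a name I made up for illustration and is not part of the answer above:

```python
import re

def peel_steps(text):
    """Return the strings produced by repeatedly removing innermost (...) groups."""
    steps = [text]
    while True:
        # remove every group that contains no further parentheses
        text, n = re.subn(r'\([^()]*\)', '', text)
        if n == 0:  # nothing left to remove: stop
            return steps
        steps.append(text)

for step in peel_steps('a (b (c (d))) e'):
    print(repr(step))
# 'a (b (c (d))) e'
# 'a (b (c )) e'
# 'a (b ) e'
# 'a  e'
```

The loop ends on the first pass that makes no substitution, which is why the answer starts with n = 1 to force at least one pass.
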
EDIT: Running some test cases:

import re
input_string = '''"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes) xxx) Also remain this ((String this))'''

test_cases = ['Normal string (strip this)',
'"Normal string (dont strip this)"',
'"Normal string (dont strip this)" but (strip this)',
'"Normal string (dont strip this)" but (strip this) and (strip this)',
'"Normal string (dont strip this)" but (strip this) and (strip this) but "dont strip (this)"',
'"Normal string (dont strip this)" but ((strip this) and this) and (strip (strip this))',
'"Normal string (dont strip this)" but ((strip this) but "remain this (xxx)") ',
]

def strip_parentheses(g):
    n = 1  # run at least once
    while n:
        g, n = re.subn(r'\([^()]*\)', '', g)  # remove non-nested/flat balanced parts
    return g

def my_strip(s):
    return re.sub(r'".*?"|([^"]*)', lambda g: strip_parentheses(g.group(1)) if g.group(1) else g.group(), s)

for test in test_cases:
    print(test)
    print(my_strip(test))
    print()

Prints:

Normal string (strip this)
Normal string 

"Normal string (dont strip this)"
"Normal string (dont strip this)"

"Normal string (dont strip this)" but (strip this)
"Normal string (dont strip this)" but 

"Normal string (dont strip this)" but (strip this) and (strip this)
"Normal string (dont strip this)" but  and 

"Normal string (dont strip this)" but (strip this) and (strip this) but "dont strip (this)"
"Normal string (dont strip this)" but  and  but "dont strip (this)"

"Normal string (dont strip this)" but ((strip this) and this) and (strip (strip this))
"Normal string (dont strip this)" but  and 

"Normal string (dont strip this)" but ((strip this) but "remain this (xxx)") 
"Normal string (dont strip this)" but ( but "remain this (xxx)") 

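The stray ( and ) in the last test case follow from how the outer pattern carves up the input: a quoted string sitting inside a parenthesized group splits that group across two unquoted pieces, and neither piece contains a balanced pair on its own. A small sketch to make that visible; segments is a hypothetical helper, not part of the answer:

```python
import re

def segments(s):
    """List the (kind, text) pieces that r'".*?"|([^"]*)' matches in s."""
    pieces = []
    for m in re.finditer(r'".*?"|([^"]*)', s):
        if not m.group():  # the pattern can also match the empty string; skip those
            continue
        kind = 'unquoted' if m.group(1) else 'quoted'
        pieces.append((kind, m.group()))
    return pieces

for kind, text in segments('x ((a) b "c (d)") '):
    print(kind, repr(text))
# unquoted 'x ((a) b '
# quoted '"c (d)"'
# unquoted ') '
```

Only (a) is balanced inside a single unquoted piece, so it is the only group the loop can remove.
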
EDIT: To remove all (), even those with quoted strings inside them:

import re
input_string = '''"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes) xxx) Also remain this ((String this))'''

test_cases = ['"Normal string (dont strip this)" but (strip this) and (strip this) but "dont strip (this)"',
'"Normal string (dont strip this)" but ((strip this) and this) and (strip (strip this))',
'"Normal string (dont strip this)" but ((strip this) but "remain this (xxx)") ',
]

def strip_parentheses(g):
    n = 1  # run at least once
    while n:
        g, n = re.subn(r'\([^()]*\)', '', g)  # remove non-nested/flat balanced parts
    return g

def my_strip(s):
    s = re.sub(r'".*?"|([^"]*)', lambda g: strip_parentheses(g.group(1)) if g.group(1) else g.group(), s)
    return re.sub(r'".*?"|(\(.*\))', lambda g: '' if g.group(1) else g.group(), s)

for test in test_cases:
    print(test)
    print(my_strip(test))
    print()

Prints:

"Normal string (dont strip this)" but (strip this) and (strip this) but "dont strip (this)"
"Normal string (dont strip this)" but  and  but "dont strip (this)"

"Normal string (dont strip this)" but ((strip this) and this) and (strip (strip this))
"Normal string (dont strip this)" but  and 

"Normal string (dont strip this)" but ((strip this) but "remain this (xxx)") 
"Normal string (dont strip this)" but  

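One caveat with the second pass, as far as I can tell: \(.*\) is greedy, so it runs from the first leftover ( to the last ) on the line. If two leftover groups are separated by text you want to keep, that text is removed as well. A quick check, repeating the two functions above so it runs on its own (the test string is my own):

```python
import re

def strip_parentheses(g):
    n = 1  # run at least once
    while n:
        g, n = re.subn(r'\([^()]*\)', '', g)
    return g

def my_strip(s):
    s = re.sub(r'".*?"|([^"]*)', lambda g: strip_parentheses(g.group(1)) if g.group(1) else g.group(), s)
    return re.sub(r'".*?"|(\(.*\))', lambda g: '' if g.group(1) else g.group(), s)

# Both groups contain a quoted string, so the first pass leaves them alone;
# the greedy second pass then spans from the first "(" to the last ")".
print(repr(my_strip('(a "x") keep (b "y")')))
# ''
```

If input like this can occur, the word keep is lost, so it is worth testing against your real data before relying on the second pass.
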
Answer 1 (score: 0)

With the regex module instead of re, you could use

"[^"]+"(*SKIP)(*FAIL) # ignore anything between double quotes
|                     # or
\(
    (?:[^()]*|(?R))+  # match nested parentheses
\)

See a demo on regex101.com


In Python, this could be

import regex as re

data = """"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes))"""

rx = re.compile(r'''
    "[^"]+"(*SKIP)(*FAIL)
    |
    \(
        (?:[^()]*|(?R))+
    \)''', re.VERBOSE)

data = rx.sub("", data)
print(data)

which yields

"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is