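I have an input string that contains parentheses both inside and outside double quotes, and these parentheses can be nested. I want to strip the parenthesised parts of the string only when they occur outside double quotes.
I tried this regular expression: r'\((?:[^)(]|\((?:[^)(]|\([^)(]*\))*\))*\)'
It removes everything enclosed in round brackets, regardless of whether it is inside or outside double quotes.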
import re
input_string = '''"Hello World (Don't want to strip this (also not this))" anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes))'''
result = re.sub(r'\((?:[^)(]|\((?:[^)(]|\([^)(]*\))*\))*\)','', input_string)
print(result)
The actual output I get is:
'"Hello World " anything outside round brackets should remain as is'
The output I want is:
'"Hello World (Don't want to strip this (also not this))" anything outside round brackets should remain as is'
Answer 0 (score: 1)
If your parentheses are balanced (with the help of this):
import re
input_string = '''"Hello World (Don't want to strip this (also not this))" anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes) xxx) Also remain this (String this)'''
def strip_parentheses(g):
    n = 1  # run at least once
    while n:
        g, n = re.subn(r'\([^()]*\)', '', g)  # remove non-nested/flat balanced parts
    return g
s = re.sub(r'".*?"|([^"]*)', lambda g: strip_parentheses(g.group(1)) if g.group(1) else g.group(), input_string)
print(s)
Prints:
"Hello World (Don't want to strip this (also not this))" anything outside round brackets should remain as is Also remain this
EDIT: Running some test cases:
import re
input_string = '''"Hello World (Don't want to strip this (also not this))" anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes) xxx) Also remain this ((String this))'''
test_cases = ['Normal string (strip this)',
              '"Normal string (dont strip this)"',
              '"Normal string (dont strip this)" but (strip this)',
              '"Normal string (dont strip this)" but (strip this) and (strip this)',
              '"Normal string (dont strip this)" but (strip this) and (strip this) but "dont strip (this)"',
              '"Normal string (dont strip this)" but ((strip this) and this) and (strip (strip this))',
              '"Normal string (dont strip this)" but ((strip this) but "remain this (xxx)") ',
              ]

def strip_parentheses(g):
    n = 1  # run at least once
    while n:
        g, n = re.subn(r'\([^()]*\)', '', g)  # remove non-nested/flat balanced parts
    return g

def my_strip(s):
    # Quoted runs match the first alternative and are returned unchanged;
    # everything else is captured in group 1 and stripped.
    return re.sub(r'".*?"|([^"]*)', lambda g: strip_parentheses(g.group(1)) if g.group(1) else g.group(), s)

for test in test_cases:
    print(test)
    print(my_strip(test))
    print()
Prints:
Normal string (strip this)
Normal string
"Normal string (dont strip this)"
"Normal string (dont strip this)"
"Normal string (dont strip this)" but (strip this)
"Normal string (dont strip this)" but
"Normal string (dont strip this)" but (strip this) and (strip this)
"Normal string (dont strip this)" but and
"Normal string (dont strip this)" but (strip this) and (strip this) but "dont strip (this)"
"Normal string (dont strip this)" but and but "dont strip (this)"
"Normal string (dont strip this)" but ((strip this) and this) and (strip (strip this))
"Normal string (dont strip this)" but and
"Normal string (dont strip this)" but ((strip this) but "remain this (xxx)")
"Normal string (dont strip this)" but ( but "remain this (xxx)")
EDIT: To remove all (), even those that contain quoted strings:
import re
input_string = '''"Hello World (Don't want to strip this (also not this))" anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes) xxx) Also remain this ((String this))'''
test_cases = ['"Normal string (dont strip this)" but (strip this) and (strip this) but "dont strip (this)"',
              '"Normal string (dont strip this)" but ((strip this) and this) and (strip (strip this))',
              '"Normal string (dont strip this)" but ((strip this) but "remain this (xxx)") ',
              ]

def strip_parentheses(g):
    n = 1  # run at least once
    while n:
        g, n = re.subn(r'\([^()]*\)', '', g)  # remove non-nested/flat balanced parts
    return g

def my_strip(s):
    # First pass: strip balanced parentheses outside quoted runs.
    s = re.sub(r'".*?"|([^"]*)', lambda g: strip_parentheses(g.group(1)) if g.group(1) else g.group(), s)
    # Second pass: remove any remaining parenthesised span, including quoted text inside it.
    return re.sub(r'".*?"|(\(.*\))', lambda g: '' if g.group(1) else g.group(), s)

for test in test_cases:
    print(test)
    print(my_strip(test))
    print()
Prints:
"Normal string (dont strip this)" but (strip this) and (strip this) but "dont strip (this)"
"Normal string (dont strip this)" but and but "dont strip (this)"
"Normal string (dont strip this)" but ((strip this) and this) and (strip (strip this))
"Normal string (dont strip this)" but and
"Normal string (dont strip this)" but ((strip this) but "remain this (xxx)")
"Normal string (dont strip this)" but
Answer 1 (score: 0)
Using the third-party regex module instead of re, you could use
"[^"]+"(*SKIP)(*FAIL) # ignore anything between double quotes
| # or
\(
(?:[^()]*|(?R))+ # match nested parentheses
\)
In Python, this could be:
import regex as re
data = """"Hello World (Don't want to strip this (also not this))" anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes))"""
rx = re.compile(r'''
"[^"]+"(*SKIP)(*FAIL)
|
\(
(?:[^()]*|(?R))+
\)''', re.VERBOSE)
data = rx.sub("", data)
print(data)
which yields
"Hello World (Don't want to strip this (also not this))" anything outside round brackets should remain as is