Question

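I have a Python string from which I need to remove parentheses. The standard way is text = re.sub(r'\([^)]*\)', '', text), which removes the parentheses together with their contents.

However, I just came across a string that looks like (Data with in (Boo) And good luck). With the regex I use, the And good luck) part is still left behind. I know I could scan the whole string, keep a counter of the number of ( and ), and, when the counts balance, record the positions of the ( and ) and remove what lies between them, but is there a better/cleaner way? It doesn't need to be a regex; whatever works is great, thanks.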

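Someone asked about the expected result, so here is what I am expecting:

Hi this is a test ( a b ( c d) e) sentence

After the replacement I want it to be Hi this is a test sentence, rather than a string that still contains a leftover e) fragment.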
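To make the problem concrete, here is a minimal reproduction using the pattern from the question. The helper name naive_strip is only an illustrative assumption; it does not appear in the original post.

```python
import re

def naive_strip(text):
    # Pattern from the question: it stops at the first ')' it sees,
    # so it cannot pair an outer '(' with its matching ')'.
    return re.sub(r'\([^)]*\)', '', text)

# The inner ')' closes the match early, so the tail of the outer group survives.
print(repr(naive_strip('(Data with in (Boo) And good luck)')))
```

The match runs from the first ( up to the ) that closes (Boo), which is why the rest of the outer group is left in place.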

Answer 1

Using the re module (replace the innermost parentheses until there is nothing left to replace):

import re

s = r'Sainte Anne -(Data with in (Boo) And good luck) Charenton'

nb_rep = 1

while (nb_rep):
    (s, nb_rep) = re.subn(r'\([^()]*\)', '', s)

print(s)
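The same loop can be wrapped in a reusable function. This is only a sketch of the approach above; the function name strip_nested_parens is an assumption made for illustration.

```python
import re

# Matches a parenthesised group that contains no other parentheses,
# i.e. an innermost group.
INNERMOST = re.compile(r'\([^()]*\)')

def strip_nested_parens(text):
    """Repeatedly delete innermost groups until no replacement happens."""
    count = 1
    while count:
        # subn returns the new string and the number of replacements made
        text, count = INNERMOST.subn('', text)
    return text

print(strip_nested_parens('Sainte Anne -(Data with in (Boo) And good luck) Charenton'))
```

Each pass peels off one nesting level, so the number of passes is the maximum nesting depth plus one final pass that finds nothing to replace.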

Using the regex module (a third-party package, installed separately), which allows recursion:

import regex

s = r'Sainte Anne -(Data with in (Boo) And good luck) Charenton'

print(regex.sub(r'\([^()]*+(?:(?R)[^()]*)*+\)', '', s))

(?R) refers to the whole pattern itself.

Answer 2

First, I split the line into tokens that do not contain parentheses, so that they can later be joined into a new line:

import re

line = "(Data with in (Boo) And good luck)"
new_line = "".join(re.split(r'(?:[()])', line))
print(new_line)
# 'Data with in Boo And good luck'
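Note that this answer removes only the parenthesis characters and keeps the text inside them. If that is the goal, a regex-free sketch with str.translate should give the same result; the function name drop_paren_chars is an illustrative assumption.

```python
# Translation table that deletes '(' and ')' and leaves everything else alone.
PAREN_TABLE = str.maketrans('', '', '()')

def drop_paren_chars(text):
    # Removes only the parenthesis characters; enclosed text is kept.
    return text.translate(PAREN_TABLE)

print(drop_paren_chars("(Data with in (Boo) And good luck)"))
```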

Answer 3

Without regex...

>>> a = 'Hi this is a test ( a b ( c d) e) sentence'
>>> o = ['(' == t or t == ')' for t in a]
>>> o
[False, False, False, False, False, False, False, False, False, False,
 False, False, False, False, False, False, False, False, True, False, False, 
 False, False, False, True, False, False, False, False, True, False, False,
 True, False, False, False, False, False, False, False, False, False]
>>> start,end=0,0
>>> for n,i in enumerate(o):
...  if i and not start:
...   start = n
...  if i and start:
...   end = n
...
>>>
>>> start
18
>>> end
32
>>> a1 = ' '.join(''.join(i for n,i in enumerate(a) if (n<start or n>end)).split())
>>> a1
'Hi this is a test sentence'
>>>
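The session above effectively locates the first and the last parenthesis and drops everything in between. A shorter sketch of the same idea, assuming str.find and str.rfind are acceptable, is shown here; the name remove_outer_span is hypothetical. Like the session above, it treats the whole string as a single group, so text between two separate groups would also be removed.

```python
def remove_outer_span(text):
    start = text.find('(')   # index of the first '(' or -1
    end = text.rfind(')')    # index of the last ')' or -1
    if start == -1 or end < start:
        return text          # nothing that looks like a group
    trimmed = text[:start] + text[end + 1:]
    # Collapse the whitespace left behind, as in the session above
    return ' '.join(trimmed.split())

print(remove_outer_span('Hi this is a test ( a b ( c d) e) sentence'))
```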

Answer 4

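Assuming (1) the parentheses are always matched and (2) we only remove the parentheses and everything between them (i.e. the whitespace around the parentheses is left untouched), the following should work.

It is basically a state machine that tracks the current depth of nested parentheses. We keep a character if (1) it is not a parenthesis and (2) the current depth is 0.

No regex. No recursion. A single pass over the input string, without any intermediate lists.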

tests = [
    "Hi this is a test ( a b ( c d) e) sentence",
    "(Data with in (Boo) And good luck)",
]

delta = {
    '(': 1,
    ')': -1,
}

def remove_paren_groups(text):
    depth = 0  # current nesting depth

    for c in text:
        d = delta.get(c, 0)
        depth += d
        # Skip parentheses themselves and anything inside a group
        if d != 0 or depth > 0:
            continue
        yield c

for text in tests:
    print(' IN: %s' % repr(text))
    print('OUT: %s' % repr(''.join(remove_paren_groups(text))))

Output:

 IN: 'Hi this is a test ( a b ( c d) e) sentence'
OUT: 'Hi this is a test  sentence'
 IN: '(Data with in (Boo) And good luck)'
OUT: ''
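The answer assumes balanced parentheses. If that cannot be guaranteed, one possible variation is to keep the depth from going negative and to leave a stray ) in the output. This is a sketch under that assumption, not part of the original answer, and the name remove_paren_groups_safe is hypothetical.

```python
def remove_paren_groups_safe(text):
    depth = 0
    kept = []
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            if depth > 0:
                depth -= 1          # closes an open group
            else:
                kept.append(ch)     # stray ')': keep it, depth stays at 0
        elif depth == 0:
            kept.append(ch)         # outside any group
    return ''.join(kept)

print(repr(remove_paren_groups_safe('Hi this is a test ( a b ( c d) e) sentence')))
print(repr(remove_paren_groups_safe('a) b (c')))
```

With an unclosed (, everything after it is dropped, because the depth never returns to 0; whether that is the desired behavior depends on the data.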

Answer 5

Quoted from here

import re
item = "example (.com) w3resource github (.com) stackoverflow (.com)"

### Add lines in case there are non-ascii problem:
# -*- coding: utf-8 -*-
item = item.encode('ascii', errors='ignore').decode()

print(re.sub(r" ?\([^)]+\)", "", item))
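The pattern in this answer is not aware of nesting, so on its own it handles only flat groups. One way to extend it, sketched here as an assumption rather than as the answer's method, is to combine its optional leading space with the innermost-first loop from Answer 1. The name strip_groups_and_space is hypothetical.

```python
import re

# An innermost group plus one optional space in front of it
GROUP_WITH_SPACE = re.compile(r' ?\([^()]*\)')

def strip_groups_and_space(text):
    count = 1
    while count:
        # Remove innermost groups; repeat until nothing is replaced
        text, count = GROUP_WITH_SPACE.subn('', text)
    return text

print(strip_groups_and_space('Hi this is a test ( a b ( c d) e) sentence'))
print(strip_groups_and_space('example (.com) w3resource github (.com) stackoverflow (.com)'))
```

Because the space before each group is consumed together with the group, the result has no doubled spaces for inputs shaped like the ones in the question.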

How to remove text within multi-layer parentheses in Python

5 answers: