How to use a regular expression to remove all characters that are not inside parentheses

Time: 2017-06-08 02:13:59

Tags: python regex

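I have a string that contains commas both inside and outside parentheses: foo(bat,foo),bat

How can I use a regular expression to remove the commas outside the parentheses, so that the result is: foo(bat,foo)bat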

3 answers:

Answer 0: (score: 2)

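Assuming there are no nested parentheses and no mismatched pairs, a regular expression can do this. A comma is outside every pair of parentheses if and only if it is followed by an even number of ( and ) characters. Equivalently, the next parenthesis after the comma, if there is one, is not a ). A negative lookahead expresses that directly: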

,(?![^(]*\))

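If the parentheses can be nested, the input follows a context-free grammar rather than a regular one, and a regular expression alone cannot capture it. In that case you are better off with a split-based approach.

Example: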

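import re

ori_str = "foo(bat,foo),bat  foo(bat,foo),bat"
# Remove every comma that is not followed by a ")" before the next "(".
rep_str = re.sub(r',(?![^(]*\))', '', ori_str)
print(rep_str)  # foo(bat,foo)bat  foo(bat,foo)bat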
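
To illustrate the point about nesting, here is a minimal sketch of a non-regex alternative that scans the string once and tracks the current nesting depth. The function name strip_outer_commas is made up for this illustration, and the sketch assumes the parentheses are balanced.

```python
import re


def strip_outer_commas(text):
    """Remove commas at nesting depth 0; assumes balanced parentheses."""
    depth = 0
    kept = []
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        # Drop a comma only when we are outside every pair of parentheses.
        if ch == ',' and depth == 0:
            continue
        kept.append(ch)
    return ''.join(kept)


nested = 'a(b,(c,d),e),f'
# The lookahead pattern also drops the comma after "b",
# even though that comma sits inside the outer pair.
print(re.sub(r',(?![^(]*\))', '', nested))  # a(b(c,d),e)f
print(strip_outer_commas(nested))           # a(b,(c,d),e)f
```

The depth counter does the bookkeeping that the lookahead cannot, which is why it also copes with nested input.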

Answer 1: (score: 2)

Do you really need to use re, or is any way of reaching your goal acceptable?

If the latter, here is one approach:

mystring = 'foo(bat,foo),bat'
''.join(si + ',' if '(' in si else si for si in mystring.split(','))

#'foo(bat,foo)bat'
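
The one-liner above works for the sample string, but it relies on every parenthesized group containing exactly one comma: 'foo(bat),bat' keeps its outer comma, and 'foo(a,b,c),bat' loses an inner one. As a rough sketch of the same split-and-rejoin idea without that assumption, the version below decides from a running balance of ( and ) whether each comma was inside a group. The function name is hypothetical, and balanced parentheses are assumed.

```python
def strip_outer_commas_by_split(text):
    """Split on commas and re-insert only those that were inside parentheses."""
    pieces = text.split(',')
    balance = 0  # number of '(' seen so far minus number of ')'
    result = [pieces[0]]
    for previous, piece in zip(pieces, pieces[1:]):
        balance += previous.count('(') - previous.count(')')
        # The comma between `previous` and `piece` was inside a group
        # exactly when some '(' is still open at that point.
        result.append(',' + piece if balance > 0 else piece)
    return ''.join(result)


print(strip_outer_commas_by_split('foo(bat,foo),bat'))  # foo(bat,foo)bat
print(strip_outer_commas_by_split('foo(bat),bat'))      # foo(bat)bat
print(strip_outer_commas_by_split('foo(a,b,c),bat'))    # foo(a,b,c)bat
```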

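Answer 2: (score: 1)

Suppose we want to remove all commas that are outside of every block, and we do not want to modify nested blocks.

Let's add string validation for the cases where unclosed or unopened blocks are found:
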
def validate_string(string):
    left_parts_count = len(string.split('('))
    right_parts_count = len(string.split(')'))
    diff = left_parts_count - right_parts_count
    if diff == 0:
        return
    if diff < 0:
        raise ValueError('Invalid string: "{string}". '
                         'Number of closed '
                         'but not opened blocks: {diff}.'
                         .format(string=string,
                                 diff=-diff))
    raise ValueError('Invalid string: "{string}". '
                     'Number of opened '
                     'but not closed blocks: {diff}.'
                     .format(string=string,
                             diff=diff))

Then we can do the job without regular expressions, using only str methods:

def remove_commas_outside_of_parentheses(string):
    # if you don't need string validation
    # then remove this line and string validator
    validate_string(string)

    left_parts = string.split('(')
    if len(left_parts) == 1:
        # no opened blocks found,
        # remove all commas
        return string.replace(',', '')

    left_outer_part = left_parts[0]

    left_outer_part = left_outer_part.replace(',', '')

    left_unopened_parts = left_parts[-1].split(')')
    right_outer_part = left_unopened_parts[-1]
    right_outer_part = right_outer_part.replace(',', '')
    return '('.join([left_outer_part] +
                    left_parts[1:-1] +
                    [')'.join(left_unopened_parts[:-1]
                              + [right_outer_part])])

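I think it looks a bit ugly, but it works for the cases below. Note that it only strips commas before the first ( and after the last ), so a comma between two separate groups, as in a(b),c(d), is left in place.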

Tests:

>>> remove_commas_outside_of_parentheses('foo,bat')
'foobat'
>>> remove_commas_outside_of_parentheses('foo,(bat,foo),bat')
'foo(bat,foo)bat'
>>> remove_commas_outside_of_parentheses('bar,baz(foo,(bat,foo),bat),bar,baz')
'barbaz(foo,(bat,foo),bat)barbaz'

With "broken" strings:

>>> remove_commas_outside_of_parentheses('(')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<input>", line 4, in remove_commas_outside_of_parentheses
  File "<input>", line 17, in validate_string
ValueError: Invalid string: "(". Number of opened but not closed blocks: 1.
>>> remove_commas_outside_of_parentheses(')')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<input>", line 4, in remove_commas_outside_of_parentheses
  File "<input>", line 12, in validate_string
ValueError: Invalid string: ")". Number of closed but not opened blocks: 1.
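
One caveat: validate_string only compares how many ( and ) characters there are, so a string such as ')(' passes even though its parentheses are in the wrong order. A minimal sketch of an order-aware check follows. The name validate_parentheses_order and its error messages are assumptions made for this illustration, not part of the answer above.

```python
def validate_parentheses_order(string):
    """Raise ValueError if a ')' comes before its '(' or a '(' is never closed."""
    depth = 0
    for index, char in enumerate(string):
        if char == '(':
            depth += 1
        elif char == ')':
            depth -= 1
            if depth < 0:
                # A block was closed before it was opened.
                raise ValueError('Invalid string: "{}". Unopened block '
                                 'closed at index {}.'.format(string, index))
    if depth > 0:
        raise ValueError('Invalid string: "{}". Number of opened '
                         'but not closed blocks: {}.'.format(string, depth))


for candidate in ['foo(bat,foo),bat', ')(']:
    try:
        validate_parentheses_order(candidate)
        print(candidate, 'is balanced')
    except ValueError as error:
        print(error)
```

Because it walks the string from left to right, this check also reports the position of the first stray ).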