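python - Replacing only the groups of a variable number of regexes

Time: 2017-02-07 18:36:53

Tags: python

I would like to replace the text matched by regex groups with '#' characters.

There will be a variable number of regexes, each containing a variable number of groups.

Only the values of the regex groups should be replaced; the rest of the line must stay as it is.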

#! /usr/bin/python

import re

data = """Line1 '4658'
Line2 data 'AAA'\tBBB\t55555
Roach""".splitlines()

# a variable number of Regex's containing a variable number of groups
needles = [ r"Line1 '(\d+)'",
        r"'(AAA)'\t\S+\t(\S+)",
        r"(Roach)" ]

pattern = re.compile('|'.join(needles))

for line in data:

 match = pattern.search(line)

 if (match):

  print(re.sub(match.string[match.start():match.end()], '#' * len(match.string), line))

# current output
"""
############
Line2 data ##########################
#####
"""

# desired output
"""
Line1 '####'
Line2 data '###' BBB #####
#####
"""

2 Answers:

Answer 0: (score: 0)

Modify the code as follows:

#! /usr/bin/python

import re

data = """Line1 '4658'
Line2 data 'AAA'\tBBB\t55555
Roach""".splitlines()

# a variable number of regexes containing a variable number of groups
needles = [r"Line1 '(\d+)'",
    r"'(AAA)'\t\S+\t(\S+)",
    r"(Roach)"
]

pattern = re.compile('|'.join(needles))

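for line in data:
  match = pattern.search(line)
  if match:
    for matched_str in match.groups():
      # groups belonging to alternatives that did not match are None
      if matched_str:
        # escape the matched text so it is treated literally, not as a regex
        line = re.sub(re.escape(matched_str), '#' * len(matched_str), line)
  print(line)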

When run:

$ python a.py
Line1 '####'
Line2 data '###'    BBB #####
#####
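
One caveat with the approach above: re.sub() looks for the matched text anywhere in the line, so a group value that also appears elsewhere gets masked there too. A minimal sketch of a position-based alternative follows, assuming a hypothetical helper named mask_groups (not part of the answer) that uses match.span() to overwrite only the characters each group actually covered:

```python
import re


def mask_groups(match, fill="#"):
    """Return match.string with every participating group overwritten by fill."""
    chars = list(match.string)
    for i in range(1, match.re.groups + 1):
        start, end = match.span(i)
        if start == -1:
            # this group did not take part in the match
            continue
        chars[start:end] = fill * (end - start)
    return "".join(chars)


pattern = re.compile(r"'(AAA)'\t\S+\t(\S+)")
# the leading AAA is outside the match and should be left alone
line = "AAA data 'AAA'\tBBB\t55555"
masked = mask_groups(pattern.search(line))
print(masked)
```

Because the replacement is done by index rather than by text, the number of groups in the pattern does not matter, and the first AAA in the sample line is not touched.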

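Answer 1: (score: 0)

You don't need an extra match with re.search(). You only need to change the regexes so that they capture every part of the matched text, and then pass a function to sub() that rebuilds the text and replaces the target parts.

Here is an example for one of the lines: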

In [51]: def replacer(x):
   ....:     matched = x.groups()
   ....:     if len(matched) == 4:
   ....:         return "{}{}{}{}".format(matched[0], len(matched[1]) * '*', matched[2], len(matched[3]) * '*')
   ....:

In [52]: pattern = re.compile(r"([^']*)'(AAA)'(\t\S+\t)(\S+)")

In [53]: pattern.sub(replacer, "Line2 data 'AAA'\tBBB\t55555")
Out[53]: 'Line2 data ***\tBBB\t*****'

Here is the complete code:

import re

data = """Line1 '4658'
Line2 data 'AAA'\tBBB\t55555
Roach""".splitlines()

# a variable number of Regex's containing a variable number of groups
needles = [ r"(Line1 )'(\d+)'",
        r"([^']*)'(AAA)'(\t\S+\t)(\S+)",
        r"(Roach)" ]


def replacer(x):
    matched = x.groups()
    if matched[2] is not None:
        # the second needle matched: groups at indices 2 to 5 are set
        return "{}{}{}{}".format(matched[2], len(matched[3]) * '#', matched[4], len(matched[5]) * '#')
    elif matched[0] is not None:
        # the first needle matched: groups at indices 0 and 1 are set
        return "{}{}".format(matched[0], len(matched[1]) * '#')
    elif matched[-1] is not None:
        # the third needle matched: only the last group is set
        return len(matched[-1]) * '#'


pattern = re.compile('|'.join(needles))


for line in data:
    print(pattern.sub(replacer, line))

Output:

Line1 ####
Line2 data ###  BBB #####
#####
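
Note that the output above loses the single quotes, because they sit outside the capture groups and the replacer only puts the groups back. The replacer also hard-codes group indices, so it has to be edited whenever a needle changes. As a rough sketch of a more generic callback, the hypothetical function mask_match below (my own name, not from the answer) works with the needles from the question unchanged: it rebuilds the matched text from m.group(0) and masks whichever groups took part, using their spans.

```python
import re

data = "Line1 '4658'\nLine2 data 'AAA'\tBBB\t55555\nRoach".splitlines()

# the original needles from the question, with no extra capture groups
needles = [r"Line1 '(\d+)'", r"'(AAA)'\t\S+\t(\S+)", r"(Roach)"]
pattern = re.compile("|".join(needles))


def mask_match(m, fill="#"):
    """Return the matched text with only the participating groups masked."""
    text = list(m.group(0))
    offset = m.start()  # group spans are relative to the whole string
    for i in range(1, m.re.groups + 1):
        start, end = m.span(i)
        if start != -1:
            text[start - offset:end - offset] = fill * (end - start)
    return "".join(text)


result = [pattern.sub(mask_match, line) for line in data]
for line in result:
    print(line)
```

Since sub() calls the function for every match, this also covers lines where a needle matches more than once, and text outside the groups (including the quotes) is kept.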