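User-specified substitution regular expressions in Python

Time: 2013-03-26 15:52:18

Tags: python regex

I'm writing a simple script that processes text files and logs. It has to take a list of substitution regular expressions from the command line. For example: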

./myscript.py --replace=s/foo/bar/ --replace=s@/etc/hosts@/etc/foo@ --replace=@test\@email.com@root\@email.com@

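Is there a simple way to hand a user-specified substitution pattern to the Python re library and apply it to a string? Any elegant solution?

If possible, I'd like to avoid writing my own parser. Note that I'd like to support modifiers such as /g or /i.

Thanks!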

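3 answers:

Answer 0: (score: 0)

As mentioned in the comments, you can use re.compile(), but that only works for match and search. Assuming you only have substitutions, you could do something like this: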

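import operator
import re
from functools import reduce

modifiers_map = {
    'i': re.IGNORECASE,
    # ...
}

# `replacements` (the command strings) and `my_text` (the text to
# transform) are assumed to be defined earlier in the script.
for replace in replacements:
    # Look for a generalized separator in front of a command. A
    # backreference does not work inside a character class, so "any
    # character except the separator" is spelled (?:(?!\2).)
    m = re.match(r'(s?)(.)((?:(?!\2).)+)\2((?:(?!\2).)*)\2([ig]*)$', replace)
    if not m:
        print('Invalid command: %s' % replace)
        continue
    command, separator, query, substitution, modifiers = m.groups()
    # Convert the modifiers to flags; modifiers without an entry in
    # modifiers_map (such as 'g') contribute no flag
    flags = reduce(operator.or_, [modifiers_map.get(char, 0) for char in modifiers], 0)
    # re.sub expands group references (like \1, \2, etc.) in the
    # substitution and replaces every match by default. This also
    # assumes that you're only getting 's' as a command
    my_text = re.sub(query, substitution, my_text, flags=flags)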

Suffice it to say this is a rough draft, but I think it gets you 90% of the way there.
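One way the draft above could be rounded out is sketched here, under a few assumptions that are not in the original answer: the separator must be a non-alphanumeric, non-backslash character, a backslash-escaped separator is unescaped before the pattern is compiled, and a missing g modifier means "first match only", as in sed. The names FLAG_MAP, COMMAND_RE, parse_command and apply_commands are hypothetical.

```python
import re

# Hypothetical modifier table; extend it as needed.
FLAG_MAP = {'i': re.IGNORECASE, 'm': re.MULTILINE, 's': re.DOTALL, 'x': re.VERBOSE}

# [s]<sep>pattern<sep>replacement<sep>modifiers
# Inside a field, "\\." consumes an escaped character (including an escaped
# separator) and "(?!\2)." consumes any character that is not the separator.
COMMAND_RE = re.compile(
    r'(s?)([^\w\s\\])((?:\\.|(?!\2).)+)\2((?:\\.|(?!\2).)*)\2(\w*)$',
    re.DOTALL)


def parse_command(command):
    """Return (compiled_regex, replacement, count) for one command string."""
    m = COMMAND_RE.match(command)
    if not m:
        raise ValueError('Invalid command: %r' % command)
    _, sep, query, repl, modifiers = m.groups()
    # Assumption: "\<sep>" always means a literal separator character.
    query = query.replace('\\' + sep, sep)
    repl = repl.replace('\\' + sep, sep)
    flags = 0
    count = 1  # sed-like default: replace only the first match
    for char in modifiers:
        if char == 'g':
            count = 0  # count=0 tells re to replace every match
        elif char in FLAG_MAP:
            flags |= FLAG_MAP[char]
        else:
            raise ValueError('Invalid modifier %r in %r' % (char, command))
    return re.compile(query, flags), repl, count


def apply_commands(commands, text):
    """Apply each command in order, feeding the result into the next one."""
    for command in commands:
        regex, repl, count = parse_command(command)
        text = regex.sub(repl, text, count=count)
    return text


print(apply_commands(['s/foo/bar/g', r's/(\w+)@(\w+)/\2 at \1/'], 'foo foo user@host'))
```

Group references such as \1 need no extra handling in this sketch because re.sub expands them in the replacement string itself.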

Answer 1: (score: 0)

You can use spaces as the delimiter and take advantage of the shell's command-line parser:

$ myscript --replace=foo bar \
>          --replace=/etc/hosts /etc/foo gi \
>          --replace=test@email.com root@email.com 

Replacing every occurrence (what the g flag does) is the default in Python, so you need to add special support for it:

#!/usr/bin/env python
import re
from argparse import ArgumentParser
from functools import partial

all_re_flags = 'Lgimsux' # regex flags
parser = ArgumentParser(usage='%(prog)s [--replace PATTERN REPL [FLAGS]]...')
parser.add_argument('-e', '--replace', action='append', nargs='*')
args = parser.parse_args()
print(args.replace)

subs = [] # replacement functions: input string -> result
for arg in args.replace or []: # args.replace is None when --replace is never given
    count = 1 # replace only the first occurrence if no `g` flag
    if len(arg) == 2:
        pattern, repl = arg
    elif len(arg) == 3:
        pattern, repl, flags = arg
        if not set(flags) <= set(all_re_flags): # every character must be a known flag
            parser.error('invalid flags %r for --replace option' % flags)
        if 'g' in flags: # add support for `g` flag
            flags = flags.replace('g', '')
            count = 0 # replace all occurrences
        if flags: # embed flags
            pattern = "(?%s)%s" % (flags, pattern)
    else:
        parser.error('wrong number of arguments for --replace option')
    subs.append(partial(re.compile(pattern).sub, repl, count=count))

You can use subs as follows:

input_string = 'a b a b'
for replace in subs:
    print(replace(input_string))

Example:

$ ./myscript -e 'a b' 'no flag' -e 'a B' 'with flags' ig

Output:

[['a b', 'no flag'], ['a B', 'with flags', 'ig']]
no flag a b
with flags with flags
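The count handling in this answer relies on how re treats its count argument, and the usage loop applies each replacement to the original input independently. The sketch below illustrates both points; the chaining helper apply_all is hypothetical and not part of the answer, and the second pattern is made up for illustration.

```python
import re
from functools import partial, reduce

text = 'a b a b'

# With no count (or count=0), Python replaces every match.
replaced_all = re.sub('a', 'X', text)             # 'X b X b'
# count=1 emulates a sed command without the g flag.
first_only = re.sub('a', 'X', text, count=1)      # 'X b a b'

# Replacement functions built the same way as in the answer.
subs = [
    partial(re.compile('a b').sub, 'no flag', count=1),
    partial(re.compile('(?i)no FLAG').sub, 'first', count=0),
]


def apply_all(subs, text):
    """Chain the replacements: each one receives the previous result."""
    return reduce(lambda acc, replace: replace(acc), subs, text)


result = apply_all(subs, text)
print(result)
```

Chaining is only one possible design; applying each substitution to the untouched input, as the answer does, is equally valid if the replacements are meant to be independent.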

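Answer 2: (score: 0)

Thanks for the answers. Given the complexity of the proposed solutions and the lack of a ready-made parser in the standard library, I went the extra mile and implemented my own parser.

It's not much more complex than the other proposals, see below. Now I just need to write the tests.

Thanks!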

import re


class Replacer(object):
  def __init__(self, patterns=()):
    self.patterns = []
    for pattern in patterns:
      self.AddPattern(pattern)

  def ParseFlags(self, flags):
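    mapping = {
      # 'g' maps to 0 because regex.sub() already replaces every match.
      # On Python 3, re.L is only accepted for bytes patterns.
      'g': 0, 'i': re.I, 'l': re.L, 'm': re.M, 's': re.S, 'u': re.U, 'x': re.X,
      'd': re.DEBUG
    }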

    result = 0
    for flag in flags:
      try:
        result |= mapping[flag]
      except KeyError:
        raise ValueError(
            "Invalid flag: %s, known flags: %s" % (flag, sorted(mapping)))
    return result

  def Apply(self, text):
    for regex, repl in self.patterns:
      text = regex.sub(repl, text)
    return text

  def AddPattern(self, pattern):
    separator = pattern[0]
    match = []
    for position, char in enumerate(pattern[1:], start=1):
      if char == separator:
        if pattern[position - 1] != '\\':
          break
        match[-1] = separator
        continue
      match += char
    else:
      raise ValueError("Invalid pattern: could not find divisor.")

    replacement = []
    for position, char in enumerate(pattern[position + 1:], start=position + 1):
      if char == separator:
        if pattern[position - 1] != '\\':
          break
        replacement[-1] = separator
        continue
      replacement += char
    else:
      raise ValueError(
          "Invalid pattern: could not find divisor '%s'." % separator)

    flags = self.ParseFlags(pattern[position + 1:])
    match = ''.join(match)
    replacement = ''.join(replacement)
    self.patterns.append((re.compile(match, flags=flags), replacement))
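Since the tests are still to be written, here is a rough idea of what they might check. To keep the block self-contained it does not import the class above; it uses a condensed, hypothetical helper split_pattern that follows the same scanning idea as AddPattern (the first character is the separator, a backslash-escaped separator is kept as a literal), plus a hypothetical apply_pattern wrapper and a reduced flag table.

```python
import re

# Reduced, hypothetical flag table; 'g' is a no-op because re.sub
# replaces every match by default.
FLAGS = {'g': 0, 'i': re.I, 'm': re.M, 's': re.S, 'x': re.X}


def split_pattern(pattern):
    """Split '<sep>match<sep>replacement<sep>flags' into its three parts."""
    separator = pattern[0]
    parts = [[]]
    position = 1
    while position < len(pattern) and len(parts) < 3:
        char = pattern[position]
        if char == '\\' and pattern[position + 1:position + 2] == separator:
            parts[-1].append(separator)  # escaped separator stays literal
            position += 2
            continue
        if char == separator:
            parts.append([])             # unescaped separator ends the field
        else:
            parts[-1].append(char)
        position += 1
    if len(parts) < 3:
        raise ValueError(
            "Invalid pattern: could not find divisor %r." % separator)
    return ''.join(parts[0]), ''.join(parts[1]), pattern[position:]


def apply_pattern(pattern, text):
    """Parse one pattern and apply it to text."""
    match, replacement, flags = split_pattern(pattern)
    bits = 0
    for flag in flags:
        if flag not in FLAGS:
            raise ValueError("Invalid flag: %s" % flag)
        bits |= FLAGS[flag]
    return re.sub(match, replacement, text, flags=bits)


def run_tests():
    assert split_pattern('/foo/bar/gi') == ('foo', 'bar', 'gi')
    assert split_pattern('/foo//') == ('foo', '', '')
    assert split_pattern(r'@test\@email.com@root\@email.com@') == (
        'test@email.com', 'root@email.com', '')
    assert apply_pattern('/FOO/bar/i', 'foo') == 'bar'
    for bad in ('/foo/', '/foo/bar/z'):
        try:
            apply_pattern(bad, 'foo')
        except ValueError:
            continue
        raise AssertionError('expected ValueError for %r' % bad)


run_tests()
```

The same cases (a normal pattern, an empty replacement, an escaped separator, a missing divisor, an unknown flag) would be natural candidates for tests against Replacer.AddPattern and Replacer.Apply.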