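I'm writing a simple script that processes text files and logs. It has to take a list of substitution regular expressions from the command line. For example: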
./myscript.py --replace=s/foo/bar/ --replace=s@/etc/hosts@/etc/foo@ --replace=@test\@email.com@root\@email.com@
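Is there a simple way to hand user-specified substitution patterns to Python's re library and apply them to a string? Is there an elegant solution?
If possible, I'd like to avoid writing my own parser. Note that I want to support modifiers such as /g or /i.
Thanks!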
Answer 0 (score: 0)
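As mentioned in the comments, you can use re.compile(), but that only works for match and search. Assuming you only have substitutions, you might do something like this: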
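import operator
import re
from functools import reduce

modifiers_map = {
    'i': re.IGNORECASE,
    # ...
}

for replace in replacements:
    # Look for a generalized separator in front of a command.
    # A backreference does not work inside a character class, so a
    # negative lookahead stands in for "any character but the separator".
    m = re.match(r'(s?)(.)((?:(?!\2).)+)\2((?:(?!\2).)*)\2([ig]*)$', replace)
    if not m:
        print('Invalid command: %s' % replace)
        continue
    command, separator, query, substitution, modifiers = m.groups()
    # 'g' is not a Python flag: count=0 makes re.sub replace all occurrences
    count = 0 if 'g' in modifiers else 1
    # Convert the remaining modifiers to flags
    flags = reduce(operator.or_, [modifiers_map[char] for char in modifiers if char != 'g'], 0)
    # This needs a little bit of tweaking if you want to support
    # group matching (like \1, \2, etc.). This also assumes that
    # you're only getting 's' as a command
    my_text = re.sub(query, substitution, my_text, count=count, flags=flags)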
Suffice it to say this is a rough draft, but I think it gets you 90% of the way there.
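As a hedged illustration of the idea in this answer, here is a minimal self-contained sketch that parses one sed-style command into its parts. The names `parse_command`, `COMMAND_RE`, and `FLAG_MAP` and the returned tuple layout are assumptions made for this example, and escaped separators (such as `\@`) are deliberately not handled.

```python
import re

# Hypothetical helper, not part of the re module: splits "s/foo/bar/gi"
# into (pattern, replacement, flags, count).
FLAG_MAP = {'i': re.IGNORECASE}

COMMAND_RE = re.compile(
    r'(s?)'              # optional "s" command
    r'(.)'               # separator: any single character
    r'((?:(?!\2).)+)'    # pattern: one or more non-separator characters
    r'\2'
    r'((?:(?!\2).)*)'    # replacement: may be empty
    r'\2'
    r'([ig]*)$'          # trailing modifiers
)


def parse_command(command):
    m = COMMAND_RE.match(command)
    if not m:
        raise ValueError('Invalid command: %s' % command)
    _, _, pattern, replacement, modifiers = m.groups()
    flags = 0
    for char in modifiers:
        flags |= FLAG_MAP.get(char, 0)    # "g" has no corresponding re flag
    count = 0 if 'g' in modifiers else 1  # count=0 means "replace all"
    return pattern, replacement, flags, count


pattern, replacement, flags, count = parse_command('s/foo/bar/gi')
print(re.sub(pattern, replacement, 'Foo foo', count=count, flags=flags))
```

Keeping the parsing separate from the call to `re.sub` makes it easier to check each command before any text is modified.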
Answer 1 (score: 0)
You can use a space as the separator and let the shell's command-line parser do the work:
$ myscript --replace foo bar \
> --replace /etc/hosts /etc/foo gi \
> --replace test@email.com root@email.com
The g flag is the default behavior in Python, so you need to add special support for it:
#!/usr/bin/env python
import re
from argparse import ArgumentParser
from functools import partial

all_re_flags = 'Lgimsux'  # regex flags

parser = ArgumentParser(usage='%(prog)s [--replace PATTERN REPL [FLAGS]]...')
parser.add_argument('-e', '--replace', action='append', nargs='*')
args = parser.parse_args()
print(args.replace)

subs = []  # replacement functions: input string -> result
for arg in args.replace or []:  # args.replace is None if the option is absent
    count = 1  # replace only the first occurrence if no `g` flag
    if len(arg) == 2:
        pattern, repl = arg
    elif len(arg) == 3:
        pattern, repl, flags = arg
        # every flag character must be a known flag
        if not set(flags) <= set(all_re_flags):
            parser.error('invalid flags %r for --replace option' % flags)
        if 'g' in flags:  # add support for `g` flag
            flags = flags.replace('g', '')
            count = 0  # replace all occurrences
        if flags:  # embed flags
            pattern = "(?%s)%s" % (flags, pattern)
    else:
        parser.error('wrong number of arguments for --replace option')
    subs.append(partial(re.compile(pattern).sub, repl, count=count))
You can use subs as follows:
input_string = 'a b a b'
for replace in subs:
    print(replace(input_string))
Example:
$ ./myscript -e 'a b' 'no flag' -e 'a B' 'with flags' ig
Output:
[['a b', 'no flag'], ['a B', 'with flags', 'ig']]
no flag a b
with flags with flags
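To make the role of `count` and of the embedded flags concrete, the following minimal sketch uses only `re.sub` and `re.compile` from the standard library. The variable names and sample strings are made up for illustration.

```python
import re

text = 'a b a b'

# count=0 (the default) replaces every occurrence, like sed's `g` flag.
replace_all = re.sub('a', 'X', text, count=0)

# count=1 replaces only the first occurrence, like sed without `g`.
replace_first = re.sub('a', 'X', text, count=1)

# Flags can be embedded at the start of the pattern: (?i) ignores case.
ignore_case = re.compile('(?i)a B').sub('with flags', text)

print(replace_all)    # X b X b
print(replace_first)  # X b a b
print(ignore_case)    # with flags with flags
```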
Answer 2 (score: 0)
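Thanks for the answers. Given the complexity of the proposed solutions and the lack of a ready-made parser in the standard library, I went the extra mile and implemented my own parser.
It is not much more complex than the other proposals; see below. All that is left now is to write the tests.
Thanks!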
import re


class Replacer(object):
    def __init__(self, patterns=()):
        self.patterns = []
        for pattern in patterns:
            self.AddPattern(pattern)

    def ParseFlags(self, flags):
        # 'g' maps to 0 because regex.sub() already replaces every occurrence.
        # re.L is only valid for bytes patterns on Python 3.
        mapping = {
            'g': 0, 'i': re.I, 'l': re.L, 'm': re.M, 's': re.S, 'u': re.U, 'x': re.X,
            'd': re.DEBUG
        }
        result = 0
        for flag in flags:
            try:
                result |= mapping[flag]
            except KeyError:
                raise ValueError(
                    "Invalid flag: %s, known flags: %s" % (flag, sorted(mapping)))
        return result

    def Apply(self, text):
        for regex, repl in self.patterns:
            text = regex.sub(repl, text)
        return text

    def AddPattern(self, pattern):
        # The first character is the separator, e.g. '/' in '/foo/bar/i'.
        separator = pattern[0]
        match = []
        for position, char in enumerate(pattern[1:], start=1):
            if char == separator:
                if pattern[position - 1] != '\\':
                    break
                # Escaped separator: replace the backslash with the separator.
                match[-1] = separator
                continue
            match.append(char)
        else:
            raise ValueError("Invalid pattern: could not find divisor.")
        replacement = []
        for position, char in enumerate(pattern[position + 1:], start=position + 1):
            if char == separator:
                if pattern[position - 1] != '\\':
                    break
                replacement[-1] = separator
                continue
            replacement.append(char)
        else:
            raise ValueError(
                "Invalid pattern: could not find divisor '%s'." % separator)
        # Everything after the last separator is the flag string.
        flags = self.ParseFlags(pattern[position + 1:])
        match = ''.join(match)
        replacement = ''.join(replacement)
        self.patterns.append((re.compile(match, flags=flags), replacement))
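Since the remaining step mentioned in this answer is writing tests, here is a minimal sketch of how the separator-splitting logic could be checked in isolation. The helper `split_command` is hypothetical: it re-implements the scanning idea of `AddPattern` as a standalone function so that the example is self-contained, and it is not part of the original class.

```python
def split_command(command):
    """Split '<sep>match<sep>replacement<sep>flags' on unescaped separators.

    Hypothetical helper mirroring the scanning done in AddPattern: a
    separator preceded by a backslash is kept as a literal separator.
    """
    separator = command[0]
    fields = [[]]
    for position, char in enumerate(command[1:], start=1):
        if len(fields) < 3 and char == separator:
            if fields[-1] and command[position - 1] == '\\':
                fields[-1][-1] = separator  # drop the backslash, keep the separator
            else:
                fields.append([])           # unescaped separator: start the next field
            continue
        fields[-1].append(char)
    if len(fields) != 3:
        raise ValueError("Invalid pattern: could not find divisor '%s'." % separator)
    return tuple(''.join(field) for field in fields)


print(split_command('/foo/bar/gi'))
print(split_command(r'@test\@email.com@root\@email.com@'))
```

Testing the splitting separately from `re.compile` keeps each assertion focused on one behavior: plain separators, escaped separators, and a missing separator.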