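I have a dictionary
{'from.x': 'from.changed.x',...}
that can be very large, and I have to use it to perform substitutions in the text files of a fairly large directory tree.
I didn't find any solution that looked like it would help, so I ended up with
something like this: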
from os import walk, path
from difflib import unified_diff

# extract_dotted and add_model are the asker's own helpers (not shown)
def fix_imports(top_dir, not_ui_keys):
    """Walk through the directory and substitute the wrong imports
    """
    repl = {}
    for n in not_ui_keys:
        # interleave a model in between
        dotted = extract_dotted(n)
        if dotted:
            repl[dotted] = add_model(dotted)
    for root, dirs, files in walk(top_dir):
        py_files = [path.join(root, x) for x in files if x.endswith('.py')]
        for py in py_files:
            with open(py) as f:
                res = replace_text(f.read(), repl)

def replace_text(orig_text, replace_map):
    res = orig_text
    # apply each replacement in turn; str.replace returns a new string,
    # so the result must be assigned back
    for to_replace in replace_map:
        res = res.replace(to_replace, replace_map[to_replace])
    # now print the differences (original text first, new text second)
    for un in unified_diff(orig_text.splitlines(), res.splitlines()):
        print(un)
    return res
Is there a better/nicer/faster way to do this?
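The code above computes the new text but never writes it back. A minimal sketch of the per-file step, assuming Python 3 and files readable with the default encoding, could look like this (`rewrite_file` is a hypothetical name). Note that applying `str.replace` once per key is sequential, so a later key can re-match text produced by an earlier replacement.

```python
from difflib import unified_diff
from pathlib import Path


def rewrite_file(py_file, replace_map):
    """Apply every literal replacement to py_file and write it back only if it changed.

    Hypothetical helper; returns the unified diff lines (empty list if unchanged).
    """
    path = Path(py_file)
    orig_text = path.read_text()
    new_text = orig_text
    for old, new in replace_map.items():
        # str.replace returns a new string; it never modifies in place.
        new_text = new_text.replace(old, new)
    if new_text == orig_text:
        return []
    path.write_text(new_text)
    return list(unified_diff(orig_text.splitlines(), new_text.splitlines(),
                             fromfile=str(path), tofile=str(path), lineterm=""))
```

Writing only changed files keeps modification times intact for untouched modules.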
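Edit: to clarify the question a bit, the replacements are generated by a function, and they all have this form:
{'x.y.z': 'x.y.added.z', 'x.b.a': 'x.b.added.a'}
And yes, I should certainly make better use of regular expressions; I just didn't think I needed them this time. Still, I don't think they would help much, because I can't express the whole range of replacements with one (or several) regular expressions...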
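Because every replacement has the same shape (a segment inserted before the last dotted component), the generating function can be sketched in a few lines. `insert_segment` is a hypothetical stand-in for the asker's `add_model`, whose real code is not shown.

```python
def insert_segment(dotted, segment="added"):
    """Insert `segment` before the last component of a dotted name.

    Hypothetical stand-in for the question's add_model helper.
    """
    head, sep, tail = dotted.rpartition(".")
    if not sep:
        # No dot at all: the "x.y.z" -> "x.y.added.z" form does not apply.
        raise ValueError("expected a dotted name, got %r" % dotted)
    return "%s.%s.%s" % (head, segment, tail)


keys = ["x.y.z", "x.b.a"]
replacements = {k: insert_segment(k) for k in keys}
print(replacements)  # {'x.y.z': 'x.y.added.z', 'x.b.a': 'x.b.added.a'}
```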
Answer 0 (score: 2)
I would write the first function using generators:
from os import walk, path

def fix_imports(top_dir, not_ui_keys):
    """Walk through the directory and substitute the wrong imports """
    # Python 2 only; in Python 3 the built-in map and filter are already lazy
    from itertools import imap, ifilter
    gen = ifilter(None, imap(extract_dotted, not_ui_keys))
    repl = dict((dotted, add_model(dotted)) for dotted in gen)
    # lazily yield the path of every .py file in the tree
    py_files = (path.join(root, x)
                for root, dirs, files in walk(top_dir)
                for x in files if x[-3:] == '.py')
    for py in py_files:
        with open(py) as opf:
            res = replace_text(opf.read(), repl)
x[-3:]=='.py' is faster than x.endswith('.py').
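The speed claim is easy to check locally. This sketch first confirms the two tests agree on a few sample names, then times each with `timeit`; the numbers depend on the interpreter version and machine, so no particular result is implied here.

```python
import timeit

names = ["module.py", "README.txt", "setup.py", "x.pyc", "py"]

# Both checks give the same answer for these file names.
for name in names:
    assert (name[-3:] == ".py") == name.endswith(".py")

# Time each check one million times on the same string.
slice_time = timeit.timeit("n[-3:] == '.py'", setup="n = 'module.py'", number=1_000_000)
endswith_time = timeit.timeit("n.endswith('.py')", setup="n = 'module.py'", number=1_000_000)
print(f"slice: {slice_time:.3f}s  endswith: {endswith_time:.3f}s")
```

Whatever the timing, `endswith` also accepts a tuple, e.g. `x.endswith(('.py', '.pyw'))`, which the slice form cannot express as directly.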
Answer 1 (score: 0)
Thanks, everyone. As for doing replacements from a mapping across many files, I think I have a working solution:
import logging
import re

logger = logging.getLogger(__name__)

def replace_map_to_text(repl_map, text_lines):
    """Take a dictionary with the replacements needed and a list of
    lines and return a list with the substituted lines
    """
    if not repl_map:
        return list(text_lines)
    # '.' in a regex matches any character, so every key must be escaped;
    # longer keys go first so a key is not shadowed by a shorter prefix key
    keys = sorted(repl_map, key=len, reverse=True)
    combined_regexp = re.compile("(%s)" % "|".join(re.escape(k) for k in keys))
    res = []
    for line in text_lines:
        if combined_regexp.search(line):
            # substitute every matched key in the line, not only the first one
            new_line = combined_regexp.sub(lambda m: repl_map[m.group(1)], line)
            logger.info("from line %s to line %s", line, new_line)
            res.append(new_line)
        else:
            res.append(line)
    return res
def test_replace_string():
    lines = ["from psi.io.api import x",
             "from psi.z import f"]
    expected = ["from psi.io.model.api import x",
                "from psi.model.z import f"]
    mapping = {'psi.io.api': 'psi.io.model.api',
               'psi.z': 'psi.model.z'}
    assert replace_map_to_text(mapping, lines) == expected
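In short, I build one big regular expression of the form (first|second|third),
then search each line for it and substitute with re.sub if anything is found.
It is still a bit rough, but it works well after some simple tests.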
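The same combined-regex idea can also be applied to a whole file's text at once instead of line by line. The sketch below (`replace_all` is a hypothetical name) shows two properties of the approach: keys are tried longest first, because Python's `re` takes the first alternative that matches rather than the longest, and the substitution happens in a single left-to-right pass, so replaced text is never rescanned (unlike repeated `str.replace` calls).

```python
import re


def replace_all(text, replace_map):
    """Replace every key of replace_map found in text, in one left-to-right pass."""
    if not replace_map:
        return text
    # Longest keys first, so 'a.b.c' is tried before its prefix 'a.b'.
    keys = sorted(replace_map, key=len, reverse=True)
    # re.escape makes each key match literally ('.' is not a wildcard).
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # The callback looks up the matched key; output is never scanned again.
    return pattern.sub(lambda m: replace_map[m.group(0)], text)


print(replace_all("a.b.c a.b", {"a.b": "A", "a.b.c": "C"}))  # C A
print(replace_all("xy", {"x": "y", "y": "z"}))               # yz
```

With sequential `str.replace` calls the second example would give `zz`, because the `y` produced by the first replacement is replaced again.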
Edit: fixed a nasty bug in the concatenation: in a regular expression an unescaped '.' matches any single character, not only a literal '.'.
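The bug described in the edit is easy to reproduce. Whether the pattern string is raw or not makes no difference to what '.' means; a raw string only changes how backslashes in the literal are read (writing '\.' in a normal string literal triggers an invalid-escape warning in recent Python 3 versions, while r'\.' does not). `re.escape` avoids the problem for arbitrary keys:

```python
import re

key = "psi.z"
line = "from psiXz import f"

# Unescaped, '.' is a wildcard, so the key wrongly matches 'psiXz'.
assert re.search(key, line) is not None
# re.escape turns the key into a pattern that matches only the literal text.
assert re.search(re.escape(key), line) is None
assert re.search(re.escape(key), "from psi.z import f") is not None
```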