Recursive substitution from a replacement map

Time: 2011-11-09 17:52:56

Tags: python regex

I have a dictionary

{'from.x': 'from.changed.x',...}

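which can be very large, and I have to apply it as a set of replacements to the text files in a fairly large directory structure.

I have not found any ready-made solution that would help, so I ended up:

  • using os.walk
  • iterating over the dictionary
  • writing everything out

with something like this: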

from difflib import unified_diff
from os import path, walk


def fix_imports(top_dir, not_ui_keys):
    """Walk through the directory and substitute the wrong imports
    """
    # extract_dotted and add_model are helpers defined elsewhere
    repl = {}
    for n in not_ui_keys:
        # interleave a model in between
        dotted = extract_dotted(n)
        if dotted:
            repl[dotted] = add_model(dotted)

    for root, dirs, files in walk(top_dir):
        py_files = [path.join(root, x) for x in files if x.endswith('.py')]

        for py in py_files:
            with open(py) as opf:
                orig_text = opf.read()
            res = replace_text(orig_text, repl)
            # write the file back only when something changed
            if res != orig_text:
                with open(py, 'w') as opf:
                    opf.write(res)


def replace_text(orig_text, replace_map):
    res = orig_text
    # apply each replacement in turn; str.replace returns a new
    # string, so the result has to be assigned back
    for to_replace in replace_map:
        res = res.replace(to_replace, replace_map[to_replace])

    # now print the differences, from the original to the result
    for un in unified_diff(orig_text.splitlines(), res.splitlines(), lineterm=''):
        print(un)

    return res

Is there a better/nicer/faster way to do this?

Edit: to clarify the question a bit, the replacements are generated by a function, and they all have the following form:

{'x.y.z': 'x.y.added.z', 'x.b.a': 'x.b.added.a'}

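And yes, I certainly should get better at using regular expressions; I just thought I would not need them this time. Still, I don't think they would help much, because I cannot formalize the whole range of replacements with one (or a few) regular expressions.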
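For keys of the shape shown above, the map itself can be produced mechanically. The following is a minimal sketch, assuming every replacement inserts one fixed segment before the last dotted component. `insert_segment` and `build_map` are hypothetical names used only for illustration; they are not the `extract_dotted` and `add_model` helpers from the question, whose code is not shown.

```python
def insert_segment(dotted, segment="added"):
    """Insert `segment` before the last component of a dotted name."""
    head, sep, tail = dotted.rpartition(".")
    if not sep:
        # no dot at all: nothing to interleave, leave the name alone
        return dotted
    return "%s.%s.%s" % (head, segment, tail)


def build_map(names, segment="added"):
    """Build the {old: new} replacement dictionary for the given names."""
    return dict((name, insert_segment(name, segment)) for name in names)


print(build_map(["x.y.z", "x.b.a"]))
```

If the inserted segment differs from key to key, this shortcut no longer applies and an explicit dictionary is still needed.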

2 answers:

Answer 0: (score: 2)

I would write the first function using generators:

from os import path, walk


def fix_imports(top_dir, not_ui_keys):
    """Walk through the directory and substitute the wrong imports """
    # on Python 3 the built-in map and filter are already lazy, so they
    # take the place of itertools.imap and itertools.ifilter
    gen = filter(None, map(extract_dotted, not_ui_keys))
    repl = dict((dotted, add_model(dotted)) for dotted in gen)

    py_files = (path.join(root, x)
                for root, dirs, files in walk(top_dir)
                for x in files if x[-3:]=='.py')
    for py in py_files:
        with open(py) as opf:
            res = replace_text(opf.read(), repl)

Note that x[-3:]=='.py' is used here in place of x.endswith('.py').
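The function above computes `res` but does not do anything with it. As a rough sketch of how the generator approach could be completed, assuming the files are meant to be rewritten in place, the file walk and the write-back step might be split as follows. `iter_py_files` and `rewrite_files` are hypothetical names, and `transform` stands for any function from text to text, such as a call to `replace_text` with the map already bound.

```python
import os


def iter_py_files(top_dir):
    """Lazily yield the path of every .py file below top_dir."""
    for root, dirs, files in os.walk(top_dir):
        for name in files:
            if name.endswith(".py"):
                yield os.path.join(root, name)


def rewrite_files(top_dir, transform):
    """Apply transform(text) to each .py file; return the paths that changed."""
    changed = []
    for py in iter_py_files(top_dir):
        with open(py) as opf:
            original = opf.read()
        result = transform(original)
        if result != original:
            # only touch files whose content actually changed
            with open(py, "w") as opf:
                opf.write(result)
            changed.append(py)
    return changed
```

Each file is read completely before it is reopened for writing, so a file is never read and truncated at the same time.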

Answer 1: (score: 0)

Thanks everyone. Regarding the problem of replacing from a map across many files, I think I have a working solution:

import logging
import re

logger = logging.getLogger(__name__)


def replace_map_to_text(repl_map, text_lines):
    """Take a dictionary with the replacements needed and a list of
    lines and return a list with the substituted lines
    """
    res = []
    # '.' in a regexp matches any single character, so every key is
    # quoted with re.escape; longer keys go first so that a key which
    # is a prefix of another one cannot shadow it
    keys = sorted(repl_map, key=len, reverse=True)
    concat_st = "(%s)" % "|".join(re.escape(k) for k in keys)
    combined_regexp = re.compile(concat_st)

    for line in text_lines:
        found = combined_regexp.search(line)
        if found:
            # substitute every key found in the line, looking up the
            # replacement for whichever key actually matched
            new_line = combined_regexp.sub(
                lambda m: repl_map[m.group(1)], line)
            logger.info("from line %s to line %s" % (line, new_line))
            res.append(new_line)
        else:
            res.append(line)

    return res


def test_replace_string():
    lines = ["from psi.io.api import x",
             "from psi.z import f"]

    expected = ["from psi.io.model.api import x",
                "from psi.model.z import f"]

    mapping = {'psi.io.api': 'psi.io.model.api',
               'psi.z': 'psi.model.z'}

    assert replace_map_to_text(mapping, lines) == expected

In short, I build one big regular expression of the form (first|second|third)

Then I search for it in each line, and substitute with re.sub if something is found.
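The (first|second|third) idea can also be applied to a whole text in a single pass. The sketch below is one possible variant and not necessarily what the answer's author had in mind: it assumes the keys are escaped with `re.escape`, sorted longest first, and substituted through a callback. `build_replacer` is a hypothetical name.

```python
import re


def build_replacer(repl_map):
    """Return a function that applies every replacement in one pass."""
    if not repl_map:
        # an empty alternation would match everywhere, so do nothing
        return lambda text: text
    # Longest keys first, so 'psi.io.api' wins over a shorter key such
    # as 'psi.io' when both could match at the same position.
    keys = sorted(repl_map, key=len, reverse=True)
    # re.escape makes '.' (and any other special character) literal.
    pattern = re.compile("|".join(re.escape(key) for key in keys))

    def replace(text):
        # The callback looks up the replacement for whichever key matched.
        return pattern.sub(lambda match: repl_map[match.group(0)], text)

    return replace


replace = build_replacer({'psi.io.api': 'psi.io.model.api',
                          'psi.z': 'psi.model.z'})
print(replace("from psi.io.api import x"))  # from psi.io.model.api import x
```

Because the substitution happens in one pass, text that has already been replaced is not scanned again, which a chain of successive str.replace calls does not guarantee. The pattern still matches a key inside a longer name (for example psi.z inside psi.zeta), so word-boundary handling would have to be added if that matters.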

It is still a bit rough, but it works well after some simple tests.

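Edit: fixed a nasty bug in the concatenation: an unescaped '.' in a regexp matches any single character, not just a literal '.', so it has to be escaped.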
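The effect of that bug is easy to reproduce. The short check below is only an illustration with a made-up input line; it compares a key compiled as-is with the same key passed through `re.escape`.

```python
import re

key = "psi.z"

# Unescaped: '.' matches any single character, so 'psi_z' matches too.
unescaped = re.compile(key)
# Escaped: re.escape turns the key into the pattern 'psi\.z'.
escaped = re.compile(re.escape(key))

line = "from psi_z import f"
print(bool(unescaped.search(line)))  # True, a false positive
print(bool(escaped.search(line)))    # False
```

Using `re.escape` on each key also covers any other special characters a key might contain, which replacing only the dots by hand would miss.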