Question

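I have a list containing some strings.
I have a set of files that may or may not contain these strings.
I need to replace these strings with a modified version of the string in every instance across the files (e.g. string1_abc -> string1_xyz, string2_abc -> string2_xyz). Essentially, the substring that needs to be replaced and/or modified is common to all items in the list.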
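
To make the mapping concrete, the (old, new) pairs could be derived from the list roughly as follows. This is only a sketch: `build_pairs` and its parameter names are hypothetical, and it assumes the shared substring (`_abc` in the example) is what changes in each item.

```python
def build_pairs(names, old_part, new_part):
    """Return (old, new) tuples for every name that contains old_part.

    Hypothetical helper: the names and the shared substring are
    the example values from the question, not real data.
    """
    return [(name, name.replace(old_part, new_part))
            for name in names
            if old_part in name]


pairs = build_pairs(["string1_abc", "string2_abc"], "_abc", "_xyz")
```

Items that do not contain the shared substring are simply left out of the result.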

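Is there an optimized or easy way to do this? The most naive algorithm I can think of looks at every line in every file and, for each line, iterates over every item in the list and replaces it using line.replace. I know this gives me O(mnq) complexity, where m = number of files, n = number of lines per file and q = number of items in the list.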
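
The naive approach described above might look roughly like this for a single file's lines. A minimal sketch: `replace_line_by_line` is a hypothetical name, and the pairs are the example values from the question.

```python
def replace_line_by_line(lines, pairs):
    """Apply every (old, new) replacement to every line: n * q replace calls."""
    result = []
    for line in lines:
        for old, new in pairs:
            line = line.replace(old, new)
        result.append(line)
    return result


example_pairs = [("string1_abc", "string1_xyz"), ("string2_abc", "string2_xyz")]
example_lines = ["use string1_abc here\n", "nothing to change\n", "string2_abc\n"]
new_lines = replace_line_by_line(example_lines, example_pairs)
```

Running this once per file gives the m * n * q replace calls mentioned above.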

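Note:

None of the files are very large, so I am not sure whether reading line by line or reading with file.read() into a buffer would be better.
q is not very large either. The list has about 40-50 items.
m is quite large.
n can be up to 5000 lines.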

Also, I only play with Python on the side and am not very used to it. I am also restricted to using Python 2.6.

Answer 1

Pseudo-Python:

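import glob

# (old, new) pairs; extend with the rest of your 40-50 items
LoT = [("string1_abc", "string1_xyz"), ("string2_abc", "string2_xyz")]

# placeholder value: use a glob pattern that describes your files
glob_describes_your_files = "*.txt"

for fn in glob.glob(glob_describes_your_files):
    with open(fn) as f_in:
        # You said n is about 5000 lines, so I would just read it all in
        buf = f_in.read()
    for old, new in LoT:
        buf = buf.replace(old, new)
    # write buf back out to a new file or the existing one
    with open(fn, "w") as f_out:
        f_out.write(buf)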

Something like this...
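
As a possible variation (not what the loop above does), all replacements could be made in a single pass over the buffer with one compiled regular expression and a dictionary lookup. This is a sketch under assumptions: `make_replacer` is a hypothetical name, and whether it is actually faster for 40-50 items would need measuring.

```python
import re


def make_replacer(pairs):
    """Build a function that applies all (old, new) pairs in one pass."""
    mapping = dict(pairs)
    if not mapping:
        # Nothing to replace: return the text unchanged.
        return lambda text: text
    # Try longer keys first so a key that is a prefix of another
    # does not shadow the longer one.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))

    def replace_all(text):
        return pattern.sub(lambda m: mapping[m.group(0)], text)

    return replace_all


replace_all = make_replacer([("string1_abc", "string1_xyz"),
                             ("string2_abc", "string2_xyz")])
sample = replace_all("a string1_abc b string2_abc")
```

One difference from chained `replace` calls: text produced by one replacement is never re-scanned, so a pair cannot accidentally rewrite the output of another pair.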

If the files are large, look into using mmap on the file; everything else stays roughly the same.
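
A rough idea of what the mmap variant could look like. This is a sketch, not a drop-in solution: `replace_in_place` is a hypothetical name, and it assumes every replacement has the same length as the original (true for `_abc` -> `_xyz`), because an mmap slice can only be overwritten with data of the same size. The pairs must be byte strings.

```python
import mmap
import os


def replace_in_place(path, pairs):
    """Overwrite each old byte string with its same-length replacement.

    Returns the number of replacements made.
    """
    if os.path.getsize(path) == 0:
        return 0  # an empty file cannot be memory-mapped
    count = 0
    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)  # map the whole file
        try:
            for old, new in pairs:
                if len(old) != len(new):
                    # Assumption of this sketch: equal lengths only.
                    raise ValueError("replacement must have the same length")
                pos = mm.find(old)
                while pos != -1:
                    mm[pos:pos + len(old)] = new
                    count += 1
                    pos = mm.find(old, pos + len(new))
            mm.flush()
        finally:
            mm.close()
    return count
```

If the replacements differ in length, the file has to be rewritten anyway, so reading it into a buffer as shown earlier is the simpler route.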
