Searching for text in a large file and writing the results to a file

Time: 2019-05-17 19:48:32

Tags: python search text

I have one file with 2.4 million lines (256 MB) and a second file with 32 thousand lines (1.5 MB).

I need to go through file two line by line and, for each line, print the matching line from file one.

Pseudocode:

open file 1, read
open file 2, read
open results, write

for line2 in file 2:
    for line1 in file 1:
        if line2 in line1:
            write line1 to results
            stop inner loop

My code:

p = open("file1.txt", "r")
d = open("file2.txt", "r")
o = open("results.txt", "w")

for hash1 in p:
    hash1 = hash1.strip('\n')
    for data in d:
        hash2 = data.split(',')[1].strip('\n')
        if hash1 in hash2:
            o.write(data)

o.close()
d.close()
p.close()

I expect to get 32k results.

1 Answer:

Answer 0 (score: 0)

Your file2 is not very large, so loading it into memory is perfectly fine.

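  • Load file2.txt into a set, which speeds up lookups and removes duplicates;
  • Remove the empty line from the set;
  • Scan file1.txt line by line and write every match found to results.txt.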

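# Load file2.txt into a set: fast membership tests, duplicates dropped.
# Trailing newlines are stripped so a final line without one still matches.
with open("file2.txt", "r") as f:
    lines = {line.rstrip("\n") for line in f}

# Blank lines end up as the empty string; remove it from the set.
lines.discard("")

# Scan file1.txt once and write each line that also appears in file2.txt.
with open("results.txt", "w") as o, open("file1.txt", "r") as f:
    for line in f:
        if line.rstrip("\n") in lines:
            o.write(line)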
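
The code above matches whole lines. The question's code instead compares against the second comma-separated field (`data.split(',')[1]`), so a variant for that case may be useful. This is only a minimal sketch under assumptions: the helper name `filter_by_key`, its parameters, and the guess about which file holds the keys and which holds the comma-separated rows are all hypothetical, and it tests the field for exact equality rather than using the substring test (`in`) from the question.

```python
def filter_by_key(keys_path, rows_path, out_path, field=1):
    """Write rows from rows_path whose comma-separated `field` is a key in keys_path.

    Hypothetical helper: assumes one key per line in keys_path and
    comma-separated rows in rows_path. Returns the number of rows written.
    """
    # Load the keys into a set for constant-time lookups; skip blank lines.
    with open(keys_path, "r") as f:
        keys = {line.strip() for line in f if line.strip()}

    written = 0
    with open(rows_path, "r") as rows, open(out_path, "w") as out:
        for row in rows:
            parts = row.rstrip("\n").split(",")
            # Ignore rows that do not have the expected field.
            if len(parts) > field and parts[field].strip() in keys:
                out.write(row)
                written += 1
    return written
```

Calling `filter_by_key("file1.txt", "file2.txt", "results.txt")` would mirror the file roles in the question's code. Each file is read exactly once, so the nested loop (and the problem of the inner file object being exhausted after the first pass) goes away.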

If file2 were larger, we could split it into chunks and repeat the same operation for each chunk, but merging the results would then be harder.
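
One possible way to keep that merge manageable is to record the positions of the matching lines instead of writing them immediately, then write them in a final pass. The following is a rough sketch, not a tuned implementation: the helper name `match_in_chunks` and the `chunk_size` default are hypothetical, it assumes whole-line matching as in the code above, and it rescans the large file once per chunk, which trades time for memory.

```python
from itertools import islice


def match_in_chunks(small_path, big_path, out_path, chunk_size=100_000):
    """Whole-line matching with the lookup file loaded chunk_size lines at a time.

    Hypothetical helper. Returns the number of lines written.
    """
    matched = set()  # indices of lines in big_path that matched some chunk

    with open(small_path, "r") as small:
        while True:
            # Read the next chunk of lookup lines (newlines stripped).
            chunk = {line.rstrip("\n") for line in islice(small, chunk_size)}
            if not chunk:
                break  # lookup file exhausted
            chunk.discard("")  # ignore blank lines
            # One full scan of the big file per chunk.
            with open(big_path, "r") as big:
                for i, line in enumerate(big):
                    if line.rstrip("\n") in chunk:
                        matched.add(i)

    # Final pass: write matched lines once, in their original order.
    with open(big_path, "r") as big, open(out_path, "w") as out:
        for i, line in enumerate(big):
            if i in matched:
                out.write(line)
    return len(matched)
```

With the file roles from the answer, the call would be `match_in_chunks("file2.txt", "file1.txt", "results.txt")`. Because only line indices are kept between passes, a line that matches entries in several chunks is still written once, and the output keeps the order of file1.txt.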