Trying to merge .gz files in 'chunks' - MemoryError

Date: 2018-02-15 01:12:44

Tags: python csv merge chunks

I'm trying to write a script that joins two compressed files by matching on the first column. I want to do this in chunks, because the original code I was using was written for CSV files and produces a MemoryError when run against these files.

The code I was using, which hits the MemoryError (but works for smaller files):

import csv

f1 = open('file1.csv', 'r')
f2 = open('file2.csv', 'r')
f3 = open('output.csv', 'w')

c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)

# Read the whole of file2 into memory so it can be scanned repeatedly.
file2 = list(c2)

for file1_row in c1:
    found = False
    results_row = file1_row  # Moved out from nested loop
    # Find the first file2 row whose first column matches this file1 row.
    for file2_row in file2:
        x = file2_row[1:]
        if file1_row[0] == file2_row[0]:
            results_row.append(x)
            found = True
            break
    if not found:
        results_row.append('Not found')
    c3.writerow(results_row)

f1.close()
f2.close()
f3.close()

I've tried to adapt this to read the files in chunks, but I think I have the structure wrong:

f1 = open('final1.gz', 'r')
f2 = open('final2.gz', 'r')
f3 = open('results.gz.DONE', 'w')

c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)

file2 = list(c2)

fileList = ['final_balance.gz', 'final_service.gz']
for fileName in fileList:
    with open(fileName, 'rb') as sourceFile:
        chunk = True
        while chunk:
            chunk = sourceFile.read(bufferSize)
            #file2 = list(c2)  # MemoryError occurs on this line.
        for file1_row in c1:
            row = 1
            found = False
            results_row = file1_row  #Moved out from nested loop
        for file2_row in file2:
            x = file2_row[1:]
            if file1_row[0] == file2_row[0]:
                results_row.append(x)
                found = True
                break
        row += 1
        if not found:
            results_row.append('Not found')
        c3.writerow(results_row)

At this point I get the error:

File "function.py", line 20, file2 = list(c2) MemoryError.

I can't use pandas because I don't have access to it.
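
What I'm aiming for is roughly the following, a rough untested sketch of the chunked idea rather than working code. It assumes gzip.open behaves as above, that CHUNK_SIZE rows of the first file fit in memory, and that the first-column keys are unique within a chunk; the function and variable names are just placeholders of my own.

import csv
import gzip
from itertools import islice

CHUNK_SIZE = 100000  # how many file1 rows to hold in memory at a time

def join_in_chunks(file1_path, file2_path, out_path):
    with gzip.open(file1_path, 'rt', newline='') as f1, \
         gzip.open(out_path, 'wt', newline='') as f3:
        c1 = csv.reader(f1)
        c3 = csv.writer(f3)
        while True:
            # Take the next chunk of rows from file1.
            chunk = list(islice(c1, CHUNK_SIZE))
            if not chunk:
                break
            # Index the chunk by its first column (assumes unique keys per chunk).
            by_key = {row[0]: row for row in chunk}
            matched = set()
            # Stream file2 once per chunk; keep only the first match per key,
            # mirroring the break in the original nested loop.
            with gzip.open(file2_path, 'rt', newline='') as f2:
                for file2_row in csv.reader(f2):
                    row1 = by_key.get(file2_row[0])
                    if row1 is not None and file2_row[0] not in matched:
                        row1.append(file2_row[1:])
                        matched.add(file2_row[0])
            for row in chunk:
                if row[0] not in matched:
                    row.append('Not found')
                c3.writerow(row)

join_in_chunks('final1.gz', 'final2.gz', 'results.gz.DONE')

The obvious downside is that the second file gets decompressed and re-read once per chunk of the first file, which is why I'd like to know whether there is a better way to structure the chunking.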

0 Answers:

No answers