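I am trying to write a script that joins two gzip-compressed files by matching on the first column. I want to do this in chunks, because the code I originally wrote was for CSV files and it raises a memory error when used with these files.
The code I was using, which raises a memory error (but works for smaller files):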
import csv

f1 = open('file1.csv', 'r')
f2 = open('file2.csv', 'r')
f3 = open('output.csv', 'w')

c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)

file2 = list(c2)  # loads every row of file2 into memory at once

for file1_row in c1:
    row = 1
    found = False
    results_row = file1_row  # Moved out from nested loop
    for file2_row in file2:
        x = file2_row[1:]
        if file1_row[0] == file2_row[0]:
            results_row.append(x)
            found = True
            break
        row += 1
    if not found:
        results_row.append('Not found')
    c3.writerow(results_row)

f1.close()
f2.close()
f3.close()
I tried to rework this to read the files in chunks, but I think I have structured it incorrectly:
f1 = open('final1.gz', 'r')
f2 = open('final2.gz', 'r')
f3 = open('results.gz.DONE', 'w')

c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)

file2 = list(c2)

fileList = ['final_balance.gz', 'final_service.gz']
for fileName in fileList:
    with open(fileName, 'rb') as sourceFile:
        chunk = True
        while chunk:
            chunk = sourceFile.read(bufferSize)
            #file2 = list(c2) # MemoryError occurs on this line.
            for file1_row in c1:
                row = 1
                found = False
                results_row = file1_row  # Moved out from nested loop
                for file2_row in file2:
                    x = file2_row[1:]
                    if file1_row[0] == file2_row[0]:
                        results_row.append(x)
                        found = True
                        break
                    row += 1
                if not found:
                    results_row.append('Not found')
                c3.writerow(results_row)
At this point I get the error:
File "function.py", line 20, file2 = list(c2) MemoryError.
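For context on the error: `list(c2)` builds a list of every row of the second file before the loop starts, and plain `open()` on a `.gz` file returns the compressed bytes rather than CSV text. Below is a minimal sketch of reading such a file one row at a time. It assumes Python 3 and that the files are gzip-compressed CSV text in UTF-8. The helper name `iter_gz_rows` is made up for illustration.

```python
import csv
import gzip


def iter_gz_rows(path, encoding="utf-8"):
    """Yield CSV rows from a gzip-compressed file, one row at a time."""
    # Mode 'rt' decompresses on the fly and decodes to text.
    # newline='' is what the csv module expects from its file object.
    with gzip.open(path, "rt", encoding=encoding, newline="") as handle:
        for row in csv.reader(handle):
            yield row
```

Looping with `for row in iter_gz_rows('final1.gz'):` keeps only the current row in memory, so no manual `read(bufferSize)` chunking is needed for the file being scanned.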
I can't use pandas, because I don't have access to it.
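One possible way to do the join without pandas is to keep the lookup table on disk rather than in a Python list. The sketch below uses only the standard library (`gzip`, `csv`, `json`, `sqlite3`) and is an illustration under stated assumptions, not a tested solution for these particular files. It assumes Python 3 and UTF-8 gzip-compressed CSV input, and it writes gzip-compressed output. The function name `join_gz_csv` and the `index_path` parameter are hypothetical. Unlike the code above, it appends the matched fields as separate columns rather than as one nested list.

```python
import csv
import gzip
import json
import sqlite3


def join_gz_csv(left_path, right_path, out_path, index_path):
    """Left-join two gzipped CSVs on column 0 via an on-disk SQLite index."""
    db = sqlite3.connect(index_path)  # index lives on disk, not in RAM
    try:
        db.execute("DROP TABLE IF EXISTS right_rows")
        db.execute("CREATE TABLE right_rows (key TEXT PRIMARY KEY, rest TEXT)")
        # Pass 1: stream the right-hand file into the index. INSERT OR IGNORE
        # keeps the first row seen per key, like the `break` in the loop above.
        with gzip.open(right_path, "rt", encoding="utf-8", newline="") as right:
            db.executemany(
                "INSERT OR IGNORE INTO right_rows VALUES (?, ?)",
                ((row[0], json.dumps(row[1:])) for row in csv.reader(right) if row),
            )
        db.commit()
        # Pass 2: stream the left-hand file and look up each key.
        with gzip.open(left_path, "rt", encoding="utf-8", newline="") as left, \
                gzip.open(out_path, "wt", encoding="utf-8", newline="") as out:
            writer = csv.writer(out)
            for row in csv.reader(left):
                if not row:
                    continue  # skip blank lines
                hit = db.execute(
                    "SELECT rest FROM right_rows WHERE key = ?", (row[0],)
                ).fetchone()
                extra = json.loads(hit[0]) if hit else ["Not found"]
                writer.writerow(row + extra)
    finally:
        db.close()
```

A call might look like `join_gz_csv('final1.gz', 'final2.gz', 'results.gz', 'index.sqlite')`. The trade-off is extra disk space and one indexed lookup per row, in exchange for memory use that does not grow with the size of the second file.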