MemoryError when saving a very large workbook with openpyxl

Asked: 2018-08-10 11:50:06

Tags: excel python-3.x csv

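I'm new to Python, although I wrote C 25 years ago. This is my first Python program.

I have been trying to convert a very large csv file (about 500,000 rows, 80 columns) to an xlsx file using openpyxl. I have managed to write the rows to the workbook, but when I save it, the script crashes with a MemoryError.

I am using Python 3.6 (32-bit).

Does anyone have any hints? Any comments are appreciated in advance. Thanks!

The code and the error are shown below: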

#!python3

import os, sys, csv, openpyxl, datetime, lxml
os.chdir('xxxxxxxxxxxxxxx')

# field sizes are large in input csv so need to increase the size of the field size limit
csv.field_size_limit(sys.maxsize)

# reading in the temporary working file
print('Reading cleaned file...')
with open('input_data.csv') as input_data:
    dataReader = csv.reader(input_data,delimiter=';')
    inputData = list(dataReader)

now=datetime.datetime.now()
dateStamp=now.strftime("%y%m%d")

newDatadump=dateStamp + ' output_data.xlsx'
# Deletes any old temporary working file.  
if os.path.exists (newDatadump):
    os.remove(newDatadump)

#writes an excel file
wb=openpyxl.Workbook(write_only=True)
sheet=wb.create_sheet()
print('Writing '+newDatadump+'...')

#debugging
numberOfRows=int(len(inputData))
print('number of rows',numberOfRows)

#create output file
for line in inputData:
    sheet.append(line)

print('Phew...')
wb.save(newDatadump)
print('through...')

Output:

RESTART: xxxxxxxxxxx 
Reading cleaned file...
Writing 180810 output_data.xlsx...
number of rows 551628
Phew...

Then I get a MemoryError. Here is the stack trace:

Traceback (most recent call last):
  File "C:/Users/Simon/Network Drive/DATA/992 test python/cleaning a file example for internet.py", line 38, in <module>
    print('through...')
  File "C:\Users\Simon\AppData\Local\Programs\Python\Python36-32\lib\site-packages\openpyxl\workbook\workbook.py", line 365, in save
    save_dump(self, filename)
  File "C:\Users\Simon\AppData\Local\Programs\Python\Python36-32\lib\site-packages\openpyxl\writer\excel.py", line 313, in save_dump
    writer.save(filename)
  File "C:\Users\Simon\AppData\Local\Programs\Python\Python36-32\lib\site-packages\openpyxl\writer\excel.py", line 266, in save
    self.write_data()
  File "C:\Users\Simon\AppData\Local\Programs\Python\Python36-32\lib\site-packages\openpyxl\writer\excel.py", line 83, in write_data
    self._write_worksheets()
  File "C:\Users\Simon\AppData\Local\Programs\Python\Python36-32\lib\site-packages\openpyxl\writer\excel.py", line 203, in _write_worksheets
    xml = ws._write()
  File "C:\Users\Simon\AppData\Local\Programs\Python\Python36-32\lib\site-packages\openpyxl\worksheet\write_only.py", line 261, in _write
    out = src.read()
MemoryError
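
The last frame, `out = src.read()`, reads a whole file into memory in one call. A 32-bit process can address only a few GB at most, regardless of installed RAM, so it may be worth confirming which build of the interpreter is actually running. A minimal standard-library sketch (the helper name `interpreter_bits` is made up for illustration):

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running interpreter in bits."""
    # struct.calcsize("P") is the size of a C pointer (void *) in bytes.
    return struct.calcsize("P") * 8


bits = interpreter_bits()
print("Python", sys.version.split()[0], "is a", bits, "bit build")
# sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
```

If this reports 32, running the same script under a 64-bit interpreter is one thing to try. Whether that is enough for a file of this size is not something this sketch can show.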

1 Answer:

Answer 0 (score: 0)

Try installing lxml, as the docs recommend:

pip install lxml

That should solve the problem, as it did in my case.
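
Separately from lxml, the script in the question loads every CSV row into a list before writing anything, so the whole input sits in memory alongside the workbook. A hedged sketch of a streaming variant follows. The function names `iter_csv_rows` and `csv_to_xlsx` are hypothetical, and the `csv.field_size_limit` call from the question is left out for brevity. This only reduces memory used on the reading side. It does not change what openpyxl does inside `save()`, so on its own it may not remove the error.

```python
import csv


def iter_csv_rows(csv_path, delimiter=";"):
    """Yield rows from the CSV one at a time instead of building a list."""
    # newline="" is what the csv module documentation recommends for open().
    with open(csv_path, newline="") as handle:
        yield from csv.reader(handle, delimiter=delimiter)


def csv_to_xlsx(csv_path, xlsx_path, delimiter=";"):
    """Stream a CSV file into a write-only openpyxl workbook."""
    # Imported here so iter_csv_rows() stays usable without openpyxl.
    from openpyxl import Workbook

    workbook = Workbook(write_only=True)
    sheet = workbook.create_sheet()
    row_count = 0
    for row in iter_csv_rows(csv_path, delimiter):
        # Each row is handed to the sheet and then dropped by this loop.
        sheet.append(row)
        row_count += 1
    workbook.save(xlsx_path)
    return row_count
```

With the file names from the question, the call would look like `csv_to_xlsx('input_data.csv', '180810 output_data.xlsx')`, which returns the number of rows written.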