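I am new to Python, although I wrote C 25 years ago. This is my first Python program.
I have been trying to convert a very large csv file (500,000 rows, 80 columns) to an xlsx file using openpyxl. I have managed to write the Excel file, but when I save it, it crashes with a memory error.
I am using Python 3.6 (32-bit).
Does anyone have any tips? Any comments are appreciated in advance. Thanks!
The code and error are shown below: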
#!python3
import os, sys, csv, openpyxl, datetime, lxml

os.chdir('xxxxxxxxxxxxxxx')

# field sizes are large in input csv so need to increase the size of the field size limit
csv.field_size_limit(sys.maxsize)

# reading in the temporary working file
print('Reading cleaned file...')
with open('input_data.csv') as input_data:
    dataReader = csv.reader(input_data, delimiter=';')
    inputData = list(dataReader)

now = datetime.datetime.now()
dateStamp = now.strftime("%y%m%d")
newDatadump = dateStamp + ' output_data.xlsx'

# Deletes any old temporary working file.
if os.path.exists(newDatadump):
    os.remove(newDatadump)

# writes an excel file
wb = openpyxl.Workbook(write_only=True)
sheet = wb.create_sheet()
print('Writing ' + newDatadump + '...')

# debugging
numberOfRows = int(len(inputData))
print('number of rows', numberOfRows)

# create output file
for line in inputData:
    sheet.append(line)
print('Phew...')
wb.save(newDatadump)
print('through...')
Output:
RESTART: xxxxxxxxxxx
Reading cleaned file...
Writing 180810 output_data.xlsx...
number of rows 551628
Phew...
Then I get a memory error.
Stack trace:
Traceback (most recent call last):
  File "C:/Users/Simon/Network Drive/DATA/992 test python/cleaning a file example for internet.py", line 38, in <module>
    print('through...')
  File "C:\Users\Simon\AppData\Local\Programs\Python\Python36-32\lib\site-packages\openpyxl\workbook\workbook.py", line 365, in save
    save_dump(self, filename)
  File "C:\Users\Simon\AppData\Local\Programs\Python\Python36-32\lib\site-packages\openpyxl\writer\excel.py", line 313, in save_dump
    writer.save(filename)
  File "C:\Users\Simon\AppData\Local\Programs\Python\Python36-32\lib\site-packages\openpyxl\writer\excel.py", line 266, in save
    self.write_data()
  File "C:\Users\Simon\AppData\Local\Programs\Python\Python36-32\lib\site-packages\openpyxl\writer\excel.py", line 83, in write_data
    self._write_worksheets()
  File "C:\Users\Simon\AppData\Local\Programs\Python\Python36-32\lib\site-packages\openpyxl\worksheet\write_only.py", line 261, in _write
    out = src.read()
MemoryError
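
For reference, here is a minimal sketch of a variation that might lower peak memory. It assumes that holding all 551628 rows in the inputData list is a large part of the footprint. Instead of building the list, it passes rows from the csv reader straight into the write-only sheet. The function names iter_csv_rows and csv_to_xlsx are hypothetical, made up for this illustration. This does not change the src.read() call inside openpyxl shown in the traceback, so a 32-bit interpreter could still run out of address space.

```python
import csv


def iter_csv_rows(csv_path, delimiter=';'):
    """Yield one row (a list of strings) at a time instead of building a list."""
    # 2**31 - 1 fits in a C long on every platform; the original script
    # used sys.maxsize, which is the same value on 32-bit Python.
    csv.field_size_limit(2**31 - 1)
    with open(csv_path, newline='') as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            yield row


def csv_to_xlsx(csv_path, xlsx_path, delimiter=';'):
    """Copy a CSV file into a write-only workbook; return the row count."""
    # Imported here so iter_csv_rows can be used without openpyxl installed.
    import openpyxl

    wb = openpyxl.Workbook(write_only=True)
    sheet = wb.create_sheet()
    count = 0
    for row in iter_csv_rows(csv_path, delimiter):
        sheet.append(row)  # only the current row is held by this script
        count += 1
    wb.save(xlsx_path)
    return count
```

It could be called as csv_to_xlsx('input_data.csv', newDatadump). The row count is returned rather than computed with len(), since no list exists any more.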