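I'm using the pandas read_excel function to import data from a spreadsheet. It works fine when run under the Python interpreter, but when I build an exe with PyInstaller, it raises an IndexError.
Below is a simplified script, pandas_test.py, that demonstrates the problem: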
import pandas as pd
filepath = 'C:/Users/User/Documents/Development/Python/PHL/Test Data/Study template mock-up.xlsx'
df = pd.read_excel(filepath, sheet_name='Data Entry', index_col=9)
print(df.head())
This runs fine under Python 3.6 with pandas 0.23.4 and xlrd 1.1.0.
When I build pandas_test.py with PyInstaller, it successfully produces pandas_test.exe, but running the exe gives this error:
Traceback (most recent call last):
  File "pandas_test.py", line 4, in <module>
  File "site-packages\pandas\io\excel.py", line 212, in read_excel
  File "site-packages\pandas\io\excel.py", line 513, in _parse_excel
  File "site-packages\pandas\io\parsers.py", line 1912, in TextParser
  File "site-packages\pandas\io\parsers.py", line 764, in __init__
  File "site-packages\pandas\io\parsers.py", line 995, in _make_engine
  File "site-packages\pandas\io\parsers.py", line 2021, in __init__
  File "site-packages\pandas\io\parsers.py", line 2772, in _get_index_name
  File "site-packages\pandas\io\parsers.py", line 3084, in _clean_index_names
IndexError: list index out of range
[17264] Failed to execute script pandas_test
I've read through PyInstaller's output, but nothing in it looks relevant:
887 INFO: PyInstaller: 3.3.1
887 INFO: Python: 3.6.2
889 INFO: Platform: Windows-10-10.0.17134-SP0
892 INFO: wrote C:\Users\User\Documents\Development\Python\PandaTest\pandas_test.spec
I could understand it if some module couldn't be found, but why does an IndexError occur only in the deployed code?
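One way to narrow this down might be to check whether the frozen exe actually loads the same pandas and xlrd versions as the interpreter. The following is only a diagnostic sketch, not a fix: describe_environment is a hypothetical helper name, and it relies on the sys.frozen attribute that PyInstaller sets on the bundled interpreter. Running it both from the interpreter and from the built exe would show any difference.

```python
import importlib
import sys


def describe_environment(module_names):
    """Report whether we run from a frozen bundle and which module versions load.

    Hypothetical helper for illustration; it does not change how pandas behaves.
    """
    report = {
        # PyInstaller sets sys.frozen on the bundled interpreter.
        "frozen": bool(getattr(sys, "frozen", False)),
        "python": sys.version.split()[0],
        "modules": {},
    }
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report["modules"][name] = "not importable"
            continue
        # Most packages expose __version__; some older ones only __VERSION__.
        version = (getattr(module, "__version__", None)
                   or getattr(module, "__VERSION__", "unknown"))
        report["modules"][name] = str(version)
    return report


if __name__ == "__main__":
    for key, value in describe_environment(["pandas", "xlrd"]).items():
        print(key, value)
```

If the two runs report different versions, or a module shows up as not importable only in the exe, that would point at the bundling step rather than at the spreadsheet.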
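Answer 0 (score: 1)
I ran into an error similar to yours, but in my case I got no warning at all until I ran the exe and hit a "fatal error" with the script below (ExcelFile.parse is equivalent to read_excel(ExcelFile, ...)). See whether PyInstaller works with this: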
import xlrd
import pandas as pd
from os.path import join, isfile
from os import environ

# Workbook in the current user's Downloads folder
report = join(environ['USERPROFILE'], 'Downloads', 'Report_15_13__12_12_2018.xlsx')
if isfile(report):
    rd = pd.ExcelFile(report)  # the class is ExcelFile (capital F)
    df1 = rd.parse()           # parses the first sheet by default
    df1.to_excel(join(environ['USERPROFILE'], 'Downloads', 'test.xls'))
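I couldn't find an answer that fixes this, but I managed to work around the problem by copying code from pandas (https://github.com/pandas-dev/pandas/blob/v0.24.1/pandas/io/excel.py#L658-L718). In my case I have only one sheet, but I believe multiple sheets could be supported easily.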
from xlrd import (xldate, XL_CELL_DATE,
                  XL_CELL_ERROR, XL_CELL_BOOLEAN,
                  XL_CELL_NUMBER, open_workbook)
from datetime import date, datetime, time, timedelta
from pandas import DataFrame
from numpy import array, nan
from os import environ
from os.path import join

book = open_workbook(join(environ['USERPROFILE'], 'Downloads', 'excel_to_read.xls'))
epoch1904 = book.datemode
sheet = book.sheet_by_index(0)


def _parse_cell(cell_contents, cell_typ):
    """converts the contents of the cell into a pandas
       appropriate object"""
    if cell_typ == XL_CELL_DATE:
        # Use the newer xlrd datetime handling.
        try:
            cell_contents = xldate.xldate_as_datetime(
                cell_contents, epoch1904)
        except OverflowError:
            return cell_contents
        # Excel doesn't distinguish between dates and time,
        # so we treat dates on the epoch as times only.
        # Also, Excel supports 1900 and 1904 epochs.
        year = (cell_contents.timetuple())[0:3]
        if ((not epoch1904 and year == (1899, 12, 31)) or
                (epoch1904 and year == (1904, 1, 1))):
            cell_contents = time(cell_contents.hour,
                                 cell_contents.minute,
                                 cell_contents.second,
                                 cell_contents.microsecond)
    elif cell_typ == XL_CELL_ERROR:
        cell_contents = nan
    elif cell_typ == XL_CELL_BOOLEAN:
        cell_contents = bool(cell_contents)
    elif cell_typ == XL_CELL_NUMBER:
        # GH5394 - Excel 'numbers' are always floats
        # it's a minimal perf hit and less surprising
        val = int(cell_contents)
        if val == cell_contents:
            cell_contents = val
    return cell_contents


data = []
for i in range(sheet.nrows):
    row = [_parse_cell(value, typ)
           for value, typ in zip(sheet.row_values(i),
                                 sheet.row_types(i))]
    data.append(row)

NoOfColumns = len(sheet.row_values(i))
NoOfRows = sheet.nrows - 1
DataFrame(array(data[1:]).reshape(NoOfRows, NoOfColumns), columns=data[0]).to_excel(
    join(environ['USERPROFILE'], 'Desktop', 'test.xlsx'), index=False, engine='xlsxwriter')
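
The workaround above reads only the first sheet. As a rough sketch of how multiple sheets might be handled, the same row loop could be run over every sheet in the workbook. The helper name rows_by_sheet and its parse_cell parameter are made up for illustration; the function only assumes the xlrd-style workbook methods already used above (sheets(), nrows, row_values(), row_types()) plus the sheet's name attribute.

```python
def rows_by_sheet(book, parse_cell=None):
    """Return {sheet name: list of parsed rows} for every sheet in the workbook.

    `book` is expected to behave like an xlrd Book; `parse_cell` takes
    (value, cell_type) and returns the converted value.
    """
    if parse_cell is None:
        # Default: leave cell values unchanged.
        def parse_cell(value, cell_type):
            return value

    result = {}
    for sheet in book.sheets():
        rows = []
        for i in range(sheet.nrows):
            rows.append([parse_cell(value, typ)
                         for value, typ in zip(sheet.row_values(i),
                                               sheet.row_types(i))])
        result[sheet.name] = rows
    return result
```

With the names from the workaround, this would be called as rows_by_sheet(book, _parse_cell), and each non-empty entry could then be turned into a frame with DataFrame(rows[1:], columns=rows[0]), treating the first row as the header as the original code does.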