I have a simple script that reads an Excel .xlsx file into a DataFrame and then writes it out as a pickle file using to_pickle. I have been using the same code for several months to read and write new Excel files as they arrive. This time, however, it fails with TypeError: cannot serialize '_io.BufferedReader' object. Here is the code:
import pandas as pd
from openpyxl import load_workbook

# Path to .xlsx (MonthlyFolder is defined elsewhere)
MasterItem = MonthlyFolder + "MasterItem__Nov2019.xlsx"

# Function to read the excel file
def ReadExcel(filename, sheetname=None, header=0):
    wb = load_workbook(filename, read_only=True)
    if sheetname is None:  # If sheetname is not provided then grab the first sheet
        print("\t Reading " + wb.sheetnames[0])
        ws = wb[wb.sheetnames[0]]
    else:
        print("\t Reading " + sheetname)
        ws = wb[sheetname]
    data = ws.values  # Generator yielding one tuple of cell values per row
    if header is None:
        columns = None
    elif header > 0:
        # Skip non header rows
        for i in range(0, header):
            next(data)
        # Save header row
        columns = next(data)[0:]
    else:
        columns = next(data)[0:]
    # Create a DataFrame based on the subsequent lines of data
    df_Out = pd.DataFrame(data, columns=columns)
    return df_Out

# Reading .xlsx and writing as pickle
RawMasterItem = ReadExcel(MasterItem)
pd.to_pickle(RawMasterItem, MonthlyFolder + "RawMasterItem.pkl")  # This line raises the TypeError
Answer 0 (score: 0)
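So, after a full day of debugging, it turned out that for some blank cells in my Excel file, openpyxl was returning a <ReadOnlyCell 'Sheet1'.D2> object. That cell object is what caused the problem when I tried to write the DataFrame as a pickle. Even though the column's data type was already 'str', explicitly casting it to 'str' again resolved the issue.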
RawMasterItem['Column'] = RawMasterItem['Column'].astype('str')
Apparently, instead of reading the cell correctly and returning null/blank, openpyxl returns some odd object that then cannot be serialized.
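If it is not obvious which cells are responsible, one way to narrow it down is to try pickling each value on its own and record the positions that fail. The following is only a rough sketch using the standard library: `find_unpicklable` and `FakeCell` are hypothetical names, not part of openpyxl or pandas, and `FakeCell` merely imitates an object that holds an open file handle, which is assumed here to be why the error mentions `_io.BufferedReader`.

```python
import os
import pickle
import tempfile


class FakeCell:
    """Hypothetical stand-in for a cell object that keeps a reference to an open file."""

    def __init__(self, handle):
        self.handle = handle

    def __str__(self):
        # A blank cell rendered as text is just an empty string.
        return ""


def find_unpicklable(rows):
    """Return (row_index, column_index, type_name) for every value pickle rejects."""
    problems = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            try:
                pickle.dumps(value)
            except Exception:
                problems.append((r, c, type(value).__name__))
    return problems


# Build a small table in which one "blank" cell is really an object holding a file handle.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
with open(tmp.name, "rb") as handle:  # handle is an _io.BufferedReader
    rows = [("A1", 1.5), ("A2", FakeCell(handle))]
    problems = find_unpicklable(rows)
    # Converting every value to str drops the reference to the file handle.
    cleaned = [tuple(str(v) for v in row) for row in rows]
    restored = pickle.loads(pickle.dumps(cleaned))
os.remove(tmp.name)

print(problems)
```

With real data, the same check could be run on the rows from `ws.values` (materialized into a list) before building the DataFrame. Note that casting everything to `str` also turns numbers into text, so it is probably better to convert only the columns that the check flags.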