How to resolve "TypeError: cannot serialize '_io.BufferedReader' object" when writing a pandas DataFrame with to_pickle?

Time: 2019-12-08 23:56:56

Tags: python python-3.x pandas pickle

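I have a simple script that reads an Excel .xlsx file into a DataFrame and then writes it out as a pickle file with to_pickle. I have been using the same code for months to read and write each new Excel file as it arrives. This time, however, it fails with TypeError: cannot serialize '_io.BufferedReader' object. Here is the code; the last line, the pd.to_pickle call, is the one that raises the error: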

import pandas as pd

# Path to .xlsx (MonthlyFolder is a folder path string defined elsewhere)
MasterItem = MonthlyFolder + "MasterItem__Nov2019.xlsx"

# Function to read the excel file
def ReadExcel(filename, sheetname=None, header=0):
    from openpyxl import load_workbook

    wb = load_workbook(filename, read_only=True)

    if sheetname is None:  # If sheetname is not provided then grab the first sheet
        print("\t Reading " + wb.sheetnames[0])
        ws = wb[wb.sheetnames[0]]
    else:
        print("\t Reading " + sheetname)
        ws = wb[sheetname]

    data = ws.values

    if header is None:
        columns = None
    elif header > 0:
        # Skip non header rows
        for i in range(0, header):
            next(data)
        # Save header row
        columns = next(data)[0:]
    else:
        columns = next(data)[0:]

    # Create a DataFrame based on the subsequent lines of data
    df_Out = pd.DataFrame(data, columns=columns)

    return df_Out

# Reading .xlsx and writing as pickle
RawMasterItem = ReadExcel(MasterItem)
pd.to_pickle(RawMasterItem, MonthlyFolder+"RawMasterItem.pkl") # This fails to run

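1 answer:

Answer 0 (score: 0)

So, after a full day of debugging, it turned out that for some blank cells in my Excel file, openpyxl returned a <ReadOnlyCell 'Sheet1'.D2> object. That cell object then caused the failure when I tried to write the DataFrame as a pickle. The column's dtype was already reported as 'str', but explicitly casting the column to 'str' again resolved the problem.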

RawMasterItem['Column'] = RawMasterItem['Column'].astype('str')
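
If it is not known in advance which column holds the offending cells, the same cast could be applied to every object-dtype column. The following is only a minimal sketch of that idea: the helper name stringify_object_columns and the demo frame are hypothetical, and an in-memory io.BufferedReader stands in for the cell object that openpyxl returned.

```python
import io
import pickle

import pandas as pd


def stringify_object_columns(df):
    """Return a copy of df with every object-dtype column cast to str."""
    out = df.copy()
    for col in [c for c in out.columns if out[c].dtype == object]:
        # Each value is replaced by its string form, so any odd cell
        # object becomes plain text that pickle can handle.
        out[col] = out[col].astype(str)
    return out


# Hypothetical demo data: the reader object plays the role of the bad cell.
demo = pd.DataFrame({
    "Item": pd.Series(["A1", io.BufferedReader(io.BytesIO(b""))], dtype=object),
    "Qty": [10, 20],
})

clean = stringify_object_columns(demo)
restored = pickle.loads(pickle.dumps(clean))  # round trip now succeeds
print(restored.dtypes)
```

Note that this cast turns every value into text, including None, which becomes the string 'None', so it suits columns that are meant to hold strings anyway.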

Apparently, instead of reading the blank cells correctly and returning null/blank, openpyxl returned an odd object that could not be serialized.
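
One way to track down such objects without a full day of debugging is to try pickling each cell value on its own and record where it fails. This is a sketch under assumptions, not part of the original answer: find_unpicklable_cells is a hypothetical helper, and the sample rows use an in-memory io.BufferedReader in place of the real cell object. It accepts any iterable of row tuples, which is the shape that ws.values yields in the question's code.

```python
import io
import pickle


def find_unpicklable_cells(rows):
    """Return (row_index, column_index, type_name) for each value pickle rejects."""
    problems = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            try:
                pickle.dumps(value)
            except (TypeError, AttributeError, pickle.PicklingError):
                problems.append((r, c, type(value).__name__))
    return problems


# Stand-in data: an open binary reader plays the role of the odd cell object.
sample_rows = [
    ("A1", 10, None),
    ("A2", io.BufferedReader(io.BytesIO(b"")), 3.5),
]

bad_cells = find_unpicklable_cells(sample_rows)
print(bad_cells)
```

For an existing DataFrame, the same function could be fed df.itertuples(index=False). Pickling every cell separately is slow on large sheets, so this is meant as a one-off diagnostic rather than a step in the monthly run.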