Question

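I have simple code that reads an Excel .xlsx file into a DataFrame and then writes it out as a pickle file with to_pickle. I have been using the same code for months to read and write new Excel files as they arrive. This time, however, it fails with TypeError: cannot serialize '_io.BufferedReader' object. Here is the code: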

import pandas as pd
from openpyxl import load_workbook

# Path to .xlsx
MonthlyFolder = "../Data/2019Nov/"  # folder inferred from the path printed in the output
MasterItem = MonthlyFolder + "MasterItem__Nov2019.xlsx"

# Function to read the Excel file into a DataFrame
def ReadExcel(filename, sheetname=None, header=0):
    wb = load_workbook(filename, read_only=True)

    if sheetname is None:  # If sheetname is not provided, grab the first sheet
        sheetname = wb.sheetnames[0]
    print("\t Reading " + sheetname)
    ws = wb[sheetname]

    data = ws.values  # generator yielding one tuple of cell values per row

    if header is None:
        columns = None
    else:
        # Skip the rows above the header row, if any
        for _ in range(header):
            next(data)
        # The next row holds the column names
        columns = next(data)

    # Create a DataFrame from the remaining rows
    df_Out = pd.DataFrame(data, columns=columns)

    return df_Out

# Reading .xlsx and writing as pickle
RawMasterItem = ReadExcel(MasterItem)
pd.to_pickle(RawMasterItem, MonthlyFolder + "RawMasterItem.pkl")  # This line raises the TypeError

Below is the output and the error I get:

    ../Data/2019Nov/MasterItem__Nov2019.xlsx
         Reading Sheet1
    Traceback (most recent call last):
      File "C:\Users\Eulhaq\AppData\Local\conda\conda\envs\DataScience\lib\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-10-07041bb51f98>", line 3, in <module>
        pd.to_pickle(RawMasterItem, MonthlyFolder+"RawMasterItem.pkl")
      File "C:\Users\Eulhaq\AppData\Local\conda\conda\envs\DataScience\lib\site-packages\pandas\io\pickle.py", line 76, in to_pickle
        f.write(pickle.dumps(obj, protocol=protocol))
    TypeError: cannot serialize '_io.BufferedReader' object
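
For anyone trying to narrow down an error like this, one option is to test each value on its own and record where pickling fails. The following is only a minimal sketch, not part of my original script: `find_unpicklable` is a hypothetical helper name, and it assumes the data is available as an iterable of row tuples.

```python
import pickle


def find_unpicklable(rows):
    """Return (row_index, col_index, type_name) for each value pickle rejects.

    `rows` is assumed to be an iterable of tuples, one tuple per row.
    """
    problems = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            try:
                pickle.dumps(value)
            except Exception:
                # Remember the position and the type of the offending value
                problems.append((r, c, type(value).__name__))
    return problems
```

With a DataFrame, something like `find_unpicklable(RawMasterItem.itertuples(index=False, name=None))` should list the positions of the offending values, assuming the frame is small enough to scan value by value.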

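After debugging, I realized that openpyxl is reading some blank cells and returning them as <ReadOnlyCell 'Sheet1'.D2>. I don't know why this happens. I have checked the Excel file and there are no hidden characters at those locations. Any idea why openpyxl fails to read some of these cells as blank, the way it reads other blank cells?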
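
As a possible stopgap (it does not explain why the cell objects appear in the first place), the rows could be normalized before the DataFrame is built. This is a sketch under assumptions: `unwrap_cell` and `clean_rows` are hypothetical helper names, and the code assumes that any stray object exposes its content through a `.value` attribute, as openpyxl cell objects do.

```python
import datetime

# Types that are treated as ordinary cell values and passed through untouched
PLAIN_TYPES = (str, bytes, int, float, bool, type(None),
               datetime.date, datetime.time, datetime.timedelta)


def unwrap_cell(value):
    """Return value.value for cell-like objects, otherwise the value itself."""
    if isinstance(value, PLAIN_TYPES):
        return value
    # Assumption: a non-plain value is a cell object with a `.value` attribute;
    # anything without that attribute is returned unchanged.
    return getattr(value, "value", value)


def clean_rows(rows):
    """Yield each row as a tuple with cell-like objects replaced by their values."""
    for row in rows:
        yield tuple(unwrap_cell(v) for v in row)
```

In the reader function this would mean building the frame with `pd.DataFrame(clean_rows(data), columns=columns)`. Whether that is enough depends on what the stray objects actually are, which I have not been able to confirm.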

Why does openpyxl read some blank Excel cells as <ReadOnlyCell 'Sheet1'.D2>?

0 answers: