Question

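A quick question about pandas DataFrames and the pd.read_pickle() function. I have a large but simple DataFrame (333 MB). When I run pd.read_pickle on the pickled DataFrame, I get an EOFError.

Is there a way to fix this? Any idea what might be causing it?

Thanks!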

Answer 1

I saw the same EOFError when I created the pickle with:

df.to_pickle('path.pkl', compression='bz2')  # df is a pandas.DataFrame

and then tried to read it with:

pandas.read_pickle('path.pkl')

I fixed it by passing the compression explicitly when reading:

pandas.read_pickle('path.pkl', compression='bz2')

According to the pandas documentation:

compression : {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default ‘infer’

    string representing the compression to use in the output file. By default, 
    infers from the file extension in specified path.

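So simply changing the path from 'path.pkl' to 'path.bz2' also fixes the problem, because the default compression='infer' then picks bz2 from the file extension when reading.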
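If you no longer remember how a file was written, one way to check is to look at its first few bytes. The following is only a sketch using the standard library: `sniff_compression` and `MAGIC` are hypothetical names, not part of pandas, and the demo file is written with `bz2.open` plus `pickle.dump` as a stand-in for a bz2-compressed pickle.

```python
import bz2
import pickle
import tempfile
from pathlib import Path

# Leading "magic" bytes of the compressed formats listed in the docs quoted above.
MAGIC = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bz2",
    b"\xfd7zXZ\x00": "xz",
    b"PK\x03\x04": "zip",
}


def sniff_compression(path):
    """Hypothetical helper: guess a file's compression from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(6)
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return None  # no known signature: probably an uncompressed pickle


# Demo: a bz2-compressed pickle stored under a misleading ".pkl" name.
demo_path = Path(tempfile.mkdtemp()) / "path.pkl"
with bz2.open(demo_path, "wb") as f:
    pickle.dump({"a": [1, 2, 3]}, f)

detected = sniff_compression(demo_path)
print(detected)
```

The detected name could then be passed on, for example as pandas.read_pickle(path, compression=detected).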

Answer 2

I can confirm greg_data's valuable comment:

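"When I encountered this error, I found it was because the initial pickling had not completed properly. The pickle file was created, but never finished. It seems to me this is the only possible source of an EOFError when unpickling: the pickle is malformed, i.e. incomplete."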

The error I got while pickling was:

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-40-263240bbee7e> in <module>()
----> 1 main()

<ipython-input-38-9b3c6d782a2a> in main()
     43     with open("/content/drive/MyDrive/{}.file".format(tm.id), "wb") as f:
---> 44         pickle.dump(tm, f, pickle.HIGHEST_PROTOCOL)
     45 
     46     print('Coherence:', get_coherence(tm, token_lists, 'c_v'))

TypeError: can't pickle weakref objects

When I then read that pickle file, which the failed pickling had evidently left incomplete, I got an error:

pd.read_pickle(r'/content/drive/MyDrive/TEST_2021_06_01_10_23_02.file')

The error:

---------------------------------------------------------------------------

EOFError                                  Traceback (most recent call last)

<ipython-input-41-460bdd0a2779> in <module>()
----> 1 object = pd.read_pickle(r'/content/drive/MyDrive/TEST_2021_06_01_10_23_02.file')

/usr/local/lib/python3.7/dist-packages/pandas/io/pickle.py in read_pickle(filepath_or_buffer, compression)
    180                 # We want to silence any warnings about, e.g. moved modules.
    181                 warnings.simplefilter("ignore", Warning)
--> 182                 return pickle.load(f)
    183         except excs_to_catch:
    184             # e.g.

EOFError: Ran out of input
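The sequence above (a dump that fails, followed by a load that runs out of input) can be reproduced with the standard library alone, since the traceback shows pd.read_pickle ending in pickle.load. This is a sketch under assumptions, not the original code: `Node`, `payload`, `safe_dump` and the file names are hypothetical, and the weakref is there only to make the dump fail the same way. The last part shows one possible way to avoid leaving a broken file behind, by serializing in memory before anything is written.

```python
import os
import pickle
import tempfile
import weakref


class Node:
    """Stand-in target; a weak reference needs an object that supports it."""


node = Node()
# Hypothetical payload: the weakref makes it unpicklable, like `tm` above.
payload = {"data": [1, 2, 3], "ref": weakref.ref(node)}

workdir = tempfile.mkdtemp()
broken_path = os.path.join(workdir, "broken.file")

# 1) open(..., "wb") creates the file before dump() fails, so a file is left behind.
dump_error = None
try:
    with open(broken_path, "wb") as f:
        pickle.dump(payload, f, pickle.HIGHEST_PROTOCOL)
except (TypeError, pickle.PicklingError) as exc:
    dump_error = exc

# 2) Loading the leftover file fails because the pickle stream was never completed.
load_error = None
try:
    with open(broken_path, "rb") as f:
        pickle.load(f)
except (EOFError, pickle.UnpicklingError) as exc:
    load_error = exc


# 3) Safer: serialize in memory first, so a pickling failure creates no file at all.
def safe_dump(obj, path):
    data = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)  # raises before touching disk
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(data)
    os.replace(tmp_path, path)  # move the finished file into place


safe_path = os.path.join(workdir, "safe.file")
try:
    safe_dump(payload, safe_path)
except (TypeError, pickle.PicklingError):
    pass

print(type(dump_error).__name__, type(load_error).__name__)
print(os.path.exists(broken_path), os.path.exists(safe_path))
```

With this pattern, a file at the final path exists only if pickling succeeded, so a later read cannot pick up a half-written pickle from that path.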
