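I haven't figured out how to pickle-load/save pandas DataFrames between Python 2 and 3. There is a 'protocol' option in the pickler that I've played with unsuccessfully, but I'm hoping someone has a quick idea for me to try. Here is the code that produces the errors: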
python2.7
>>> import pandas; from pylab import *
>>> a = pandas.DataFrame(randn(10,10))
>>> a.save('a2')
>>> a = pandas.DataFrame.load('a2')
>>> a = pandas.DataFrame.load('a3')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/pandas-0.10.1-py2.7-linux-x86_64.egg/pandas/core/generic.py", line 30, in load
return com.load(path)
File "/usr/local/lib/python2.7/site-packages/pandas-0.10.1-py2.7-linux-x86_64.egg/pandas/core/common.py", line 1107, in load
return pickle.load(f)
ValueError: unsupported pickle protocol: 3
python3
>>> import pandas; from pylab import *
>>> a = pandas.DataFrame(randn(10,10))
>>> a.save('a3')
>>> a = pandas.DataFrame.load('a3')
>>> a = pandas.DataFrame.load('a2')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.3/site-packages/pandas-0.10.1-py3.3-linux-x86_64.egg/pandas/core/generic.py", line 30, in load
return com.load(path)
File "/usr/local/lib/python3.3/site-packages/pandas-0.10.1-py3.3-linux-x86_64.egg/pandas/core/common.py", line 1107, in load
return pickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf4 in position 0: ordinal not in range(128)
Perhaps expecting pickles to work across Python versions is a bit optimistic?
Answer 0 (score: 6)
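Both tracebacks appear to come from the pickle format itself rather than from pandas: Python 2.7 reads pickle protocols 0 to 2 only, and Python 3 decodes Python 2 `str` objects as ASCII unless `pickle.load`/`pickle.loads` is given another `encoding`. Python's pickle documentation recommends `encoding='latin1'` for NumPy arrays pickled by Python 2. The sketch below illustrates both failure modes with the standard `pickle` module; `PY2_STYLE_PICKLE` is a hand-built stand-in for what Python 2 writes (not the asker's actual file), and `pickle_protocol` is a hypothetical helper name.

```python
import pickle

# Hand-built protocol-2 stream of a Python 2 byte string containing 0xf4, the kind of
# opcode Python 2 emits for `str` objects (assumed stand-in for a DataFrame's raw buffers).
PY2_STYLE_PICKLE = b'\x80\x02U\x01\xf4.'  # PROTO 2, SHORT_BINSTRING (len 1), STOP


def pickle_protocol(data):
    """Return the protocol recorded in a pickle's PROTO header, or None for protocols 0/1."""
    # Protocols 2 and up start with the PROTO opcode (0x80) followed by the version byte.
    if len(data) >= 2 and data[0] == 0x80:
        return data[1]
    return None


# Python 2.7 understands protocols 0-2 only, so a header byte above 2 triggers
# "ValueError: unsupported pickle protocol: 3" there.
print(pickle_protocol(pickle.dumps([1.0], protocol=3)))  # 3
print(pickle_protocol(PY2_STYLE_PICKLE))                 # 2

# Python 3 decodes Python 2 `str` objects as ASCII by default, hence the UnicodeDecodeError.
try:
    pickle.loads(PY2_STYLE_PICKLE)
except UnicodeDecodeError as exc:
    print(exc)

# encoding='latin1' maps every byte 0-255 to a code point, so loading succeeds.
print(repr(pickle.loads(PY2_STYLE_PICKLE, encoding='latin1')))  # 'ô'
```

With `encoding='bytes'` instead, the same stream loads as the raw `bytes` value `b'\xf4'`.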
I ran into the same problem. You can change the protocol of a DataFrame pickle file with the following function in Python 3:
import pickle

def change_pickle_protocol(filepath, protocol=2):
    with open(filepath, 'rb') as f:
        obj = pickle.load(f)
    with open(filepath, 'wb') as f:
        pickle.dump(obj, f, protocol=protocol)
Then you should be able to open it in Python 2 without any problem.
Answer 1 (score: 1)
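A self-contained check of the function above, assuming a plain dict stands in for the DataFrame so it runs without pandas (loading a real DataFrame pickle this way requires pandas to be importable). The temporary path is illustrative only. After the conversion the file should begin with the bytes `\x80\x02`, the PROTO opcode followed by protocol 2.

```python
import os
import pickle
import tempfile


def change_pickle_protocol(filepath, protocol=2):
    """Rewrite the pickle at `filepath` in place using `protocol` (2 = readable by Python 2)."""
    with open(filepath, 'rb') as f:
        obj = pickle.load(f)
    with open(filepath, 'wb') as f:
        pickle.dump(obj, f, protocol=protocol)


# Write a pickle with the highest protocol this Python supports (3 or more on Python 3).
path = os.path.join(tempfile.mkdtemp(), 'demo.pkl')
with open(path, 'wb') as f:
    pickle.dump({'a': [1, 2, 3]}, f, protocol=pickle.HIGHEST_PROTOCOL)

change_pickle_protocol(path, protocol=2)

# The first two bytes of a protocol-2+ pickle are the PROTO opcode and the protocol number.
with open(path, 'rb') as f:
    header = f.read(2)
print(header)  # b'\x80\x02'
```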
If you are using pandas.DataFrame.to_pickle(), make the following changes to the pandas source to gain the ability to set the pickle protocol:
1) In the source file /pandas/io/pickle.py (copy the original to /pandas/io/pickle.py.ori before modifying it), search for the following lines:
def to_pickle(obj, path):
pkl.dump(obj, f, protocol=pkl.HIGHEST_PROTOCOL)
Change these lines to:
def to_pickle(obj, path, protocol=pkl.HIGHEST_PROTOCOL):
pkl.dump(obj, f, protocol=protocol)
2) In the source file /pandas/core/generic.py (copy the original to /pandas/core/generic.py.ori before modifying it), search for the following lines:
def to_pickle(self, path):
return to_pickle(self, path)
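Change these lines to (forwarding protocol only when one is given, so the io-level default of pkl.HIGHEST_PROTOCOL still applies):
def to_pickle(self, path, protocol=None):
    if protocol is None:
        return to_pickle(self, path)
    return to_pickle(self, path, protocol)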
3) If a Python kernel is running, restart it, then save the DataFrame with any available pickle protocol (0, 1, 2, 3, 4):
# Python 2.x can read this
df.to_pickle('my_dataframe.pck', protocol=2)
# protocol will be the highest (4); Python 2.x cannot read this
df.to_pickle('my_dataframe.pck')
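4) After upgrading pandas, repeat steps 1 and 2.
5) (Optional) Ask the developers to add this feature to an official release, since without these changes your code will raise an exception in any other Python environment.
Have a nice day!
Answer 2 (score: 1)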
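One subtlety when threading a protocol=None default down to pickle.dump: Python 3 treats None as pickle.DEFAULT_PROTOCOL (3 on Python 3.0 to 3.7, 4 from 3.8 on), not as HIGHEST_PROTOCOL. A minimal sketch with a hypothetical helper `dump_with_protocol` (not pandas code) that maps None to the highest protocol explicitly:

```python
import pickle


def dump_with_protocol(obj, protocol=None):
    """Hypothetical helper: serialize obj, treating protocol=None as HIGHEST_PROTOCOL."""
    if protocol is None:
        protocol = pickle.HIGHEST_PROTOCOL
    return pickle.dumps(obj, protocol=protocol)


data = {'x': 1}
# Byte 1 of a protocol-2+ pickle is its protocol number.
print(pickle.dumps(data, protocol=None)[1] == pickle.DEFAULT_PROTOCOL)  # True
print(dump_with_protocol(data)[1] == pickle.HIGHEST_PROTOCOL)           # True
print(dump_with_protocol(data, protocol=2)[:2])                         # b'\x80\x02'
```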
You can override the highest protocol available in the pickle package:
import pickle as pkl
import pandas as pd

if __name__ == '__main__':
    # this constant is defined in pickle.py in the pickle package
    pkl.HIGHEST_PROTOCOL = 2
    # 'foo.pkl' was saved with pickle protocol 4
    df = pd.read_pickle(r"C:\temp\foo.pkl")
    # 'foo_protocol_2.pkl' will be saved with pickle protocol 2
    # and can be read by pandas under Python 2
    df.to_pickle(r"C:\temp\foo_protocol_2.pkl")
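This override works only because the pandas code of that era reads pkl.HIGHEST_PROTOCOL when to_pickle is called (the excerpt in the previous answer shows pkl.dump(obj, f, protocol=pkl.HIGHEST_PROTOCOL) inside the function body). A function that binds the constant as a default argument captures it at definition time and ignores the override; the newer signature quoted below (protocol=4) suggests later pandas versions may behave that way, in which case passing protocol explicitly is the route. Both functions in this sketch are hypothetical stand-ins, not pandas code.

```python
import pickle as pkl


def save_like_old_pandas(obj):
    """Hypothetical stand-in: looks up pkl.HIGHEST_PROTOCOL at call time."""
    return pkl.dumps(obj, protocol=pkl.HIGHEST_PROTOCOL)


def save_with_bound_default(obj, protocol=pkl.HIGHEST_PROTOCOL):
    """Hypothetical stand-in: default captured at definition time, so overrides are ignored."""
    return pkl.dumps(obj, protocol=protocol)


original = pkl.HIGHEST_PROTOCOL
pkl.HIGHEST_PROTOCOL = 2  # the global monkey-patch from the answer above
try:
    print(save_like_old_pandas([1])[:2])                # b'\x80\x02'
    print(save_with_bound_default([1])[1] == original)  # True: override not seen
finally:
    pkl.HIGHEST_PROTOCOL = original  # undo the patch so other code is unaffected
```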
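This is certainly not an elegant solution, but it gets the job done without modifying the pandas code directly.
Update: I found that newer versions of pandas let you specify the pickle protocol in the .to_pickle function: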
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_pickle.html
DataFrame.to_pickle(path, compression='infer', protocol=4)
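A short sketch of that parameter, assuming a pandas version whose to_pickle accepts protocol; the file name is hypothetical. Protocol 2 is the highest that Python 2.7 can read, though whether an old Python 2 pandas can rebuild a frame pickled by a much newer pandas is a separate compatibility question.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [0.5, 1.5, 2.5]})

# Hypothetical output path; the .pkl suffix means compression='infer' picks no compression.
path = os.path.join(tempfile.mkdtemp(), 'frame_py2.pkl')
df.to_pickle(path, protocol=2)

# An uncompressed protocol-2 pickle starts with the PROTO opcode 0x80 and the byte 2.
with open(path, 'rb') as f:
    header = f.read(2)
print(header)

# Reading it back in Python 3 works as usual.
restored = pd.read_pickle(path)
print(restored.equals(df))  # True
```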