Question

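We use pandas DataFrames as the primary data container for our time series data. We pack each DataFrame into a binary blob stored in a MongoDB document, alongside keys holding metadata about the time series blob.

We ran into an error when we upgraded from pandas 0.14.1 to 0.15.2.

Creating the binary blob of a pandas DataFrame (0.14.1):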

import lz4
import cPickle

# Pickle the DataFrame with the highest protocol, then compress it with lz4.
bd = lz4.compress(cPickle.dumps(df, cPickle.HIGHEST_PROTOCOL))

Error case: reading the blob back from MongoDB with pandas 0.15.2

cPickle.loads(lz4.decompress(bd))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-37-76f7b0b41426> in <module>()
----> 1 cPickle.loads(lz4.decompress(bd))
TypeError: ('_reconstruct: First argument must be a sub-type of ndarray', <built-in function _reconstruct>, (<class 'pandas.core.index.Index'>, (0,), 'b'))

Success case: reading the blob back from MongoDB with pandas 0.14.1 works without error.

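This seems similar to an older Stack Overflow thread, "Pandas compiled from source: default pickle behavior changed", which has a helpful comment from https://stackoverflow.com/users/644898/jeff:

"The error message you're seeing, `TypeError: _reconstruct: First argument must be a sub-type of ndarray`, occurs because the Python default unpickler makes sure that the class hierarchy that was pickled is exactly the same as what it is recreating. Since Series has changed between versions, this is no longer possible with the default unpickler (this IMHO is a bug in the way pickle works). In any event, pandas will unpickle pre-0.13 pickles that have Series objects."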

Any ideas for workarounds or solutions?

To recreate the error:

Set up a pandas 0.14.1 env:

import cPickle
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 10))
cPickle.dump(df, open("cp0141.p", "wb"))
cPickle.load(open("cp0141.p", "rb"))  # no error

Reproduce the error in a pandas 0.15.2 env:

cPickle.load(open("cp0141.p", "rb"))
TypeError: ('_reconstruct: First argument must be a sub-type of ndarray', <built-in function _reconstruct>, (<class 'pandas.core.index.Int64Index'>, (0,), 'b'))

Answer 1

This change was explicitly called out: the Index class no longer subclasses ndarray and is now a pandas object; see here.
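
A quick way to see the change described above on a current install is to check the type relationship directly. This is a minimal sketch in Python 3, assuming pandas 0.15 or later; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

idx = pd.Index([1, 2, 3])

# On pandas >= 0.15 an Index is a pandas object, not an ndarray subclass,
# which is why NumPy's _reconstruct rejects it when an old pickle is replayed.
index_is_ndarray = isinstance(idx, np.ndarray)
print(index_is_ndarray)

# The underlying data is still reachable as a plain NumPy array.
values = idx.values
print(type(values))
```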

You should simply use pd.read_pickle to read the pickles.
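
For blobs that arrive as bytes (as in the MongoDB setup from the question), one possible approach is to decompress them, write the raw pickle to a temporary file, and hand the path to pd.read_pickle. The sketch below is written for Python 3 and rests on some assumptions: `load_frame_blob` is a hypothetical helper name, and `zlib` stands in for lz4 so that the example depends only on the standard library plus pandas and NumPy. Pass your own decompression function to match how the blob was written.

```python
import os
import pickle
import tempfile
import zlib

import numpy as np
import pandas as pd


def load_frame_blob(blob, decompress=zlib.decompress):
    """Hypothetical helper: decompress a pickled blob and load it via pd.read_pickle."""
    raw = decompress(blob)
    # Go through a temporary file, since older pandas releases
    # document read_pickle as taking a file path.
    fd, path = tempfile.mkstemp(suffix=".pkl")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(raw)
        return pd.read_pickle(path)
    finally:
        os.remove(path)


# Round trip within a single pandas version, only to show the call shape.
df = pd.DataFrame(np.random.randn(10, 10))
blob = zlib.compress(pickle.dumps(df, pickle.HIGHEST_PROTOCOL))
restored = load_frame_blob(blob)
print(restored.equals(df))
```

This round trip runs under one pandas version, so it does not by itself demonstrate cross-version loading; it only shows where pd.read_pickle would replace the direct cPickle.loads call.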

Pandas with pickle: backward compatibility issue between 0.14.1 and 0.15.2
