Question

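I've been using pandas for a while, but I'm new to HDF5, so I'm trying to learn it and convert some of my research data files to HDF5. I've looked through a few SO posts on Python and HDF5, and I'm interested in using the BLOSC compression algorithm (we do a lot of computation on our datasets, so read/write speed is a higher priority than storage size).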

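I'm running into a problem with the blosc compression library when using pandas.to_hdf. When I use blosc, Python crashes, and when I open it for debugging in Visual Studio 2010 I get:

Unhandled exception at 0x00007ffcd59fa28c in python.exe: 0xC0000374: A heap has been corrupted.

I set up a standalone example in a script and got the same problem: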

import numpy as np
import pandas as pd

test = pd.DataFrame()
test['random1'] = np.random.randn(1000000)
test['random2'] = np.random.randn(1000000)
test['random3'] = np.random.randn(1000000)

# Write out a csv first to compare file sizes
test.to_csv('./examples/data/random_3c.csv')

# Write out using different compression algorithms to compare
test.to_hdf('./examples/data/random_3c_zlib.h5',
            key='Random_3Col', mode='w', format='table', 
            append=False, complevel=9, complib='zlib', fletcher32=True)

test.to_hdf('./examples/data/random_3c_blosc.h5',
            key='Random_3Col', mode='w', format='table', 
            append=False, complevel=9, complib='blosc', fletcher32=True)

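The csv writes fine (file size 65,217 KB)
The zlib compression writes fine (file size 21,719 KB)
The blosc compression crashes the kernel, and when I open it for debugging in VS I get the heap corruption message

My pandas version is 0.16.2
My PyTables version is 3.2.0
I also installed HDF5 from hdfgroup
I'm on a Windows machine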

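At this point I'm not even sure how to start tracking down what is causing the crash. Any suggestions, or has anyone seen this before? I've seen that some people on SO have had problems when trying to use an external blosc library, but I haven't gotten into that yet. I figured I'd get the basics down first! As far as I know, pandas uses PyTables, which comes with a bundled version of blosc.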
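
One possible way to start narrowing this down, sketched here under assumptions rather than taken from the thread: because a heap corruption kills the interpreter and cannot be caught with try/except, each write can be run in a separate Python process and judged by its exit code. The helper names (run_isolated, probe_complib), the child script, and the small 1000-row frame are all hypothetical; the sketch assumes Python 3.7+ and that pandas and PyTables are importable in the child process.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child script: writes a small frame with the requested complib.
CHILD_TEMPLATE = """
import numpy as np
import pandas as pd
df = pd.DataFrame({{'random1': np.random.randn(1000)}})
df.to_hdf({path!r}, key='Random_3Col', mode='w', format='table',
          complevel=9, complib={complib!r})
"""


def run_isolated(snippet, timeout=300):
    """Run a Python snippet in a fresh interpreter; return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.returncode, proc.stderr


def probe_complib(complib, directory):
    """Try a small to_hdf write with `complib`; return (ok, returncode, stderr)."""
    path = os.path.join(directory, "probe_{}.h5".format(complib))
    code, err = run_isolated(CHILD_TEMPLATE.format(path=path, complib=complib))
    return code == 0, code, err


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for complib in ("zlib", "blosc"):
            ok, code, err = probe_complib(complib, tmp)
            print(complib, "ok" if ok else "failed with exit code {}".format(code))
```

A non-zero exit code for blosc alongside a clean zlib run would suggest the problem sits in the compression library build rather than in the data or the script. A failure for both more likely means the child could not import pandas or PyTables at all, in which case the captured stderr should say so.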

Thanks!

Answer 1

If you are using the Anaconda distribution, this is a package build issue: Pytables 3.2, python 3.4 under windows x64 · Issue #458 · ContinuumIO/anaconda-issues. You can watch that issue and wait for a fix.
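
To check whether an environment matches the combination named in that issue title (PyTables 3.2, Python 3.4, Windows x64), a minimal sketch like the following can print the relevant versions. The collect_versions helper is hypothetical, and it only reports what each module exposes as `__version__`.

```python
import importlib
import platform


def collect_versions(names=("pandas", "tables", "numpy")):
    """Map each module name to its __version__, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    # Interpreter version, operating system, and CPU architecture
    print("python", platform.python_version(), platform.system(), platform.machine())
    for name, version in collect_versions().items():
        print(name, version)
```

PyTables also provides tables.print_versions(), which should give a more detailed report that includes the bundled compression libraries.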
