从不同的脚本中读回HDF5文件

时间:2016-04-20 07:35:42

标签: python pandas hdf5

我正在尝试以HDF5格式学习pandas DataFrames的存储。

我在Python 3.4.3上使用pandas 0.17.1

在同一目录中的两个Jupiter笔记本中,hdf5_write.ipynb我有:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(100,120).reshape(5,4),index=[10,11,12,13,14],columns=[20,21,22,23]) 
df

enter image description here

storew = pd.HDFStore('test.h5')
storew['df'] = df
storew

enter image description here

storer = pd.HDFStore('test.h5')
df2 = storer['df']
df2

enter image description here

但是在脚本hdf5_read.ipynb中我无法读回hdf5文件!:

import pandas as pd
import numpy as np
storer = pd.HDFStore('test.h5')
storer

out

磁盘上的文件存在,但它是空的。它在我的主目录下,所以我不认为它是权限问题。此外,在导致此测试的实际情况中,我有一个大的DataFrame,文件不为空,但当我尝试从其他脚本读取时,我仍然有HDF5ExtError错误: out

我已经看过类似的SO question,但这似乎取决于更老的熊猫,而我的安装更新。 pd.show_version()返回:

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-83-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.1
pip: 1.5.4
setuptools: 3.3
Cython: None
numpy: 1.10.4
scipy: 0.13.3
statsmodels: None
IPython: 4.0.0
sphinx: 1.2.2
patsy: None
dateutil: 2.5.0
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.1.1
numexpr: 2.5
matplotlib: 1.3.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: 0.8
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: None

1 个答案:

答案 0 :(得分:-1)

为什么不将它们存储为csv?

import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(100,120).reshape(5,4),index=[10,11,12,13,14],columns=[20,21,22,23]) 
df.to_csv('test.csv')

然后在你的第二本笔记本中:

import pandas as pd
import numpy as np
df = pd.read_csv('test.csv')