我正在尝试以HDF5格式学习pandas DataFrames的存储。
我在Python 3.4.3上使用pandas 0.17.1
在同一目录中的两个Jupiter笔记本中,hdf5_write.ipynb
我有:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(100,120).reshape(5,4),index=[10,11,12,13,14],columns=[20,21,22,23])
df
storew = pd.HDFStore('test.h5')
storew['df'] = df
storew
storer = pd.HDFStore('test.h5')
df2 = storer['df']
df2
但是在脚本hdf5_read.ipynb
中我无法读回hdf5文件!:
import pandas as pd
import numpy as np
storer = pd.HDFStore('test.h5')
storer
磁盘上的文件存在,但它是空的。它在我的主目录下,所以我不认为它是权限问题。此外,在导致此测试的实际情况中,我有一个大的DataFrame
,文件不为空,但当我尝试从其他脚本读取时,我仍然有HDF5ExtError
错误:
我已经看过类似的SO question,但这似乎取决于更老的熊猫,而我的安装更新。 pd.show_version()返回:
INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-83-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.17.1
nose: 1.3.1
pip: 1.5.4
setuptools: 3.3
Cython: None
numpy: 1.10.4
scipy: 0.13.3
statsmodels: None
IPython: 4.0.0
sphinx: 1.2.2
patsy: None
dateutil: 2.5.0
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.1.1
numexpr: 2.5
matplotlib: 1.3.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: 0.8
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: None
答案 0 :(得分:-1)
为什么不将它们存储为csv?
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(100,120).reshape(5,4),index=[10,11,12,13,14],columns=[20,21,22,23])
df.to_csv('test.csv')
然后在你的第二本笔记本中:
import pandas as pd
import numpy as np
df = pd.read_csv('test.csv')