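Hi all, and thanks in advance.
I am trying to periodically store financial data in a database so I can query it later. I use Pandas for almost all of my data code, and I want to append the DataFrames I create to an HDF store. I read a CSV into a DataFrame indexed by timestamp, and the DataFrame looks like this: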
<class 'pandas.core.frame.DataFrame'>
Int64Index: 900 entries, 1378400701110 to 1378410270251
Data columns (total 23 columns):
....
...Columns with numbers of non-null values....
.....
dtypes: float64(19), int64(4)
store = pd.HDFStore('store1.h5')
store.append('df', df)
print store
<class 'pandas.io.pytables.HDFStore'>
File path: store1.h5
/df frame_table (typ->appendable,nrows->900,ncols->23,indexers->[index])
But when I try to do anything with the store, for example:
print store['df']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.py", line 289, in __getitem__
return self.get(key)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.py", line 422, in get
return self._read_group(group)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.py", line 930, in _read_group
return s.read(**kwargs)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.py", line 3175, in read
mgr = BlockManager([block], [cols_, index_])
File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 1007, in __init__
self._set_ref_locs(do_refs=True)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 1117, in _set_ref_locs
"does not have _ref_locs set" % (block,labels))
AssertionError: cannot create BlockManager._ref_locs because block
[FloatBlock: [LastTrade, Bid1, Bid1Volume,....., Ask5Volume], 19 x 900, dtype float64]
with duplicate items
[Index([u'LastTrade', u'Bid1', u'Bid1Volume',..., u'Ask5Volume'], dtype=object)]
does not have _ref_locs set
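I suspect I am doing something wrong with my index; I am very new to this and know very little about it.
EDIT:
The DataFrame is built like this: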
columns = ['TimeStamp', 'LastTrade', 'Bid1', 'Bid1Volume', 'Bid1', 'Bid1Volume', 'Bid2', 'Bid2Volume', 'Bid3', 'Bid3Volume', 'Bid4', 'Bid4Volume',
'Bid5', 'Bid5Volume', 'Ask1', 'Ask1Volume', 'Ask2', 'Ask2Volume', 'Ask3', 'Ask3Volume', 'Ask4', 'Ask4Volume', 'Ask5', 'Ask5Volume']
df = pd.read_csv('/20130905.csv', names=columns, index_col=[0])
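A quick way to check the `columns` list above for repeated names, using only the standard library. `find_duplicates` is a hypothetical helper written for illustration, not a pandas function. Note that the list has 24 names but only 22 distinct ones, which lines up with the 23 data columns reported by the store and the 21 shown by `df.head()` once `TimeStamp` becomes the index.

```python
from collections import Counter

# The column list copied from the question.
columns = ['TimeStamp', 'LastTrade', 'Bid1', 'Bid1Volume', 'Bid1', 'Bid1Volume',
           'Bid2', 'Bid2Volume', 'Bid3', 'Bid3Volume', 'Bid4', 'Bid4Volume',
           'Bid5', 'Bid5Volume', 'Ask1', 'Ask1Volume', 'Ask2', 'Ask2Volume',
           'Ask3', 'Ask3Volume', 'Ask4', 'Ask4Volume', 'Ask5', 'Ask5Volume']


def find_duplicates(names):
    """Hypothetical helper: return each name that occurs more than once, in first-seen order."""
    counts = Counter(names)
    dupes = []
    for name in names:
        if counts[name] > 1 and name not in dupes:
            dupes.append(name)
    return dupes


print(find_duplicates(columns))  # ['Bid1', 'Bid1Volume']
print(len(columns), len(set(columns)))  # 24 22
```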
df.head() looks like this:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 1378400701110 to 1378400703105
Data columns (total 21 columns):
LastTrade 5 non-null values
Bid1 5 non-null values
Bid1Volume 5 non-null values
Bid1 5 non-null values
.................values
Ask4 5 non-null values
Ask4Volume 5 non-null values
dtypes: float64(17), int64(4)
There are too many columns to print them all, but for example:
print df['LastTrade'].iloc[10]
LastTrade 1.31202
Name: 1378400706093, dtype: float64
And the pandas version:
>>> pd.__version__
'0.12.0'
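Any ideas would be much appreciated. Thanks again.
Answer:
Do you really mean to have duplicate 'Bid1' and 'Bid1Volume' columns?
Unrelated, but you should also convert the index to a DatetimeIndex: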
import pandas as pd
df.index = pd.to_datetime(df.index,unit='ms')
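To show what `unit='ms'` does, here is a small sketch that uses the first and last index values from the question's output. It assumes those integers are milliseconds since the Unix epoch (UTC), which is consistent with them landing on the date in the CSV file name.

```python
import pandas as pd

# First and last index values from the question (assumed: epoch milliseconds, UTC).
raw = pd.Index([1378400701110, 1378410270251])

# unit='ms' interprets each integer as milliseconds since 1970-01-01 00:00:00.
dt_index = pd.to_datetime(raw, unit='ms')
print(dt_index[0])  # 2013-09-05 17:05:01.110000
```

With a DatetimeIndex, you can select time ranges by date strings later instead of working with raw integers.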
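This is a bug triggered by duplicate columns across dtypes (not a big deal, but it went undetected).
The easiest fix is simply to not have duplicate columns.
It will be fixed in 0.13; see: https://github.com/pydata/pandas/pull/4768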
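One way to drop repeated columns before appending to the store is sketched below on a small hypothetical frame. It keeps only the first occurrence of each label. It is written against a recent pandas release and has not been checked against 0.12. Which copy to keep is a question about your data: if the second 'Bid1'/'Bid1Volume' pair actually holds different values, fixing the names list passed to `read_csv` is the better option.

```python
import numpy as np
import pandas as pd

# Hypothetical frame that repeats 'Bid1' and 'Bid1Volume', as in the question.
df = pd.DataFrame(np.arange(12.0).reshape(2, 6),
                  columns=['LastTrade', 'Bid1', 'Bid1Volume', 'Bid1', 'Bid1Volume', 'Bid2'])

# columns.duplicated() marks the second and later copies of a label as True;
# negating it keeps only the first copy of each column.
df = df.loc[:, ~df.columns.duplicated()]
print(list(df.columns))  # ['LastTrade', 'Bid1', 'Bid1Volume', 'Bid2']
```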