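I have a large SparseDataFrame (say, 20k index entries x 10k columns) with very low density (0.1% of the entries are set). I am trying to access a specific row of the DataFrame, but I can't seem to do it. Accessing columns works fine. Here is a small example that illustrates the problem: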
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(15).reshape(5,3), index=list('abcde'))
df.loc['b',1] = np.nan # for good measure...
sparse = df.to_sparse()
sparse[1] # This is OK.
df.loc['b'] # This is also OK.
sparse.loc['b'] # This blows up.
Here is the traceback:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/.../.virtualenvs/exp/lib/python2.7/site-packages/pandas/core/indexing.py", line 1020, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "/Users/.../.virtualenvs/exp/lib/python2.7/site-packages/pandas/core/indexing.py", line 1145, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/Users/.../.virtualenvs/exp/lib/python2.7/site-packages/pandas/core/indexing.py", line 68, in _get_label
    return self.obj._xs(label, axis=axis, copy=True)
  File "/Users/.../.virtualenvs/exp/lib/python2.7/site-packages/pandas/core/frame.py", line 2149, in xs
    new_values, copy = self._data.fast_2d_xs(loc, copy=copy)
  File "/Users/.../.virtualenvs/exp/lib/python2.7/site-packages/pandas/core/internals.py", line 2714, in fast_2d_xs
    result[i] = blk._try_coerce_result(blk.iget((j, loc)))
  File "/Users/.../.virtualenvs/exp/lib/python2.7/site-packages/pandas/core/internals.py", line 275, in iget
    return self.values[i]
  File "/Users/.../.virtualenvs/exp/lib/python2.7/site-packages/pandas/sparse/array.py", line 286, in __getitem__
    data_slice = self.values[key]
IndexError: too many indices
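Note that on a "normal", dense DataFrame object, this works fine. However, given the size of my data, this is a major inconvenience for me.
I'm fairly new to pandas, so maybe I'm missing something. In any case, any help is appreciated!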
Answer 0 (score: 0)
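Based on @Jeff's comment on my question, it seems that pandas' sparse data structures currently do not fully support indexing. Hopefully this will change soon!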
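Since column access does work in the question's example, one possible workaround is to assemble a row from per-column lookups. The following is only a minimal sketch, not the library's own solution. It assumes pandas 1.0 or later, where `to_sparse()` and `SparseDataFrame` no longer exist and sparse data is stored in ordinary DataFrame columns with a `SparseDtype`. The helper `row_from_columns` is a hypothetical name introduced here for illustration, and the frame is built as float from the start so that assigning NaN does not change the column dtype.

```python
import numpy as np
import pandas as pd

# Same toy frame as in the question, but created as float so that
# assigning NaN does not require an implicit dtype upcast.
df = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3), index=list('abcde'))
df.loc['b', 1] = np.nan

# Assumption: pandas >= 1.0. Sparse storage is expressed through a
# SparseDtype on regular columns rather than through df.to_sparse().
sparse = df.astype(pd.SparseDtype("float64", np.nan))


def row_from_columns(frame, label):
    """Hypothetical helper: build one row from per-column scalar lookups.

    This only relies on column selection and Series label lookup,
    the operations that work in the question's example.
    """
    values = [frame[col].loc[label] for col in frame.columns]
    return pd.Series(values, index=frame.columns, name=label)


row = row_from_columns(sparse, 'b')
print(row)
```

This does one lookup per column, so for 10k columns it will be slow if called for many rows. It is meant to show the idea rather than to be a performant replacement for row indexing.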
Related issue on GitHub: