Can't set the index of a pandas data frame - getting "KeyError"

Time: 2016-07-17 12:27:17

Tags: python pandas dataframe set row

I generate a data frame that looks like this (summaryDF):

   accuracy        f1  precision    recall
0     0.494  0.722433   0.722433  0.722433
0     0.290  0.826087   0.826087  0.826087
0     0.274  0.629630   0.629630  0.629630
0     0.278  0.628571   0.628571  0.628571
0     0.288  0.718750   0.718750  0.718750
0     0.740  0.740000   0.740000  0.740000
0     0.698  0.765133   0.765133  0.765133
0     0.582  0.778547   0.778547  0.778547
0     0.682  0.748235   0.748235  0.748235
0     0.574  0.767918   0.767918  0.767918
0     0.398  0.711656   0.711656  0.711656
0     0.530  0.780083   0.780083  0.780083

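Since I know what each row should be, I then use this code to set the name of each row (these aren't the actual row names, they are just for the sake of argument).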

summaryDF = summaryDF.set_index(['A','B','C', 'D','E','F','G','H','I','J','K','L'])

However, I get:

level = frame[col].values
  File "/Users/me/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 1797, in __getitem__
    return self._getitem_column(key)
  File "/Users/me/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 1804, in _getitem_column
    return self._get_item_cache(key)
  File "/Users/me/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 1084, in _get_item_cache
    values = self._data.get(item)
  File "/Users/me/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 2851, in get
    loc = self.items.get_loc(item)
  File "/Users/me/anaconda/lib/python2.7/site-packages/pandas/core/index.py", line 1572, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3824)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:3704)
  File "pandas/hashtable.pyx", line 686, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12280)
  File "pandas/hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12231)
KeyError: 'A'

I don't know what I'm doing wrong, and I have researched this extensively. Any ideas?

2 answers:

Answer 0: (score: 5)

I guess you and @jezrael misunderstood an example from the pandas docs. A and B are column names/labels in this example:

In [55]: df = pd.DataFrame(np.random.randint(0, 10, (5,4)), columns=list('ABCD'))

In [56]: df
Out[56]:
   A  B  C  D
0  6  9  7  4
1  5  1  3  4
2  4  4  0  5
3  9  0  9  8
4  6  4  5  7

In [57]: df.set_index(['A','B'])
Out[57]:
     C  D
A B
6 9  7  4
5 1  3  4
4 4  0  5
9 0  9  8
6 4  5  7

The documentation says the keys should be a list of column labels / arrays.

So you are looking for:

In [58]: df.set_index([['A','B','C','D','E']])
Out[58]:
   A  B  C  D
A  6  9  7  4
B  5  1  3  4
C  4  4  0  5
D  9  0  9  8
E  6  4  5  7

But as @jezrael suggested, df.index = ['A','B',...] is the faster and more idiomatic method...
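To make the difference concrete, here is a minimal sketch. The three-row frame and its labels are made up for illustration and are not the asker's actual data. It shows that a flat list of strings is looked up as column names (hence the KeyError), while a pd.Index or a plain assignment to .index sets the row labels.

```python
import pandas as pd

# Hypothetical three-row frame standing in for summaryDF
df = pd.DataFrame(
    {"accuracy": [0.494, 0.290, 0.274], "f1": [0.722433, 0.826087, 0.629630]},
    index=[0, 0, 0],
)
labels = ["A", "B", "C"]

# A flat list of strings is interpreted as column names, so this fails
try:
    df.set_index(labels)
    raised = False
except KeyError:
    raised = True

# Passing an Index object treats the values as the new row labels
relabeled = df.set_index(pd.Index(labels))

# Assigning to .index relabels the existing frame in place
df.index = labels
```

Note that set_index returns a new frame by default, whereas the assignment modifies df directly.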

Answer 1: (score: 2)

If the length of the list is the same as the length of the DataFrame, you need to assign the list to summaryDF.index:

summaryDF.index = ['A','B','C', 'D','E','F','G','H','I','J','K','L']
print (summaryDF)
   accuracy        f1  precision    recall
A     0.494  0.722433   0.722433  0.722433
B     0.290  0.826087   0.826087  0.826087
C     0.274  0.629630   0.629630  0.629630
D     0.278  0.628571   0.628571  0.628571
E     0.288  0.718750   0.718750  0.718750
F     0.740  0.740000   0.740000  0.740000
G     0.698  0.765133   0.765133  0.765133
H     0.582  0.778547   0.778547  0.778547
I     0.682  0.748235   0.748235  0.748235
J     0.574  0.767918   0.767918  0.767918
K     0.398  0.711656   0.711656  0.711656
L     0.530  0.780083   0.780083  0.780083

print (summaryDF.index)
Index(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L'], dtype='object')

Timings:

In [117]: %timeit summaryDF.index = ['A','B','C', 'D','E','F','G','H','I','J','K','L']
The slowest run took 6.86 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 76.2 µs per loop

In [118]: %timeit summaryDF.set_index(pd.Index(['A','B','C', 'D','E','F','G','H','I','J','K','L']))
The slowest run took 6.77 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 227 µs per loop

Another solution is to convert the list to an Index and pass it to set_index, as in In [118] above.
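The length condition matters in practice. As a rough sketch, using a made-up two-row frame rather than the asker's data, assigning a list of the wrong length to .index raises a ValueError, so it can be worth checking the lengths first:

```python
import pandas as pd

# Hypothetical two-row frame for illustration
df = pd.DataFrame({"accuracy": [0.494, 0.290]})

too_many = ["A", "B", "C"]  # one label more than there are rows

# Assigning labels whose count differs from the row count fails
try:
    df.index = too_many
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

# With a matching number of labels the assignment succeeds
labels = ["A", "B"]
if len(labels) == len(df):
    df.index = labels
```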