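In a loop, I fill a DataFrame that has a MultiIndex element by element, using loc[(x, y)] to address each cell. This works for a few thousand elements, but then I suddenly get a KeyError.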
import pandas as pd
# set up an empty DataFrame with a two-level MultiIndex
df = pd.DataFrame()
df.index = pd.MultiIndex.from_tuples([], names=['cycle', 'itemNr'])
# add data elements using .loc inside a loop, like this:
df.loc[(currentCycle, currentItem), 'a'] = 99
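A small, runnable version of the fill loop described above, followed by an alternative that collects the rows first and constructs the DataFrame once. The loop bounds `N_CYCLES` and `N_ITEMS` are hypothetical, since the question does not show where `currentCycle` and `currentItem` come from. Building the frame once avoids growing the index one key at a time; it is offered as a common alternative pattern, not as a confirmed fix for the KeyError.

```python
import pandas as pd

# Hypothetical loop bounds; the question does not show the real ranges.
N_CYCLES, N_ITEMS = 5, 3

# Pattern from the question: start from an empty frame with a two-level
# MultiIndex and grow it one cell at a time via .loc.
df = pd.DataFrame()
df.index = pd.MultiIndex.from_tuples([], names=['cycle', 'itemNr'])
for currentCycle in range(N_CYCLES):
    for currentItem in range(N_ITEMS):
        # Setting with enlargement: a (cycle, itemNr) key that is not yet
        # in the index is appended as a new row.
        df.loc[(currentCycle, currentItem), 'a'] = 99

# Alternative: collect plain tuples first, then build the frame in one step.
rows = [
    (currentCycle, currentItem, 99)
    for currentCycle in range(N_CYCLES)
    for currentItem in range(N_ITEMS)
]
df_once = pd.DataFrame(rows, columns=['cycle', 'itemNr', 'a']).set_index(
    ['cycle', 'itemNr']
)

print(df.shape)       # (15, 1)
print(df_once.shape)  # (15, 1)
```

Both frames end up with the same (cycle, itemNr) keys in the same order. The column dtypes can differ: the enlarged frame's column `a` becomes float64 because each new row is first filled with NaN.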
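After adding 10,000 rows, each with 15 columns, I get a KeyError.
Am I using the wrong indexing method?
Do I need to make sure the MultiIndex is sorted?
Is there a size limit?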
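Whether the index is sorted or has duplicate keys can be checked directly. The sketch below uses a small, deliberately out-of-order index that is made up for illustration; with the real data, the same checks would be run on `df` just before the failing assignment. `is_monotonic_increasing`, `is_unique`, and `sort_index()` are standard pandas APIs; this example does not establish whether an unsorted index is what triggers the KeyError.

```python
import pandas as pd

# Hypothetical example index, deliberately not in sorted order.
idx = pd.MultiIndex.from_tuples(
    [(2, 1), (1, 1), (1, 2)], names=['cycle', 'itemNr']
)
df = pd.DataFrame({'a': [1.0, 2.0, 3.0]}, index=idx)

# Is the MultiIndex sorted lexicographically, and is every key unique?
print(df.index.is_monotonic_increasing)  # False
print(df.index.is_unique)                # True

# sort_index() returns a new frame whose index is sorted level by level.
df_sorted = df.sort_index()
print(df_sorted.index.is_monotonic_increasing)  # True
print(list(df_sorted.index))  # [(1, 1), (1, 2), (2, 1)]
```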
df.shape
OUT: (10000, 15)
df.loc[(282,1), 'columnA'] = 99
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-40-0a92a62250dd> in <module>()
----> 1 df.loc[(282,1), 'columnA'] = 99
C:\Anaconda3\lib\site-packages\pandas\core\indexing.py in __setitem__(self, key, value)
192 key = com._apply_if_callable(key, self.obj)
193 indexer = self._get_setitem_indexer(key)
--> 194 self._setitem_with_indexer(indexer, value)
195
196 def _has_valid_type(self, k, axis):
C:\Anaconda3\lib\site-packages\pandas\core\indexing.py in _setitem_with_indexer(self, indexer, value)
373 self.obj.is_copy = None
374
--> 375 nindexer.append(labels.get_loc(key))
376
377 else:
C:\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py in get_loc(self, key, method)
2089 key = _values_from_object(key)
2090 key = tuple(map(_maybe_str_to_time_stamp, key, self.levels))
-> 2091 return self._engine.get_loc(key)
2092
2093 # -- partial selection or non-unique index
pandas\_libs\index.pyx in pandas._libs.index.MultiIndexHashEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.MultiIndexHashEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.MultiIndexHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.MultiIndexHashTable.get_item()
KeyError: (282, 1)
How to reproduce:
CSV file reproduce_loc_assignment_error.csv: http://s000.tinyupload.com/index.php?file_id=61488607596539177759
The file has 10,000 rows and only 3 columns (2 index columns and 1 data column, a).
import pandas as pd
df = pd.read_csv('reproduce_loc_assign_error.csv', index_col=[0, 1])
df.loc[(282, 2), 'a'] = 100  # raises KeyError
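The linked upload may no longer be available, so here is a synthetic stand-in with the same shape (10,000 rows, two index columns, one data column `a`). The values are made up and are not the contents of the original file. In the traceback, the failing call `labels.get_loc(key)` appears to sit in the branch of `_setitem_with_indexer` that first inserts a key not yet present in the index, so this stand-in deliberately leaves (282, 2) out of the data and the assignment exercises setting with enlargement. On a recent pandas version the assignment is expected to succeed; whether the original KeyError reproduces depends on the pandas version and on the real data.

```python
import io

import pandas as pd

# Hypothetical stand-in data: cycles 0..4999, itemNr 0 or 1 -> 10,000 rows.
# (282, 1) exists, (282, 2) does not, so the assignment below adds a row.
lines = ['cycle,itemNr,a']
for cycle in range(5000):
    for item in (0, 1):
        lines.append(f'{cycle},{item},0')
csv_text = '\n'.join(lines)

df = pd.read_csv(io.StringIO(csv_text), index_col=[0, 1])
print(df.shape)  # (10000, 1)

# Same assignment as in the question: a new (cycle, itemNr) key.
df.loc[(282, 2), 'a'] = 100
print(df.shape)  # (10001, 1)
```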