reset_index() error after running the dask categorize() method

Time: 2019-03-24 01:19:40

Tags: python pandas dataframe dask

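I have a pandas DataFrame that was built with crosstab. When I call reset_index() on it, I get the error TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category. Sometimes this happens and sometimes it does not. The pandas objects were produced with dask .compute() after some of the variables had been categorized with the dask DataFrame method .categorize().

Can anyone tell me what the problem is and how to fix it? I am running pandas 0.24.2.

Edit: Please find the code below.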

import pandas as pd
import dask.dataframe as dd

years = ['2014', '2015', '2016', '2017', '2018']
data = {'years': years,
        'F': ['A', 'B', 'B', 'C', 'C'],
        'M': ['A', 'A', 'A', 'B', 'C']}

data = pd.DataFrame.from_dict(data)
data = dd.from_pandas(data, npartitions=2)
# Convert M and F to categorical dtype
data = data.categorize(columns=['M', 'F'])
CAM = data['M'].compute()
G = data['F'].compute()
mg = pd.crosstab(G, CAM)
mg = mg.reset_index()  # raises the TypeError shown below

The error is below:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-80-049481fd2564> in <module>
----> 1 mg = mg.reset_index()
      2 mg

~/env/lib/python3.5/site-packages/pandas/core/frame.py in reset_index(self, level, drop, inplace, col_level, col_fill)
   4429                 # to ndarray and maybe infer different dtype
   4430                 level_values = _maybe_casted_values(lev, lab)
-> 4431                 new_obj.insert(0, name, level_values)
   4432 
   4433         new_obj.index = new_index

~/env/lib/python3.5/site-packages/pandas/core/frame.py in insert(self, loc, column, value, allow_duplicates)
   3471         value = self._sanitize_column(column, value, broadcast=False)
   3472         self._data.insert(loc, column, value,
-> 3473                           allow_duplicates=allow_duplicates)
   3474 
   3475     def assign(self, **kwargs):

~/env/lib/python3.5/site-packages/pandas/core/internals/managers.py in insert(self, loc, item, value, allow_duplicates)
   1153 
   1154         # insert to the axis; this could possibly raise a TypeError
-> 1155         new_axis = self.items.insert(loc, item)
   1156 
   1157         block = make_block(values=value, ndim=self.ndim,

~/env/lib/python3.5/site-packages/pandas/core/indexes/category.py in insert(self, loc, item)
    765         code = self.categories.get_indexer([item])
    766         if (code == -1) and not (is_scalar(item) and isna(item)):
--> 767             raise TypeError("cannot insert an item into a CategoricalIndex "
    768                             "that is not already an existing category")
    769 

TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category
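
Reading the traceback, one plausible explanation is that the crosstab columns are a CategoricalIndex (because M is categorical), and reset_index() tries to insert the index name 'F' into that column index. The check in category.py only lets the insert through when the new label is already one of the categories, which may be why the error appears in some cases and not others. The following is a minimal sketch of a possible workaround, not a confirmed fix: it uses plain pandas categorical Series as stand-ins for the computed dask columns (an assumption, to keep the example self-contained) and rebuilds the column labels as an ordinary Index before resetting.

```python
import pandas as pd

# Stand-ins for the computed dask columns (assumption): two categorical
# Series carrying the same names and values as in the question.
G = pd.Series(['A', 'B', 'B', 'C', 'C'], name='F', dtype='category')
CAM = pd.Series(['A', 'A', 'A', 'B', 'C'], name='M', dtype='category')

mg = pd.crosstab(G, CAM)

# Rebuild the column labels as a plain (non-categorical) Index, keeping the
# axis name, so that a new label such as 'F' can be inserted freely.
mg.columns = pd.Index(list(mg.columns), name=mg.columns.name)

fixed = mg.reset_index()
print(fixed)
```

Another option along the same lines would be to cast the categorical columns to a non-categorical dtype with astype(str) before calling crosstab, at the cost of losing the categorical dtype in the result.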

Hope this helps.

Thanks

Michael

0 answers:

No answers