My code (from the book Python Data Science Handbook, O'Reilly):
Full disclosure: at the time of writing, the book is still in early release, which means it is unedited and in its raw form.
import numpy as np
import pandas as pd
import seaborn as sns
titanic = sns.load_dataset('titanic')
titanic.pivot_table('survived', index='sex', columns='class')
The result is:
However, if I now try to add totals with the margins keyword, I get the following error:
titanic.pivot_table('survived', index='sex', columns='class', margins=True)
TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category
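Any idea what might be causing this?
Version info:
Answer:
This appears to be caused by a change between pandas 0.15 and 0.16. In the earlier version, the titanic dataset had these dtypes: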
In [1]: import pandas, seaborn
In [2]: pandas.__version__
'0.15.2'
In [3]: titanic = seaborn.load_dataset('titanic')
In [4]: titanic.dtypes
Out[4]:
survived int64
pclass int64
sex object
age float64
sibsp int64
parch int64
fare float64
embarked object
class object
who object
adult_male bool
deck object
embark_town object
alive object
alone bool
dtype: object
With the newer pandas:
In [1]: import pandas, seaborn
In [2]: pandas.__version__
'0.16.2'
In [3]: titanic = seaborn.load_dataset('titanic')
In [4]: titanic.dtypes
Out[4]:
survived int64
pclass int64
sex object
age float64
sibsp int64
parch int64
fare float64
embarked object
class category
who object
adult_male bool
deck category
embark_town object
alive object
alone bool
dtype: object
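Rather than comparing the two dtype listings by eye, you can ask pandas directly which columns are categorical. This is a minimal sketch that does not load the real dataset: it assumes a small hypothetical DataFrame in which class and deck are made categorical to mirror the 0.16.2 output above.

```python
import pandas as pd

# Hypothetical stand-in for a few titanic columns; 'class' and 'deck'
# are stored as categoricals, mirroring the pandas 0.16.2 dtypes above.
df = pd.DataFrame({
    "survived": [1, 0, 1, 0],
    "sex": ["female", "male", "female", "male"],
    "class": pd.Categorical(["First", "Third", "Second", "Third"],
                            categories=["First", "Second", "Third"]),
    "deck": pd.Categorical(["B", None, "E", None]),
})

# select_dtypes(include="category") keeps only the categorical columns.
categorical_cols = list(df.select_dtypes(include="category").columns)
print(categorical_cols)  # ['class', 'deck']
```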
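Several columns (class and deck) are now automatically converted to categoricals, and that is what triggers this error: with margins=True, pivot_table inserts an "All" label for the totals, but a CategoricalIndex only accepts values that are already among its categories, and "All" is not one of them. The book is currently unpublished and unedited; I'll be sure to test against the latest versions and fix these kinds of errors before it is published.
For now, here is a workaround: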
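The clash can be seen without calling pivot_table at all. A minimal sketch, assuming a hand-built categorical with the three titanic class labels: "All", the label pivot_table uses for the totals, is not among the categories, while add_categories would make it a legal value. This only illustrates the mechanism; it is not shown here to be a fix for pivot_table on pandas 0.16.

```python
import pandas as pd

# A categorical like titanic['class']: its set of allowed values is fixed.
cls = pd.Categorical(["First", "Third", "Second"],
                     categories=["First", "Second", "Third"])
idx = pd.CategoricalIndex(cls)

# The totals label "All" is not one of the existing categories.
print("All" in idx.categories)  # False

# add_categories returns a new index whose categories include "All".
extended = idx.add_categories("All")
print("All" in extended.categories)  # True
```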
In [5]: titanic['class'] = titanic['class'].astype(object)
In [6]: titanic.pivot_table('survived', index='sex', columns='class', margins=True)
Out[6]:
class First Second Third All
sex
female 0.968085 0.921053 0.500000 0.742038
male 0.368852 0.157407 0.135447 0.188908
All 0.629630 0.472826 0.242363 0.383838
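If several columns have become categorical, the same astype(object) workaround can be applied to all of them at once. A sketch, assuming a hypothetical helper named decategorize and a tiny made-up DataFrame in place of the real titanic data; it casts a copy, so the original frame keeps its categorical dtypes.

```python
import pandas as pd

# Hypothetical miniature of the titanic data with a categorical 'class'.
df = pd.DataFrame({
    "survived": [1, 1, 1, 0, 0, 1, 0, 0],
    "sex": ["female"] * 4 + ["male"] * 4,
    "class": pd.Categorical(
        ["First", "First", "Second", "Third",
         "First", "Second", "Third", "Third"],
        categories=["First", "Second", "Third"]),
})

def decategorize(frame):
    """Hypothetical helper: return a copy with every categorical column cast to object."""
    out = frame.copy()
    for col in out.select_dtypes(include="category").columns:
        out[col] = out[col].astype(object)
    return out

# Pivot on the object-dtype copy, so the "All" margin label can be added.
table = decategorize(df).pivot_table("survived", index="sex",
                                     columns="class", margins=True)
print(table)
```

With this data the female "All" entry is 0.75 (3 of 4 female rows survived), not the average of the three female cells (about 0.667): the margins are computed from the underlying rows, not from the cell values.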
Edit: I've filed this as an issue with the pandas project: https://github.com/pydata/pandas/issues/10989