Question

I am working with the titanic dataset from seaborn.

import pandas as pd
import seaborn

titanic = seaborn.load_dataset('titanic')

I binned the age column into categories:

age = pd.cut(titanic['age'], [0, 18, 80])
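For reference, here is a small sketch of what pd.cut returns for a few made-up ages (the sample values are assumptions, not taken from the titanic data). The bins are closed on the right by default, and a missing age stays NaN instead of being assigned to a bin, so NaN ages can still reach the grouping step after binning.

```python
import numpy as np
import pandas as pd

# Made-up sample ages, including the bin edges and one missing value.
ages = pd.Series([0.42, 18.0, 18.5, 80.0, np.nan], name="age")

# Two right-closed bins: (0, 18] and (18, 80].
binned = pd.cut(ages, [0, 18, 80])

print(binned.cat.categories)   # the two interval categories
print(binned.isna().tolist())  # [False, False, False, False, True]
```

An age of exactly 18 lands in (0, 18] because of right-closed bins; passing right=False would move it to the upper bin.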

Then the problem appears: groupby and pivot_table give completely different results:

titanic.groupby(['sex', age, 'class'])['survived'].mean().unstack(-1)
titanic.pivot_table('survived', ['sex', age], 'class')
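To check whether the two calls are supposed to agree, here is a minimal sketch that runs both on a small synthetic stand-in for the titanic data, so no download is needed. The random data and the helper column name age_bin are assumptions for illustration, and the observed=True argument assumes a newer pandas than the 0.23 releases discussed here. On a recent pandas, the two tables are expected to match.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for seaborn's titanic data (random, not the real values).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "sex": rng.choice(["female", "male"], n),
    "age": rng.uniform(1, 79, n),
    "class": rng.choice(["First", "Second", "Third"], n),
    "survived": rng.integers(0, 2, n),
})
df.loc[::10, "age"] = np.nan  # some missing ages, as in the real data
df["age_bin"] = pd.cut(df["age"], [0, 18, 80])

# observed=True: group only on category combinations that actually occur.
by_groupby = (
    df.groupby(["sex", "age_bin", "class"], observed=True)["survived"]
    .mean()
    .unstack(-1)
)
by_pivot = df.pivot_table("survived", ["sex", "age_bin"], "class", observed=True)

# The values should be identical; dtypes may differ because pivot_table can
# downcast integer-valued means back to the int dtype of 'survived'.
pd.testing.assert_frame_equal(by_groupby, by_pivot, check_dtype=False)
print(by_pivot)
```

Both calls drop rows whose age_bin is NaN, so missing ages alone should not make the tables diverge on a current release.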

(Screenshot: groupby and pivot_table results)

At first I thought this was caused by the NaN values in age, so I redid it with a dataset cleaned by dropna:

titanic = titanic.dropna()
age = pd.cut(titanic['age'], [0, 18, 80], right=True)
titanic.groupby(['sex', age, 'class'])['survived'].mean().unstack(-1)
titanic.pivot_table('survived', ['sex', age], 'class')
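One detail worth keeping in mind here: DataFrame.dropna() with no arguments drops a row if any column is NaN, not only age, so it can remove far more rows than intended when other columns also have gaps. A minimal sketch on hypothetical toy data (the values and the deck column contents are made up) contrasts it with dropna(subset=["age"]):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'deck' has gaps unrelated to 'age'.
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 4.0],
    "deck": [np.nan, "C", "C", np.nan],
    "survived": [0, 1, 1, 1],
})

any_nan = df.dropna()                 # drops rows with NaN in any column
age_only = df.dropna(subset=["age"])  # drops rows only where 'age' is NaN

print(len(any_nan), len(age_only))    # 1 3
```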

This time I again got completely different results.

(Screenshot: groupby and pivot_table results after dropna)

My Python version is Python 3.6.5 :: Anaconda, Inc., and pandas is 0.23.0.

My operating system is macOS High Sierra 10.13.6.

I tried again with Python 3.7.0 and pandas 0.23.4, and the problem did not occur.

(Screenshot: result under Python 3.7.0)

So I am wondering: is this a bug in Anaconda?

Answer 1

I ran your statements and got matching results.

Answer 2

I found that this is a pandas bug: it appeared in version 0.23.0, released in May 2018, and was resolved in version 0.23.4, released in September 2018.

So if you run into problems with pandas.pivot_table, especially when there are NaNs in categorical data, it is best to check your pandas version first and upgrade. :)
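A minimal sketch of that version check, comparing pd.__version__ against 0.23.4 (the release the answer reports as fixed). The helper version_tuple is hypothetical; it only parses the leading numeric part of the version string, so suffixes such as "rc1" are ignored rather than ordered properly.

```python
import re

import pandas as pd


def version_tuple(version: str) -> tuple:
    """Hypothetical helper: '0.23.4' -> (0, 23, 4); trailing suffixes are ignored."""
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    return tuple(int(part) for part in match.group(1).split(".")) if match else ()


# Release reported above as no longer showing the pivot_table problem.
MINIMUM = (0, 23, 4)

if version_tuple(pd.__version__) < MINIMUM:
    print(f"pandas {pd.__version__} is older than 0.23.4; consider upgrading")
else:
    print(f"pandas {pd.__version__} is 0.23.4 or newer")
```

Inside an Anaconda environment the upgrade itself would typically be "conda update pandas"; with pip it is "pip install --upgrade pandas".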

Python: pivot_table and groupby give completely opposite results
