Question

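I'm working with Kaggle's Titanic dataset and trying to visualize the correlations between all of the categorical variables. To do this I'm using a point plot together with FacetGrid, but parts of the plot contradict the data.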

To generate the point plot I used:

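import seaborn as sns

grid = sns.FacetGrid(train, row='Embarked', height=2.2, aspect=1.6)  # one row of panels per port (`size` was renamed `height` in seaborn 0.9)
grid.map(sns.pointplot, 'Pclass', 'Survived', 'Sex', palette='deep')  # x, y, hue
grid.add_legend()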

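The point plot shows that almost all males in Pclass 1 and 2 who embarked at 'C' survived, which is highly unlikely. I tried to verify this with a pivot table: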

import pandas as pd
temp = pd.pivot_table(train, values='Survived', index=['Embarked', 'Sex'], columns='Pclass', aggfunc=['count', 'sum', 'mean']).round(2)
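The same numbers can be cross-checked in long format with `groupby`, which can be easier to scan than a wide pivot table. This is a minimal sketch on a small hypothetical frame: the column names match the Kaggle file, but the rows are made up so the example runs on its own.

```python
import pandas as pd

# Hypothetical rows using the Kaggle column names; NOT the real dataset
train = pd.DataFrame({
    "Embarked": ["C", "C", "C", "C", "S", "S"],
    "Sex":      ["male", "male", "female", "male", "male", "female"],
    "Pclass":   [1, 1, 1, 2, 3, 3],
    "Survived": [1, 0, 1, 0, 0, 1],
})

# One row per (Embarked, Sex, Pclass): passenger count, survivors, survival rate
summary = (
    train.groupby(["Embarked", "Sex", "Pclass"])["Survived"]
    .agg(["count", "sum", "mean"])
    .round(2)
)
print(summary)
```

Each `mean` is simply `sum / count` for that group, so it should match the `mean` block of the pivot table.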

According to the pivot table, 42 males in class 1, 10 in class 2 and 43 in class 3 embarked at 'C'.

But only 17, 2 and 10 of them survived, so the survival rates should be 0.40, 0.20 and 0.23.
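As a quick arithmetic check, using only the 'C' counts quoted in the question:

```python
# Counts quoted in the question for passengers who embarked at 'C', keyed by Pclass
boarded = {1: 42, 2: 10, 3: 43}
survived = {1: 17, 2: 2, 3: 10}

# Survival rate = survivors / passengers, rounded to 2 decimals like the pivot table
rates = {pclass: round(survived[pclass] / boarded[pclass], 2) for pclass in boarded}
print(rates)  # {1: 0.4, 2: 0.2, 3: 0.23}
```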

I suspect the problem is related to these warnings produced by the plotting code:

UserWarning: Using the pointplot function without specifying `order` is likely to produce an incorrect plot.
UserWarning: Using the pointplot function without specifying `hue_order` is likely to produce an incorrect plot.
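A likely cause, offered as an assumption about seaborn's behaviour rather than a confirmed diagnosis: `FacetGrid.map` calls `pointplot` separately on each facet's subset of rows, and for a string column such as `Sex` the hue levels are taken in order of first appearance within that subset. If that order differs between facets (for example, if a female happens to be the first 'C' passenger while a male comes first at the other ports), the first palette colour means "male" in one row of panels and "female" in another. The sketch below uses hypothetical rows to show how the first-appearance order can change from facet to facet:

```python
import pandas as pd

# Hypothetical toy rows: in the 'C' subset a female appears first,
# while in the 'S' and 'Q' subsets a male appears first.
df = pd.DataFrame({
    "Embarked": ["S", "C", "S", "Q", "C"],
    "Sex": ["male", "female", "female", "male", "male"],
})

# Order of first appearance of each Sex level within each Embarked facet
per_facet = {port: list(pd.unique(g["Sex"])) for port, g in df.groupby("Embarked")}
print(per_facet)  # {'C': ['female', 'male'], 'Q': ['male'], 'S': ['male', 'female']}
```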

I don't understand what `order` means here. Side note: how can I sort the Embarked index so that it reads 'S', 'C', 'Q'?

FacetGrid and pointplot not giving accurate results
