My data looks like this:
Survived,Pclass,Name,Sex
0,3,"Braund, Mr. Owen Harris",male
1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female
1,3,"Heikkinen, Miss. Laina",female
1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female
0,3,"Allen, Mr. William Henry",male
0,3,"Moran, Mr. James",male
When I try to compare the number of first-class survivors by sex, I get strange results.
When I do this:
data[(data['Sex']=='female') & (data['Pclass']== 1)]['Survived'].value_counts().plot(kind='bar')
plt.legend()
plt.xticks(np.arange(2), rotation=0)
plt.title("Male and female survivors in first class")
plt.show()
it shows that almost all first-class women survived (which is correct).
But when I do this:
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
data = pd.read_csv('titanic_data/train.csv')
males = data[(data['Sex']=='male') & (data['Pclass'] == 1)]['Survived'].value_counts()
females = data[(data['Sex']=='female') & (data['Pclass']== 1)]['Survived'].value_counts()
plt.bar(range(len(females)), females, align='edge', width=-0.4, label='Female', color='red', alpha=0.5)
plt.bar(range(len(males)), males, align='edge', width=0.4, label='Male', color='blue', alpha=0.5)
plt.legend()
plt.xticks(np.arange(len(males)), rotation=0)
plt.title("Male and female survivors in first class")
plt.show()
it shows that almost all the women died (which is wrong!).
Answer 0 (score: 1)
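matplotlib.pyplot.bar takes the x coordinates of the bars as its first argument, but you passed range(len(females)), which simply assigns the positions 0, 1 to the bars in whatever order value_counts() returned them, without checking which Survived value each count belongs to. By default value_counts() sorts by count, largest first. Since most first-class women survived, the female counts come back in the order 1, 0; if most first-class men died, the male counts come back in the order 0, 1, so the female bars end up swapped relative to the male ones. The x coordinates you want to pass are the index of the Series. For example: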
plt.bar(females.index, females, align='edge', width=-0.4, label='Female', color='red', alpha=0.5)
plt.bar(males.index, males, align='edge', width=0.4, label='Male', color='blue', alpha=0.5)
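To see the ordering problem without the plot, here is a minimal sketch on a small made-up DataFrame (the toy rows and the names first_class, females_aligned and males_aligned are assumptions for illustration, not taken from train.csv). It shows that value_counts() can return the two Survived values in a different order for each group, and that reindexing both Series to an explicit [0, 1] order is one way to line them up:

```python
import pandas as pd

# Hypothetical toy data (NOT the real Titanic file): ten first-class passengers.
data = pd.DataFrame({
    "Sex": ["female"] * 5 + ["male"] * 5,
    "Pclass": [1] * 10,
    "Survived": [1, 1, 1, 1, 0, 0, 0, 0, 1, 1],
})

first_class = data[data["Pclass"] == 1]
females = first_class[first_class["Sex"] == "female"]["Survived"].value_counts()
males = first_class[first_class["Sex"] == "male"]["Survived"].value_counts()

# value_counts() sorts by count (largest first), so the index order
# depends on the data and differs between the two groups here.
female_order = list(females.index)  # most frequent Survived value first
male_order = list(males.index)

# Force both Series into the same explicit order: 0 (died), then 1 (survived).
# fill_value=0 covers the case where one outcome never occurs in a group.
females_aligned = females.reindex([0, 1], fill_value=0)
males_aligned = males.reindex([0, 1], fill_value=0)

print(female_order, male_order)
print(females_aligned.tolist(), males_aligned.tolist())
```

Passing females_aligned.index and males_aligned.index as the x argument of plt.bar should then put each count over the Survived value it belongs to. Calling sort_index() on each Series would work as well whenever both outcomes are present in both groups.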