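I am trying to fill the NaN values in the "Age" column, and I have an approach in mind. However, I'm not sure how to implement it.
For example: row 6 is a passenger who did not survive, so its missing "Age" should be filled with the mean age of all passengers who did not survive.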
Labeled dataset (training):
PassengerId Survived Pclass Sex Age SibSp Parch Fare
1 0 3 2 22.0 1 0 7.2500
2 1 1 1 38.0 1 0 71.2833
3 1 3 1 26.0 0 0 7.9250
4 1 1 1 35.0 1 0 53.1000
5 0 3 2 35.0 0 0 8.0500
6 0 3 2 NaN 0 0 8.4583
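To make the question reproducible, the six sample rows above can be rebuilt as a small DataFrame named training. The values are copied from the table; the construction itself is only an illustrative sketch. A single groupby call then returns the same per-group mean ages that the code below computes with two separate filters.

```python
import numpy as np
import pandas as pd

# The six sample rows from the table above (Sex is already numerically encoded in the source)
training = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Survived": [0, 1, 1, 1, 0, 0],
    "Pclass": [3, 1, 3, 1, 3, 3],
    "Sex": [2, 1, 1, 1, 2, 2],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, np.nan],
    "SibSp": [1, 1, 0, 1, 0, 0],
    "Parch": [0, 0, 0, 0, 0, 0],
    "Fare": [7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.4583],
})

# Mean age per Survived value; mean() skips NaN, so row 6 is ignored here
group_means = training.groupby("Survived")["Age"].mean()
print(group_means)  # Survived 0 -> 28.5, Survived 1 -> 33.0
```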
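I have been experimenting with some code in a Jupyter notebook. I've managed to compute the mean age of survivors and of non-survivors separately. Where I'm stuck is the final step: filling in the missing values in the "Age" column based on the value of the "Survived" column (0 = did not survive, 1 = survived).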
# Split the training data by survival status
titanic_survived = training[training["Survived"] == 1]
titanic_survived_not = training[training["Survived"] == 0]
# Mean age of each group (mean() skips NaN values)
mean_of_survived = titanic_survived.Age.mean()
mean_of_survived_not = titanic_survived_not.Age.mean()
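A minimal sketch of the missing last step, assuming the goal is to fill only the NaN ages, each with the mean of that passenger's own Survived group. Option 1 reuses the two means from the question and writes them into the matching cells with boolean masks and .loc. Option 2 is a swapped-in alternative that does the same thing with groupby plus transform("mean"), which returns each row's group mean aligned to the original index, so fillna can use it directly. The trimmed DataFrame and the names filled, age_missing and age_via_transform are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical trimmed copy of the sample rows (only the columns this step needs)
training = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, np.nan],
})

# Group means, computed as in the question
titanic_survived = training[training["Survived"] == 1]
titanic_survived_not = training[training["Survived"] == 0]
mean_of_survived = titanic_survived.Age.mean()
mean_of_survived_not = titanic_survived_not.Age.mean()

# Option 1: boolean masks + .loc, writing each mean only into its own group's NaN cells
filled = training.copy()
age_missing = filled["Age"].isna()
filled.loc[age_missing & (filled["Survived"] == 1), "Age"] = mean_of_survived
filled.loc[age_missing & (filled["Survived"] == 0), "Age"] = mean_of_survived_not

# Option 2: transform("mean") gives every row the mean Age of its Survived group;
# fillna uses those values only where Age is NaN
age_via_transform = training["Age"].fillna(
    training.groupby("Survived")["Age"].transform("mean")
)

print(filled)
print(age_via_transform.equals(filled["Age"]))
```

With the sample rows, the missing age in row 6 becomes 28.5, the mean of the two known non-survivor ages (22.0 and 35.0), and both options produce the same column. To update the original DataFrame in place, assign the result back, e.g. training["Age"] = age_via_transform. Option 2 also avoids keeping one variable per group, which helps if the grouping later uses more than one column.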