I'm trying to run a DecisionTreeClassifier on the Kaggle Titanic dataset (https://www.kaggle.com/rahulsah06/titanic?select=train.csv).
Here is my code:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
titanic_file_path = '../input/titanic/train.csv'
titanic_data = pd.read_csv(titanic_file_path)
# Create X and y
features = ['Pclass', 'Sex', 'Age', 'SibSp',
            'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked']
X = titanic_data[features]
y = titanic_data.Survived
# Split into training and validation data
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
# Define and fit the model
titanic_model = DecisionTreeClassifier(random_state=1)
titanic_model.fit(train_X, train_y)
But when I run the code, I get this error:
could not convert string to float: 'female'
How can I fix this?
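The cause can be seen in a minimal sketch on a tiny made-up DataFrame (the rows are invented for illustration, not taken from train.csv): scikit-learn converts the features to a float array before fitting, and a string column such as Sex cannot be converted, so fit() raises a ValueError.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up sample with Titanic-style columns:
# 'Sex' holds strings, the other columns are numeric.
X = pd.DataFrame({
    "Pclass": [3, 1, 3, 1],
    "Sex": ["female", "male", "female", "male"],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})
y = pd.Series([1, 0, 1, 0])

model = DecisionTreeClassifier(random_state=1)
try:
    # fit() tries to convert X to floats and fails on the string column
    model.fit(X, y)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```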
Answer (score: 1)
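The error occurs because DecisionTreeClassifier only accepts numeric features, while columns such as Sex contain strings like 'female'. A quick fix is to use pandas' get_dummies to one-hot encode those string columns into numeric indicator (dummy) columns: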
X = pd.get_dummies(X)
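To illustrate what get_dummies does, here is a small sketch on made-up rows with a few Titanic-style columns: numeric columns pass through unchanged, and each string column is replaced by one indicator column per distinct value, named `<column>_<value>`.

```python
import pandas as pd

# Made-up rows mimicking a few Titanic columns
X = pd.DataFrame({
    "Pclass": [3, 1, 3],
    "Sex": ["female", "male", "female"],
    "Embarked": ["S", "C", "S"],
    "Fare": [7.25, 71.28, 7.92],
})

# Numeric columns are kept as-is; each string column becomes
# one indicator column per distinct value
X_encoded = pd.get_dummies(X)
print(X_encoded.columns.tolist())
# ['Pclass', 'Fare', 'Sex_female', 'Sex_male', 'Embarked_C', 'Embarked_S']
```

Applying get_dummies to X before train_test_split, as in the answer, keeps the training and validation sets on the same set of columns.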
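That said, you should probably do more preprocessing than this. For example, Age still contains missing values after get_dummies (DecisionTreeClassifier only accepts NaN in recent scikit-learn releases), and Ticket and Cabin have so many distinct values that get_dummies expands each of them into a large number of columns. For a quick toy run, though, get_dummies should be enough.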
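One way to do that extra preprocessing, sketched here as an alternative to get_dummies rather than as the answerer's method, is a scikit-learn Pipeline that imputes missing values and one-hot encodes the categorical columns. The rows below are made up, and dropping Ticket and Cabin is an assumption made only to keep the sketch short.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Made-up rows standing in for train.csv (Ticket and Cabin dropped by assumption);
# Age and Embarked contain missing values, as in the real data
data = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 2],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Age": [22.0, 38.0, np.nan, 35.0, 35.0, np.nan],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 13.00],
    "Embarked": ["S", "C", "S", np.nan, "S", "Q"],
    "Survived": [0, 1, 1, 1, 0, 0],
})
numeric_cols = ["Pclass", "Age", "Fare"]
categorical_cols = ["Sex", "Embarked"]

preprocess = ColumnTransformer([
    # Fill missing numeric values with the column median
    ("num", SimpleImputer(strategy="median"), numeric_cols),
    # Fill missing categories with the most frequent value, then one-hot encode;
    # handle_unknown="ignore" avoids errors on categories not seen during fit
    ("cat", make_pipeline(SimpleImputer(strategy="most_frequent"),
                          OneHotEncoder(handle_unknown="ignore")), categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("tree", DecisionTreeClassifier(random_state=1)),
])

X = data[numeric_cols + categorical_cols]
y = data["Survived"]
model.fit(X, y)
predictions = model.predict(X)
print(predictions)
```

Because the preprocessing lives inside the pipeline, the whole model can also be passed to cross_val_score (which the question already imports), so imputation and encoding are re-fitted on each training fold instead of leaking information from the validation fold.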