Question

I have a credit scoring dataset and need to classify whether a customer will default.

   LIMIT_BAL  gender  EDUCATION  MARRIAGE  AGE  SEP_STATUS    AUG_STATUS    JUL_STATUS    JUN_STATUS    MAY_STATUS      ...  JUN_BAL  MAY_BAL  APR_BAL  SEP_PAID  AUG_PAID  JUL_PAID  JUN_PAID  MAY_PAID  APR_PAID  default_0
0  20000      female  bachelor   married   24   2 mo          2 mo          paid          paid          no need to pay  ...  0        0        0        0         689       0         0         0         0         bad
1  90000      female  bachelor   single    34   using credit  using credit  using credit  using credit  using credit    ...  14331    14948    15549    1518      1500      1000      1000      1000      5000      good

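from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

dec_class = DecisionTreeClassifier(random_state=17)
y = df['default_0']  # target column: 'good' / 'bad'
x = df.iloc[:, :-1]  # every column except the target

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=17)

dec_class.fit(X_train, y_train)  # fit on the training split only; raises the error below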

could not convert string to float: 'female'

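I thought decision trees work well with both categorical and numerical features. I preprocessed the categorical features into words; before that they were all numeric codes. Why are categorical features not accepted as words, e.g. gender: 'male', 'female'?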
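For context: scikit-learn's tree estimators convert the input to a numeric array, so string columns such as gender raise this error until they are encoded. Below is a minimal sketch of one common workaround, one-hot encoding inside a pipeline. It assumes a tiny made-up frame that borrows a few column names from the question (the values are illustrative, not the real dataset), and the names `cat_cols`, `preprocess` and `model` are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: column names from the question, values made up.
df = pd.DataFrame({
    "LIMIT_BAL": [20000, 90000, 50000, 120000, 30000, 70000],
    "gender": ["female", "female", "male", "male", "female", "male"],
    "MARRIAGE": ["married", "single", "single", "married", "single", "married"],
    "AGE": [24, 34, 37, 57, 29, 41],
    "default_0": ["bad", "good", "good", "bad", "good", "bad"],
})

y = df["default_0"]
X = df.drop(columns="default_0")

# String-valued columns that need encoding (listed by hand in this sketch).
cat_cols = ["gender", "MARRIAGE"]

# One-hot encode the string columns; numeric columns pass through unchanged.
preprocess = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(handle_unknown="ignore"), cat_cols)],
    remainder="passthrough",
)

# Chaining the encoder and the tree means the same encoding is applied
# at fit time and at predict time.
model = make_pipeline(preprocess, DecisionTreeClassifier(random_state=17))
model.fit(X, y)
pred = model.predict(X)
print(pred)
```

With the real data, the same pipeline could be fitted on `X_train, y_train` and evaluated on `X_test`. String class labels in `y` ('good' / 'bad') are accepted as they are; only the feature columns need encoding.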

Decision tree classifier does not accept categorical features

0 answers: