After using pandas' cut function, Scikit-Learn's train_test_split raises "unexpected keyword argument 'axis'"

Time: 2019-04-27 22:26:50

Tags: python pandas tensorflow scikit-learn

I have run into a problem while working with a dataset. My dataset is in CSV format and has the following structure:

ID,FieldOne,FieldTwo,FieldThree,FieldFour,FieldThree,FieldFour,FieldFive,ToPredict 
1,337,118,4,4.5,4.5,9.65,1,0.92
2,324,107,4,4,4.5,8.87,1,0.76
3,316,104,3,3,3.5,8,1,0.72

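"ToPredictField" is a probability that tells me how likely each row is to be selected for a certain process. It is my class column, and I want to split it into 5 classes: Very_unlikely (<= 0.5), Unlikely (between 0.5 and 0.7), Medium (between 0.7 and 0.8), Likely (between 0.8 and 0.9), and Very_likely (> 0.9). I did this with the pandas cut function, as follows: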

bins = [0, 0.5, 0.7, 0.8, 0.9, 1]
names = ['Very_unlikely', 'Unlikely', 'Medium', 'Likely', 'Very_likely']
dataset['ToPredictField'] = pd.cut(dataset['Chance of Admit '], bins, labels=names)
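
For reference, here is a minimal sketch of what this binning step produces. The sample values in `probs` are made up for illustration and are not taken from the dataset above. The point it shows is that `pd.cut` with `labels=` returns a column of categorical dtype, and that calling `.values` on such a column returns a pandas `Categorical` rather than a NumPy array.

```python
import pandas as pd

# Hypothetical sample probabilities standing in for the real column
probs = pd.Series([0.92, 0.76, 0.72, 0.45, 0.85])

bins = [0, 0.5, 0.7, 0.8, 0.9, 1]
names = ['Very_unlikely', 'Unlikely', 'Medium', 'Likely', 'Very_likely']

# Intervals are right-inclusive by default: (0, 0.5], (0.5, 0.7], ...
labels = pd.cut(probs, bins, labels=names)

print(labels.dtype)         # categorical dtype
print(type(labels.values))  # a pandas Categorical, not numpy.ndarray
print(labels.tolist())
```

Because the intervals are right-inclusive, a value of exactly 0 would fall outside the first bin and become NaN; passing `include_lowest=True` to `pd.cut` covers that case if it can occur in the data.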

Now I am trying to run train_test_split to split the dataset into 67% train / 33% test:

data_X = dataset[['ID','FieldOne','FieldTwo','FieldThree','FieldFour','FieldThree','FieldFour','FieldFive']].values
data_Y = dataset['Chance of Admit '].values

train_X, test_X, train_Y, test_Y = train_test_split(data_X, data_Y, test_size=0.33, random_state=10)

However, I get this error:

/usr/local/lib/python3.6/dist-packages/sklearn/utils/__init__.py in safe_indexing(X, indices)
    214                                    indices.dtype.kind == 'i'):
    215             # This is often substantially faster than X[indices]
--> 216             return X.take(indices, axis=0)
    217         else:
    218             return X[indices]

TypeError: take_nd() got an unexpected keyword argument 'axis'

Do you know what might be causing this?

Thanks.

1 Answer:

Answer 0: (score: 1)

I can confirm this problem on pandas 0.24.2. To fix it, change the data_Y assignment to

 data_Y = dataset.ToPredictField.cat.codes

This gives you the numeric codes of the categories, which will definitely work with sklearn. Alternatively, you could simply do

 data_Y = dataset.ToPredictField

but I am not sure how sklearn will handle that.
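
Putting the first suggestion together end to end, a rough sketch might look like the following. The DataFrame here is randomly generated stand-in data, and the column names `FieldOne`, `FieldTwo` and `ToPredict` are only borrowed from the question's CSV header for illustration. The sketch also adds `.values` after `.cat.codes` so that the target is a plain NumPy integer array, which is an extra step beyond what the answer literally wrote.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data; not the asker's real dataset
rng = np.random.RandomState(10)
dataset = pd.DataFrame({
    'FieldOne': rng.randint(290, 340, size=30),
    'FieldTwo': rng.randint(90, 120, size=30),
    'ToPredict': rng.uniform(0.01, 1.0, size=30),
})

bins = [0, 0.5, 0.7, 0.8, 0.9, 1]
names = ['Very_unlikely', 'Unlikely', 'Medium', 'Likely', 'Very_likely']
dataset['ToPredictField'] = pd.cut(dataset['ToPredict'], bins, labels=names)

data_X = dataset[['FieldOne', 'FieldTwo']].values
# Integer category codes (0..4) as a NumPy array instead of a Categorical
data_Y = dataset['ToPredictField'].cat.codes.values

train_X, test_X, train_Y, test_Y = train_test_split(
    data_X, data_Y, test_size=0.33, random_state=10)

print(train_X.shape, test_X.shape)
print(train_Y[:5])
```

The codes follow the order of `names`, so a code can be mapped back to its label with `dataset['ToPredictField'].cat.categories[code]`. A code of -1 would indicate a value that fell outside the bins.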