I am using a Decision Tree classifier with Python 3.5. All of my attributes are categorical, but they have been hashed to numbers, for example:
CityId,ApplianceGroupId,BrandId,StateId,ApplianceTypeId,FCRRepair,FCRRepairFMS,AppointmentIn2Hours,RSR,hasComplaint,hasApplianceExchange,satisfied
1152,1,1,95,5,0,0,0,0,0,0,1
1152,5,1,95,4,0,0,0,0,0,0,1
1201,5,1,43,4,0,0,0,0,0,0,1
882,3,2,69,1,0,0,0,0,0,0,1
Code:
import pandas
import matplotlib.pyplot as plt
from sklearn import tree
# Imputer exists in scikit-learn < 0.22; newer releases provide sklearn.impute.SimpleImputer instead
from sklearn.preprocessing import Imputer

names = ['CityId', 'ApplianceGroupId', 'BrandId', 'StateId', 'ApplianceTypeId', 'FCRRepair', 'FCRRepairFMS', 'AppointmentIn2Hours', 'RSR', 'hasComplaint', 'hasApplianceExchange', 'satisfied']
# header=0 replaces the header row of the file with `names`; header=1 would also drop the first data row
dataframe = pandas.read_csv("data_service_onlyenum.csv", names=names, encoding='utf-8', header=0)
y = targets = labels = dataframe['satisfied'].values
columns = ['CityId', 'ApplianceGroupId', 'BrandId', 'StateId', 'ApplianceTypeId', 'FCRRepair', 'FCRRepairFMS', 'AppointmentIn2Hours', 'RSR', 'hasComplaint', 'hasApplianceExchange']
features = dataframe[columns].values
# Fill missing values with the column mean
imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
X = imp.fit_transform(features)
#X
plt.plot(dataframe['CityId'].values, y)
plt.show()
clf = tree.DecisionTreeClassifier(criterion="entropy", max_depth=100)
clf = clf.fit(X, y)
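I think the classifier treats the attributes as numbers and computes its splits on that basis. Is this the default behavior of DecisionTreeClassifier, or can the numeric treatment be disabled by setting some parameter?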
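To illustrate what I suspect, here is a minimal sketch on made-up data (the toy values and the variable names `clf_int`, `clf_hot`, and `X_onehot` are only for illustration, not taken from my real dataset). It inspects the fitted tree's split thresholds: with integer-coded ids, the splits appear to be numeric cut points between id values. The one-hot variant with `pandas.get_dummies` is shown only as a workaround I am aware of, not something I have confirmed to be the intended approach.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up toy data: the label depends on which city it is,
# not on any meaningful ordering of the ids.
df = pd.DataFrame({
    "CityId":    [5, 10, 20, 30, 5, 10, 20, 30],
    "satisfied": [1, 0, 1, 0, 1, 0, 1, 0],
})

# Integer-coded ids: each split is a numeric test "CityId <= threshold".
clf_int = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf_int.fit(df[["CityId"]], df["satisfied"])
# Leaf nodes have a negative feature index, so keep internal nodes only.
thresholds = clf_int.tree_.threshold[clf_int.tree_.feature >= 0]
print("thresholds on integer ids:", sorted(set(thresholds.tolist())))

# One-hot encoded ids: each split becomes "is this city X or not".
X_onehot = pd.get_dummies(df["CityId"], prefix="CityId", dtype=int)
clf_hot = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf_hot.fit(X_onehot, df["satisfied"])
hot_thresholds = clf_hot.tree_.threshold[clf_hot.tree_.feature >= 0]
print("one-hot columns:", list(X_onehot.columns))
print("thresholds on one-hot columns:", sorted(set(hot_thresholds.tolist())))
```

In the first tree the thresholds fall between neighbouring id values, which is what makes me think the ids are being compared as ordered numbers.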
Any help would be greatly appreciated. Have a nice day.