ValueError: could not convert string to float: 'critical'

Date: 2016-09-22 20:51:44

Tags: python machine-learning scikit-learn decision-tree

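import pandas as pd
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

names = ['dyastolic blood pressure', 'heart rate', 'pulse oximetry', 'respiratory rate', 'systolic blood pressure', 'temperature', 'class']
data = pd.read_csv("vitalsign1.csv", names=names)
array = data.values

X = array[:, 0:6]   # the six vital-sign columns

y = array[:, 6]     # the 'class' column, which holds strings such as 'critical'
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5)

# criterion "mse" was renamed "squared_error" in scikit-learn 1.0
estimator = DecisionTreeRegressor(criterion="squared_error", max_leaf_nodes=6)
estimator.fit(X_train, y_train)   # raises the ValueError: a regressor needs a numeric target
y_pred = estimator.predict(X_test)
score = mean_squared_error(y_test, y_pred)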

With the code above, I get a ValueError:

ValueError: could not convert string to float: 'critical'

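I understand that my "class" column contains two classes named critical and excellent, whereas it should contain numbers such as 0 and 1. But I want to keep the classes as they are.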

How can I fix this? My data looks like this:

115 77  99  18  148 35  critical
99  61  97  14  147 37  excellent
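The diagnosis above is right, and a small sketch shows where the failure comes from. It assumes vitalsign1.csv is comma-separated with no header row, and it uses io.StringIO holding the two sample rows in place of the real file. Because the last column holds strings, data.values becomes an object-dtype array, and a regressor has to cast its target to float, which fails on 'critical':

```python
import io

import pandas as pd

names = ['dyastolic blood pressure', 'heart rate', 'pulse oximetry',
         'respiratory rate', 'systolic blood pressure', 'temperature', 'class']

# Assumption: the real file is comma-separated with no header row.
csv_text = "115,77,99,18,148,35,critical\n99,61,97,14,147,37,excellent\n"
data = pd.read_csv(io.StringIO(csv_text), names=names)

array = data.values
print(array.dtype)   # object: one string column makes the whole array object-typed

y = array[:, 6]
try:
    y.astype(float)  # the same float cast a regressor applies to its target
except ValueError as exc:
    print(exc)       # could not convert string to float: 'critical'
```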

1 answer:

Answer 0 (score: 0)

pandas.read_csv添加转换器,将符号criticalexcellent分别转换为01

For example:

>>> data = pd.read_csv(
...     "vitalsign1.csv", names=names,
...     converters={
...         "class": lambda x: dict(critical=0, excellent=1)[x]
...     }
... )

This yields a dataset like the following:

>>> data
   dyastolic blood pressure  heart rate  pulse oximetry  respiratory rate  systolic blood pressure  temperature  class
0                       115          77              99                18                      148           35      0
1                        99          61              97                14                      147           37      1