import pandas as pd
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

names = ['dyastolic blood pressure', 'heart rate', 'pulse oximetry', 'respiratory rate', 'systolic blood pressure', 'temperature', 'class']
data = pd.read_csv("vitalsign1.csv", names=names)
array = data.values
X = array[:, 0:6]  # the six vital-sign columns
y = array[:, 6]    # the 'class' column, still strings at this point
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5)
# criterion="mse" was renamed "squared_error" in scikit-learn 1.0; it is the default, so it is omitted here
estimator = DecisionTreeRegressor(max_leaf_nodes=6)
estimator.fit(X_train, y_train)  # raises the ValueError shown below
y_pred = estimator.predict(X_test)
score = mean_squared_error(y_test, y_pred)
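For the code above, I get a ValueError:
ValueError: could not convert string to float: 'critical'
I understand that my "class" column contains two classes named critical and excellent where numbers such as 0 and 1 are expected, but I want to keep the classes as they are.
How can I solve this? My data looks like this: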
115 77 99 18 148 35 critical
99 61 97 14 147 37 excellent
Answer 0 (score: 0)
Add a converter to pandas.read_csv that maps the labels critical and excellent to 0 and 1, respectively.
For example:
>>> data = pd.read_csv(
...     "vitalsign1.csv", names=names,
...     converters={
...         "class": lambda x: dict(critical=0, excellent=1)[x]
...     }
... )
This produces a dataset like the following:
>>> data
   dyastolic blood pressure  heart rate  pulse oximetry  respiratory rate  systolic blood pressure  temperature  class
0                       115          77              99                18                      148           35      0
1                        99          61              97                14                      147           37      1
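
As a minimal sketch of the converter approach, the snippet below rebuilds the two sample rows in memory (assumed to be comma-separated, standing in for vitalsign1.csv), fits a DecisionTreeRegressor on the numeric codes, and maps the predictions back to the original labels so the class names stay available. The names label_to_code and code_to_label are hypothetical helpers, and rounding the regressor output to the nearest code is an assumption made for illustration, not part of the answer above.

```python
import io

import pandas as pd
from sklearn.tree import DecisionTreeRegressor

names = ['dyastolic blood pressure', 'heart rate', 'pulse oximetry',
         'respiratory rate', 'systolic blood pressure', 'temperature', 'class']

# Stand-in for vitalsign1.csv: the two sample rows from the question,
# assumed here to be comma-separated.
csv_text = "115,77,99,18,148,35,critical\n99,61,97,14,147,37,excellent\n"

# Hypothetical helper mappings: label -> code for loading, code -> label for decoding.
label_to_code = {"critical": 0, "excellent": 1}
code_to_label = {code: label for label, code in label_to_code.items()}

data = pd.read_csv(
    io.StringIO(csv_text), names=names,
    converters={"class": lambda x: label_to_code[x]},
)

X = data[names[:6]].to_numpy()
y = data["class"].astype(int).to_numpy()

reg = DecisionTreeRegressor(random_state=0).fit(X, y)
pred_codes = reg.predict(X)  # floats, because this is a regressor

# Assumption: round each prediction to the nearest code before decoding.
# On real data a regressor may return values between the codes.
pred_labels = [code_to_label[int(round(p))] for p in pred_codes]
print(pred_labels)  # ['critical', 'excellent']
```

Keeping the mapping in one dictionary and deriving its inverse from it means the file on disk keeps its original labels, and the codes exist only for the duration of training and prediction.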