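Question:
sklearn allows you to create user-defined distance functions for use in several algorithms (for example, KNN). However, it tests the user-defined function by creating a random numpy array (see the end of the page, in the __init__ of class PyFuncDistance(DistanceMetric)). My function is defined for categorical variables, and to speed up the computation I pass a dictionary that I built beforehand to the distance function. Of course, when sklearn tests it with an array of floats, it raises a KeyError, because the dictionary only has the attribute values as keys.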
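A minimal sketch of the failure described above, without sklearn. The function name lookup_distance and the counts in value_dict are made up for illustration, and the probe only imitates the random array mentioned in the question:

```python
import numpy as np

# Hypothetical lookup table: encoded category -> frequency count.
value_dict = {0: 52, 1: 47, 2: 55, 3: 46}


def lookup_distance(point1, point2, value_dict):
    """Sum of absolute differences between the looked-up values."""
    return float(sum(abs(value_dict[a] - value_dict[b])
                     for a, b in zip(point1, point2)))


# Encoded categorical rows only contain keys that exist in the dictionary.
encoded = lookup_distance(np.array([0, 1, 2]), np.array([3, 1, 0]), value_dict)

# A probe like the one described above: random floats in [0, 1).
probe = np.random.random(10)
try:
    lookup_distance(probe, probe, value_dict)
    probe_failed = False
except KeyError:
    # The random floats are not keys of value_dict.
    probe_failed = True
```

The lookup succeeds on the encoded rows and fails on the probe, which is the situation the question is asking about.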
Code:
import pandas as pd
import numpy as np
from sklearn import preprocessing
from sklearn.neighbors import KNeighborsClassifier
from sklearn import cross_validation
df = pd.DataFrame(np.random.choice(["a", "b", "c", "d"], (200, 4)))
for col in df:
    le = preprocessing.LabelEncoder()
    le.fit(df[col])
    df[col] = le.transform(df[col])
value_dict = df[0].value_counts().to_dict()
def custom_distance(point1, point2, value_dict):
    # This is not the actual distance function, just a simplified version for reproducibility.
    distance = 0.0
    # Iterate over the valid indices 0 .. len(point1) - 1.
    for i in range(len(point1)):
        distance += abs(value_dict[point1[i]] - value_dict[point2[i]])
    return distance
neigh_custom = KNeighborsClassifier(n_neighbors=10, metric=custom_distance,
                                    metric_params={"value_dict": value_dict})
scores = cross_validation.cross_val_score(neigh_custom, df.iloc[:, 1:], df.iloc[:, 0], cv=10)
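Question:
To make sure the error is caused by the test and not by the original data, is it possible to catch the exception only when it is raised by the __init__ of PyFuncDistance? Currently I check whether the numbers are between 0 and 1 to tell whether they were randomly generated, but I don't think that is good practice.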
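For reference, the workaround mentioned above might look roughly like the sketch below. The names looks_like_random_probe and guarded_distance are hypothetical, and returning 0.0 for the probe is an assumption; the question does not say what the real code returns in that case:

```python
import numpy as np


def looks_like_random_probe(point):
    """Heuristic from the question: every value lies strictly between 0 and 1."""
    arr = np.asarray(point, dtype=float)
    return bool(np.all((arr > 0) & (arr < 1)))


def guarded_distance(point1, point2, value_dict):
    """Dictionary-based distance that skips the lookup for probe-like input."""
    if looks_like_random_probe(point1) and looks_like_random_probe(point2):
        return 0.0  # assumed dummy value so that the probe call succeeds
    return float(sum(abs(value_dict[a] - value_dict[b])
                     for a, b in zip(point1, point2)))


value_dict = {0: 52, 1: 47, 2: 55, 3: 46}  # made-up counts
probe = np.random.random(10)
probe_distance = guarded_distance(probe, probe, value_dict)
real_distance = guarded_distance(np.array([0.0, 1.0, 2.0]),
                                 np.array([3.0, 1.0, 0.0]), value_dict)
```

The weakness is visible in the heuristic itself: any real row whose values all happen to fall strictly between 0 and 1 would be treated as a probe.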
Answer 0 (score: 1):
import traceback
import sys

try:
    scores = cross_validation.cross_val_score(neigh_custom, df.iloc[:, 1:], df.iloc[:, 0], cv=10)
except Exception as err:
    exc_type, exc_value, exc_traceback = sys.exc_info()
    # Format the traceback as a list of strings.
    sam = traceback.format_exception(exc_type, exc_value, exc_traceback)
    # sam[-1] is the exception line; sam[-3] is the caller of the frame that raised.
    if 'PyFuncDistance.__init__' in sam[-3]:
        print('I knew it')
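If you want the exception to be raised for other problems, you can use raise, and use sam to print the traceback of the call that caused the problem.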
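The re-raise idea can be sketched as follows, in Python 3 and without sklearn. StandInMetric is a hypothetical stand-in that only imitates the probing behaviour described in the question, and raised_inside and run_guarded are made-up helper names. The sketch inspects the function name of each traceback frame instead of indexing into the formatted strings, and matches on the name suffix so that both a plain __init__ and a qualified name such as the one checked above would be accepted:

```python
import sys
import traceback

import numpy as np


class StandInMetric:
    """Hypothetical stand-in for a metric wrapper that probes func on creation."""

    def __init__(self, func, probe=True, **kwargs):
        if probe:
            x = np.random.random(10)
            func(x, x, **kwargs)  # raises KeyError for dictionary-based metrics
        self.func = func
        self.kwargs = kwargs


def lookup_distance(point1, point2, value_dict):
    """Simplified dictionary-based distance, as in the question."""
    return float(sum(abs(value_dict[a] - value_dict[b])
                     for a, b in zip(point1, point2)))


def raised_inside(tb, name_suffix):
    """True if any traceback frame's function name ends with name_suffix."""
    return any(frame.name.endswith(name_suffix)
               for frame in traceback.extract_tb(tb))


def run_guarded(func, rows, probe=True, **kwargs):
    """Return distances to the first row, or None if only the probe failed."""
    try:
        metric = StandInMetric(func, probe=probe, **kwargs)
        return [metric.func(row, rows[0], **metric.kwargs) for row in rows]
    except KeyError:
        if raised_inside(sys.exc_info()[2], "__init__"):
            return None  # the constructor's random probe caused the error
        raise  # the error came from the real data, so let it propagate


value_dict = {0: 52, 1: 47}  # made-up counts
probe_result = run_guarded(lookup_distance, [[0, 1], [1, 1]],
                           value_dict=value_dict)
```

A KeyError triggered by the probe is swallowed and reported as None, while a KeyError caused by an unknown value in the real rows still propagates to the caller.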
Hope this helps!