I want to plot learning curves for a trained SVM classifier, using different scorers, and using Leave One Group Out as the cross-validation method. I thought I had it figured out, but two different scorers - 'f1_micro' and 'accuracy' - produce identical values. I am confused; is it supposed to be this way?
Here is my code (unfortunately I cannot share the data, as it is not open):
Now, from:
import numpy as np
import pandas as pd
from sklearn import preprocessing, svm
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import LeaveOneGroupOut, validation_curve

SVC_classifier_LOWO_VC0 = svm.SVC(cache_size=800, class_weight=None,
                                  coef0=0.0, decision_function_shape=None,
                                  degree=3, gamma=0.01, kernel='rbf',
                                  max_iter=-1, probability=False,
                                  random_state=1, shrinking=True,
                                  tol=0.001, verbose=False)

training_data = pd.read_csv('training_data.csv')
X = training_data.drop(['Groups', 'Targets'], axis=1).values
scaler = preprocessing.StandardScaler().fit(X)
X = scaler.transform(X)
y = training_data['Targets'].values
groups = training_data["Groups"].values

Fscorer = make_scorer(f1_score, average='micro')
logo = LeaveOneGroupOut()
parm_range0 = np.logspace(-2, 6, 9)
train_scores0, test_scores0 = validation_curve(
    SVC_classifier_LOWO_VC0, X, y, "C", parm_range0,
    cv=logo.split(X, y, groups=groups), scoring=Fscorer)
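As a side note on the cross-validation setup, LeaveOneGroupOut produces exactly one fold per unique group, holding that group out as the test set. A minimal sketch with made-up toy data (the variable names below are hypothetical stand-ins, not the author's CSV columns):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Hypothetical toy data: 6 samples belonging to 3 groups
X_toy = np.arange(12).reshape(6, 2)
y_toy = np.array([0, 1, 0, 1, 0, 1])
groups_toy = np.array([1, 1, 2, 2, 3, 3])

logo = LeaveOneGroupOut()
# One fold per unique group: that group is the test set, the rest train
for train_idx, test_idx in logo.split(X_toy, y_toy, groups=groups_toy):
    print(sorted(set(groups_toy[test_idx])))  # each group held out once
```

Passing `logo.split(X, y, groups=groups)` as `cv=` (as in the question) works because `validation_curve` accepts any iterable of train/test index pairs.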
I get:
[0.20257407 0.35551122 0.40791047 0.49887676 0.5021742
 0.50030438 0.49426622 0.48066419 0.4868987]
0.502174200206
100.0
If I create a new classifier, but with the same parameters, and run everything exactly as before, except for the scorer, for example:
train_scores_mean0 = np.mean(train_scores0, axis=1)
train_scores_std0 = np.std(train_scores0, axis=1)
test_scores_mean0 = np.mean(test_scores0, axis=1)
test_scores_std0 = np.std(test_scores0, axis=1)
print(test_scores_mean0)
print(np.amax(test_scores_mean0))
print(np.logspace(-2, 6, 9)[test_scores_mean0.argmax(axis=0)])
I get exactly the same answer:
[0.20257407 0.35551122 0.40791047 0.49887676 0.5021742
 0.50030438 0.49426622 0.48066419 0.4868987]
0.502174200206
100.0
How is this possible? Am I doing something wrong, or missing something?
Thanks
Answer 0 (score: 1)
F1 = accuracy
if and only if TP = TN
, i.e. the number of true positives equals the number of true negatives, which happens when your classes are perfectly balanced. So either that is the case, or there is an error in your code. Did you perhaps initialize your scorer somewhere as follows: scorer = make_scorer(accuracy_score, average = 'micro')
?
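For what it's worth, in single-label multiclass scoring there is also a known pooling effect worth checking against: with micro averaging, every misclassification counts simultaneously as a false positive (for the predicted class) and a false negative (for the true class), so micro precision, micro recall, and accuracy all coincide. A quick sketch with made-up labels (not the question's data):

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical multiclass ground truth and predictions (2 of 8 wrong)
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

# Micro averaging pools TP/FP/FN over all classes, so micro-F1 lands
# on the same value as plain accuracy for single-label predictions.
print(accuracy_score(y_true, y_pred))
print(f1_score(y_true, y_pred, average='micro'))
```

This would make identical 'f1_micro' and 'accuracy' curves expected rather than a bug, assuming the problem is multiclass.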