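I'm new to scikit, and I get an index out-of-bounds error when I try to train learners on sampled subsets of the training set.
This is where the error occurs: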
def train_predict(learner, sample_size, X_train, y_train, X_test, y_test):
    results = {}
    start = time() # Get start time
    learner.fit(X_train[sample_size],y_train[sample_size])
    end = time() # Get end time
    results['train_time'] = end-start
    start = time() # Get start time
    predictions_test = learner.predict(X_test)
    predictions_train = learner.predict(X_train.head(300))
    end = time() # Get end time
    results['pred_time'] = end-start
    results['acc_train'] = accuracy_score(y_train.head(300),predictions_train)
    results['acc_test'] = accuracy_score(y_test,predictions_test)
    results['f_train'] = f_score(y_train.head(300),predictions_train)
    results['f_test'] = f_score(y_test,predictions_test)
    print "{} trained on {} samples.".format(learner.__class__.__name__, sample_size)
    return results
This is the main code:
clf_A = GaussianNB()
clf_B = tree.DecisionTreeClassifier()
clf_C = SVC()
samples_1 = random.sample(X_train.index,len(X_train)/100)
samples_10 = random.sample(X_train.index,len(X_train)/10)
samples_100 = X_train.index
results = {}
for clf in [clf_A, clf_B, clf_C]:
    clf_name = clf.__class__.__name__
    results[clf_name] = {}
    for i, samples in enumerate([samples_1, samples_10, samples_100]):
        results[clf_name][i] = \
            train_predict(clf, samples, X_train, y_train, X_test, y_test)
vs.evaluate(results, accuracy, fscore)
The error is on this line:
---> 21 learner.fit(X_train[sample_size],y_train[sample_size])
It says:
IndexError: indices are out-of-bounds
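Answer 0: (score: 1)
Your error depends entirely on what X_train and y_train look like.
A common case that may apply to you: if these are pandas DataFrame objects, the fix may be as simple as adding .as_matrix() (removed in pandas 1.0; use .to_numpy() or .values on newer versions):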
learner.fit(X_train.as_matrix()[sample_size],y_train.as_matrix()[sample_size])
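Another quick thing to check is that X_train[sample_size] and y_train[sample_size] return the same number of rows. Note that this is not the same as the following expression evaluating to True, since X_train[sample_size] may contain more columns than y_train[sample_size]: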
len(X_train[sample_size]) == len(y_train[sample_size])
If you add information to your question about how X_train and y_train are built, or details about their types and shapes, you can get a more specific answer.
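As a minimal sketch of these checks, assume X_train is a pandas DataFrame, y_train is a Series sharing its index, and the sample is a list of index labels, as in the question's random.sample call. The toy data and the names sample_labels, X_sample and y_sample are made up for illustration. On a DataFrame, X_train[labels] looks the labels up among the columns, which is presumably what triggers the error here; X_train.loc[labels] selects rows by label instead.

```python
import random

import pandas as pd

# Toy stand-ins for the question's data (hypothetical values). The index is
# deliberately not 0..n-1, as it would be after a shuffled train/test split.
X_train = pd.DataFrame(
    {"f1": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6], "f2": [1, 0, 1, 0, 1, 0]},
    index=[10, 42, 7, 99, 3, 58],
)
y_train = pd.Series([0, 1, 0, 1, 0, 1], index=X_train.index)

random.seed(0)
sample_labels = random.sample(list(X_train.index), 3)

# Plain [] with a list is a column lookup on a DataFrame, so row labels fail.
column_lookup_failed = False
try:
    X_train[sample_labels]
except (KeyError, IndexError):
    column_lookup_failed = True

# .loc selects rows by index label, for both the features and the target.
X_sample = X_train.loc[sample_labels]
y_sample = y_train.loc[sample_labels]

# Both selections should have the same number of rows before calling fit().
print(type(X_train).__name__, X_sample.shape, y_sample.shape)
assert len(X_sample) == len(y_sample)
```

With this selection in place, the call in the question would become learner.fit(X_train.loc[sample_size], y_train.loc[sample_size]), under the same assumptions.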
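Answer 1: (score: 0)
Try the following (this assumes sample_size is an integer number of rows rather than a list of indices):
learner.fit(X_train[0:sample_size],y_train[0:sample_size])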
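A rough sketch of how integer sample sizes could be built and used, assuming pandas objects; the helper take_first and the toy data are hypothetical and not part of the question. It uses .iloc so that the slice is positional regardless of the index labels, and // so that the sizes stay integers on Python 3.

```python
import pandas as pd


def take_first(X, y, sample_size):
    """Return the first `sample_size` rows of X and y, selected by position."""
    return X.iloc[:sample_size], y.iloc[:sample_size]


# Toy data standing in for the real training set (hypothetical values).
X_train = pd.DataFrame({"f1": range(100), "f2": range(100, 200)})
y_train = pd.Series([i % 2 for i in range(100)])

n = len(X_train)
# 1%, 10% and 100% of the training set as integer row counts.
sizes = [n // 100, n // 10, n]

for size in sizes:
    X_sub, y_sub = take_first(X_train, y_train, size)
    print(size, X_sub.shape, y_sub.shape)
```

Taking the first rows is not a random sample; if the training set is ordered in some way, it would need to be shuffled first.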