Numpy: does vstack automatically detect out-of-range indices and correct them?

Time: 2016-06-03 17:37:32

Tags: python numpy

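I'm confused about why the code below works (see the lines I marked "HERE"). When j reaches the end of its range, j + 1 is out of range for the list of lists (X_train_folds). Why does this work at all? Is it because vstack automatically detects this and handles it? I couldn't find any documentation on it.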

num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]

X_train_folds = []
y_train_folds = []
################################################################################
# Split up the training data into folds. After splitting, X_train_folds and    #
# y_train_folds should each be lists of length num_folds, where                #
# y_train_folds[i] is the label vector for the points in X_train_folds[i].     #
# Hint: Look up the numpy array_split function.                                #
################################################################################
X_train_folds = np.array_split(X_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)

# print y_train_folds

# A dictionary holding the accuracies for different values of k that we find
# when running cross-validation. After running cross-validation,
# k_to_accuracies[k] should be a list of length num_folds giving the different
# accuracy values that we found when using that value of k.
k_to_accuracies = {}

################################################################################
# Perform k-fold cross validation to find the best value of k. For each        #
# possible value of k, run the k-nearest-neighbor algorithm num_folds times,   #
# where in each case you use all but one of the folds as training data and the #
# last fold as a validation set. Store the accuracies for all fold and all     #
# values of k in the k_to_accuracies dictionary.                               #
################################################################################

for k in k_choices:
    k_to_accuracies[k] = []

for k in k_choices:
    print('evaluating k=%d' % k)
    for j in range(num_folds):
        X_train_cv = np.vstack(X_train_folds[0:j]+X_train_folds[j+1:])#<--------------HERE
        X_test_cv = X_train_folds[j]

        #print len(y_train_folds), y_train_folds[0].shape

        y_train_cv = np.hstack(y_train_folds[0:j]+y_train_folds[j+1:]) #<----------------HERE
        y_test_cv = y_train_folds[j]

        #print 'Training data shape: ', X_train_cv.shape
        #print 'Training labels shape: ', y_train_cv.shape
        #print 'Test data shape: ', X_test_cv.shape
        #print 'Test labels shape: ', y_test_cv.shape

        classifier.train(X_train_cv, y_train_cv)
        dists_cv = classifier.compute_distances_no_loops(X_test_cv)
        #print 'predicting now'
        y_test_pred = classifier.predict_labels(dists_cv, k)
        num_correct = np.sum(y_test_pred == y_test_cv)
        accuracy = float(num_correct) / len(y_test_cv)  # divide by the size of this validation fold

        k_to_accuracies[k].append(accuracy)

################################################################################
#                                 END OF YOUR CODE                             #
################################################################################

# Print out the computed accuracies
for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print('k = %d, accuracy = %f' % (k, accuracy))

1 answer:

Answer 0 (score: 1)

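No. vstack is not what makes this work; slicing is. Indexing a single out-of-range position (such as a[10]) raises an IndexError, but a slice whose bounds fall outside the sequence is clamped to the valid range, so it returns an empty result instead of failing. This holds for numpy arrays, whose indexing internals are complex and sometimes return a copy and sometimes a view, and it holds for plain Python lists too. In your code, X_train_folds is the list returned by np.array_split, so X_train_folds[j+1:] is simply an empty list when j is the last fold, and concatenating it onto X_train_folds[0:j] changes nothing.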

See the following example and the corresponding output (from the print calls):

import numpy as np

a = np.array([1, 2, 3])
print(a[10:]) # This will return empty
print(a[10]) # This is an error

The result is:

[]

Traceback (most recent call last):
  File "C:/Users/imactuallyavegetable/temp.py", line 333, in <module>
    print(a[10])
IndexError: index 10 is out of bounds for axis 0 with size 3

The first is an empty array; the second is an exception.
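To connect this back to the question's loop, here is a minimal sketch of the last-fold case. The data is a hypothetical stand-in (10 samples with 2 features), not the asker's X_train, and the names head and tail are introduced only for illustration.

```python
import numpy as np

# Hypothetical stand-in data: 10 samples with 2 features each.
X_train = np.arange(20).reshape(10, 2)
num_folds = 5

# np.array_split returns a plain Python list of arrays.
X_train_folds = np.array_split(X_train, num_folds)

j = num_folds - 1               # last fold, so j + 1 == len(X_train_folds)
head = X_train_folds[0:j]       # the first four folds
tail = X_train_folds[j + 1:]    # slice past the end: an empty list, no error

# Concatenating an empty list leaves the first four folds unchanged,
# so vstack only ever sees four ordinary arrays.
X_train_cv = np.vstack(head + tail)

print(type(X_train_folds).__name__)  # list
print(tail)                          # []
print(X_train_cv.shape)              # (8, 2)
```

vstack never receives an out-of-range index here: by the time it is called, the slices have already been evaluated and joined into an ordinary list of arrays. The same clamping applies at j = 0, where X_train_folds[0:0] is the empty side.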