Replacing a nested loop

Time: 2018-08-17 19:47:52

Tags: python python-3.x numpy machine-learning

I've just started using Python, and (as a Java programmer) I'm having trouble understanding how to achieve the following.

Here is the initial code:

  def compute_distances_two_loops(self, X):
    """
    Compute the distance between each test point in X and each training point
    in self.X_train using a nested loop over both the training data and the 
    test data.

    Inputs:
    - X: A numpy array of shape (num_test, D) containing test data.

    Returns:
    - dists: A numpy array of shape (num_test, num_train) where dists[i, j]
      is the Euclidean distance between the ith test point and the jth training
      point.
    """

    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))

    for i in range(num_test):
      for j in range(num_train):
        #####################################################################
        # TODO:                                                             #
        # Compute the l2 distance between the ith test point and the jth    #
        # training point, and store the result in dists[i, j]. You should   #
        # not use a loop over dimension.                                    #
        #####################################################################
        dists[i, j] = np.sum(np.square(X[i] - self.X_train[j]))
        #####################################################################
        #                       END OF YOUR CODE                            #
        #####################################################################
    return dists

Here is the code that should use one fewer nested loop while still outputting the same array:

  def compute_distances_one_loop(self, X):
    """
    Compute the distance between each test point in X and each training point
    in self.X_train using a single loop over the test data.

    Input / Output: Same as compute_distances_two_loops
    """
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))

    for i in range(num_test):
      tmp = '%s %d' % ("\nfor i:", i)
      print(tmp)

      print(X[i])
      print("end of X[i]")
      print(self.X_train[:]) # all the thing [[ ... ... ]]
      print(": before, i after")
      print(self.X_train[i]) # just a row
      print(self.X_train[i, :])

      #######################################################################
      # TODO:                                                               #
      # Compute the l2 distance between the ith test point and all training #
      # points, and store the result in dists[i, :].                        #
      #######################################################################
      dists[i, :] = np.sum(np.square(X[i] - self.X_train[i, :]))
      print(dists[i])
      #######################################################################
      #                         END OF YOUR CODE                            #
      #######################################################################
    return dists

This seems like it should help me, but I still can't figure it out.

As you can see, one of my pitfalls, among others, is that I don't understand exactly how ":" works.
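For readers with the same question about ":", here is a minimal sketch of NumPy slicing. The small array is made up for illustration and merely stands in for self.X_train; it is not the course data.

```python
import numpy as np

# Hypothetical 3x4 array standing in for self.X_train (3 training points, D = 4).
X_train = np.arange(12).reshape(3, 4)

row_a = X_train[1]       # second row, shape (4,)
row_b = X_train[1, :]    # the same row: ":" means "everything along this axis"
col = X_train[:, 1]      # second column, shape (3,)
everything = X_train[:]  # all rows, shape (3, 4)

print(row_a.shape, row_b.shape, col.shape, everything.shape)
```

So X_train[i] and X_train[i, :] select the same single row, which is why the two prints in the loop above show identical output.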

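I have spent hours trying to figure this out, but it seems I'm really missing some core knowledge. Can someone help me? This exercise is from Stanford's visual recognition course: it is the first assignment, but it isn't really homework for me, since I'm just studying on my own for fun.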

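Currently, my snippet outputs the correct values for the diagonal of the two_loops result, but repeated across each entire row. I don't know how to synchronize the ":" in dists[i, :] with the "- self.X_train[i, :]" part. How do I compute X[i] minus every row of self.X_train in turn?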
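The symptom described here (a diagonal value repeated across the row) can be reproduced on a tiny example. This is only a sketch with made-up numbers: X_i and X_train below are hypothetical stand-ins for X[i] and self.X_train.

```python
import numpy as np

X_i = np.array([1.0, 2.0])                # hypothetical test point, D = 2
X_train = np.array([[1.0, 2.0],
                    [4.0, 6.0],
                    [0.0, 0.0]])          # hypothetical training set, 3 points

# Subtracting a single training row and summing yields one number...
scalar = np.sum(np.square(X_i - X_train[1, :]))   # 9 + 16 = 25
row = np.zeros(3)
row[:] = scalar   # ...and assigning a scalar to a slice copies it into every slot

# Subtracting the whole training set and summing along axis=1
# yields one number per training point instead.
per_point = np.sum(np.square(X_i - X_train), axis=1)

print(row)
print(per_point)
```

The slice on the left-hand side does not iterate in step with anything on the right-hand side; the right-hand side is evaluated first, and its shape decides what gets stored.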

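Note that the test data (X) is 500x3072 and the training data (self.X_train) is 5000x3072, so num_test is 500 and num_train is 5000. The 3072 comes from 32x32x3, i.e. the RGB values of a 32x32 picture. dists is a 500x5000 matrix in which dists[i, j] holds the L2 distance between the ith test point and the jth training point.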

1 answer:

Answer 0 (score: 2)

def compute_distances_one_loop(self, X):
    """
    Compute the distance between each test point in X and each training point
    in self.X_train using a single loop over the test data.

    Input / Output: Same as compute_distances_two_loops
    """
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))

    for i in range(num_test):
      tmp = '%s %d' % ("\nfor i:", i)
      print(tmp)

      #######################################################################
      # TODO:                                                               #
      # Compute the l2 distance between the ith test point and all training #
      # points, and store the result in dists[i, :].                        #
      #######################################################################
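      # X[i] has shape (D,) and self.X_train has shape (num_train, D), so the
      # subtraction broadcasts X[i] against every training row. Summing over
      # axis=1 leaves one value per training point (squared distance, no sqrt).
      dists[i] = np.sum(np.square(X[i] - self.X_train), axis=1)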
      print(dists[i])
      #######################################################################
      #                         END OF YOUR CODE                            #
      #######################################################################
    return dists

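I removed the prints that index self.X_train inside the loop, because the test and training sets have different lengths, so indexing self.X_train with i can go out of range (an IndexOutOfRangeException in Java terms; Python raises IndexError). I'm not sure whether this is the intended way to remove the second loop, but it works.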
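One way to convince yourself that the single-loop version matches the nested one is to compare them on small random data. The sketch below rewrites both as free functions (two_loops and one_loop are hypothetical names, and the array sizes are arbitrary) so that it runs without the course's classifier class.

```python
import numpy as np

def two_loops(X, X_train):
    """Squared distances via a loop over test and training points."""
    dists = np.zeros((X.shape[0], X_train.shape[0]))
    for i in range(X.shape[0]):
        for j in range(X_train.shape[0]):
            dists[i, j] = np.sum(np.square(X[i] - X_train[j]))
    return dists

def one_loop(X, X_train):
    """Same values, looping over test points only."""
    dists = np.zeros((X.shape[0], X_train.shape[0]))
    for i in range(X.shape[0]):
        # (D,) minus (num_train, D) broadcasts to (num_train, D)
        dists[i] = np.sum(np.square(X[i] - X_train), axis=1)
    return dists

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))         # 5 made-up test points, D = 8
X_train = rng.normal(size=(7, 8))   # 7 made-up training points

same = np.allclose(two_loops(X, X_train), one_loop(X, X_train))
print(same)
```

The test and training sets deliberately have different sizes here, to show that nothing in the broadcast depends on them matching.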

One more comment: I think your Euclidean distance formula is wrong. You're missing the sqrt at the end.
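As a small illustration of the missing sqrt, the following sketch uses made-up points (x and train are hypothetical) and compares the squared sum with the actual Euclidean distance. np.linalg.norm is shown only as a cross-check, not as what the assignment expects.

```python
import numpy as np

x = np.array([0.0, 0.0])                       # hypothetical test point
train = np.array([[3.0, 4.0], [6.0, 8.0]])     # hypothetical training points

squared = np.sum(np.square(x - train), axis=1)        # squared L2 distances
euclidean = np.sqrt(squared)                          # actual L2 distances
also_euclidean = np.linalg.norm(x - train, axis=1)    # cross-check, row-wise norm

print(squared, euclidean, also_euclidean)
```

Because sqrt is monotonic, the ordering of the neighbours is the same with or without it, but the docstring asks for the Euclidean distance, so the stored values differ.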