Question

Recently I came across some NumPy exercises that I don't quite understand. The exercises use some random sample data in a 3D array:

import numpy as np

alpha = np.full(2000, .1)
beta = np.full(100, .1)

wordsInTopic = np.random.dirichlet(alpha, 100)

produced = np.zeros((50, 100, 2000))

for doc in range(0, 50):

    topicsInDoc = np.random.dirichlet(beta)
    wordsToTopic = np.random.multinomial(2000, topicsInDoc)

    for topic in range(0, 100):
        produced[doc, topic] = np.random.multinomial(wordsToTopic[topic], wordsInTopic[topic])

For example, the following two lines give the same result, as expected:

print(produced[:, np.arange(0, 100, 1), :].shape)
print(produced[:, :, :].shape)

But the following two do not:

print(produced[:, np.arange(0, 100, 1), produced.sum(0).argmax(1)].shape)
print(produced[:, :, produced.sum(0).argmax(1)].shape)

Can someone explain what is going on here?

Answer 1

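In short, a bare ":" basically means "select everything along this axis", while passing a list of indices means "select the given indices from this axis".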

When you only have a single index list, the two can be equivalent. This is easier to see with a small 2D matrix:

>>> X = np.random.randint(0, 10, size=(3, 3))
>>> X
array([[2, 4, 8],
       [0, 6, 9],
       [4, 2, 5]])
>>> X[:, :]
array([[2, 4, 8],
       [0, 6, 9],
       [4, 2, 5]])
>>> X[:, [0, 1, 2]]
array([[2, 4, 8],
       [0, 6, 9],
       [4, 2, 5]])
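The same equivalence holds in 3D, which is what the first pair of lines in the question relies on. The sketch below uses a small array named A as a stand-in for produced (its shape is an arbitrary choice for illustration). It also shows one difference worth knowing: the values match, but basic slicing returns a view of the original array, while an index array returns a copy.

```python
import numpy as np

# Small 3D stand-in for `produced`; the (2, 3, 4) shape is arbitrary
A = np.arange(2 * 3 * 4).reshape(2, 3, 4)

sliced = A[:, :, :]               # basic slicing: everything along every axis
indexed = A[:, np.arange(3), :]   # one index array covering the whole middle axis

print(sliced.shape, indexed.shape)       # (2, 3, 4) (2, 3, 4)
print(np.array_equal(sliced, indexed))   # True

# Same values, but slicing gives a view and an index array gives a copy
print(np.shares_memory(A, sliced), np.shares_memory(A, indexed))  # True False
```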

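That makes sense. Now, when you use two index lists, NumPy's semantics say that the indices are matched up pairwise (or, more generally, that they are broadcast together). Consider the following: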

>>> X[[0, 1, 2], [0, 1, 2]]
array([2, 6, 5])
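To illustrate the "broadcast together" part: the index arrays do not have to be the same 1D length, they only need shapes that broadcast. In this sketch a column of row indices, shape (2, 1), is combined with a row of column indices, shape (2,), so every row index gets paired with every column index. np.ix_ builds that kind of index pair from plain 1D lists.

```python
import numpy as np

X = np.array([[2, 4, 8],
              [0, 6, 9],
              [4, 2, 5]])

# Two 1D index arrays of equal length are paired element by element
diag = X[[0, 1, 2], [0, 1, 2]]
print(diag)  # [2 6 5]

# rows has shape (2, 1) and cols has shape (2,), so they broadcast to (2, 2)
rows = np.array([[0], [2]])
cols = np.array([0, 2])
grid = X[rows, cols]
print(grid)
# [[2 8]
#  [4 5]]

# np.ix_ turns 1D lists into broadcastable index arrays, selecting the same sub-grid
grid_ix = X[np.ix_([0, 2], [0, 2])]
print(grid_ix)
```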

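It returns the (0, 0) element, the (1, 1) element, and the (2, 2) element. This kind of indexing (where you pass lists of indices) is called fancy indexing, and it can be very powerful. You can read more about fancy indexing and see some examples here (full disclosure: this link points to my own website).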
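Applied to the question: produced.sum(0).argmax(1) has shape (100,), one word index per topic. In produced[:, np.arange(0, 100, 1), idx] the two index arrays on axes 1 and 2 are paired, giving shape (50, 100). In produced[:, :, idx] only axis 2 is indexed, so every topic is combined with every entry of idx, giving shape (50, 100, 100). The sketch below checks this on a smaller random stand-in for produced (shape (5, 4, 6) chosen arbitrarily) and confirms that the paired result is the "diagonal" of the crossed one.

```python
import numpy as np

rng = np.random.default_rng(0)
# Smaller stand-in for `produced`: (docs, topics, words) = (5, 4, 6)
produced = rng.integers(0, 10, size=(5, 4, 6))

idx = produced.sum(0).argmax(1)   # shape (4,): one word index per topic

# Index arrays on axes 1 and 2 are paired: (topic t, word idx[t])
paired = produced[:, np.arange(4), idx]
print(paired.shape)               # (5, 4)

# Index array on axis 2 only: each topic is crossed with every entry of idx
crossed = produced[:, :, idx]
print(crossed.shape)              # (5, 4, 4)

# paired[d, t] == crossed[d, t, t], i.e. the diagonal over the last two axes
print(np.array_equal(paired, crossed[:, np.arange(4), np.arange(4)]))  # True
```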

When is ":" equivalent to a full index vector in NumPy?
