Question

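From Udacity's deep learning class, the softmax of y_i is simply the exponential divided by the sum of the exponentials of the whole Y vector:

S(y_i) = e^(y_i) / sum_j e^(y_j)

where S(y_i) is the softmax function of y_i, e is the exponential, and j indexes the columns of the input vector Y.

I have tried the following: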

import numpy as np

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

scores = [3.0, 1.0, 0.2]
print(softmax(scores))

which returns:

[ 0.8360188   0.11314284  0.05083836]

But the suggested solution was:

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)

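which produces the same output as the first implementation, even though the first implementation explicitly takes the difference of each column and the max and then divides by the sum.

Can someone show mathematically why? Is one correct and the other one wrong?

Are the implementations similar in terms of code and time complexity? Which is more efficient?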

Answer 1

They're both correct, but yours is preferred from the point of view of numerical stability.

You start with

e ^ (x - max(x)) / sum(e ^ (x - max(x)))

By using the fact that a^(b - c) = (a^b)/(a^c) we have

= e ^ x / (e ^ max(x) * sum(e ^ x / e ^ max(x)))

= e ^ x / sum(e ^ x)

Which is what the other answer says. You could replace max(x) with any variable and it would cancel out.
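The cancellation can also be checked numerically. The sketch below is only an illustration and not part of the original answer: `softmax_shifted` and its `shift` parameter are hypothetical names, and the input is assumed to be a 1-D score vector.

```python
import numpy as np

def softmax_shifted(x, shift):
    """Softmax of a 1-D vector after subtracting an arbitrary constant `shift`."""
    e_x = np.exp(np.asarray(x, dtype=float) - shift)
    return e_x / e_x.sum()

scores = [3.0, 1.0, 0.2]
no_shift = softmax_shifted(scores, 0.0)            # plain e^x / sum(e^x)
max_shift = softmax_shifted(scores, max(scores))   # the max(x) trick
other_shift = softmax_shifted(scores, -7.5)        # any other constant

# All three agree up to floating-point rounding.
print(np.allclose(no_shift, max_shift), np.allclose(no_shift, other_shift))
```

The shift only matters numerically: it changes how large the intermediate exponentials get, not the final ratios.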

Answer 2

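(Well... much confusion here, both in the question and in the answers...)

To start with, the two solutions (i.e. yours and the suggested one) are not equivalent; they happen to be equivalent only for the special case of 1-D score arrays. You would have discovered it if you had also tried the 2-D score array in the example provided with the Udacity quiz.

Results-wise, the only actual difference between the two solutions is the axis=0 argument. To see that this is the case, let's try your solution (your_softmax) and one where the only difference is the axis argument: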

import numpy as np

# your solution:
def your_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

# correct solution:
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0) # only difference

As I said, for a 1-D score array, the results are indeed identical:

scores = [3.0, 1.0, 0.2]
print(your_softmax(scores))
# [ 0.8360188   0.11314284  0.05083836]
print(softmax(scores))
# [ 0.8360188   0.11314284  0.05083836]
your_softmax(scores) == softmax(scores)
# array([ True,  True,  True], dtype=bool)

Nevertheless, here are the results for the 2-D score array given in the Udacity quiz as a test example:

scores2D = np.array([[1, 2, 3, 6],
                     [2, 4, 5, 6],
                     [3, 8, 7, 6]])

print(your_softmax(scores2D))
# [[  4.89907947e-04   1.33170787e-03   3.61995731e-03   7.27087861e-02]
#  [  1.33170787e-03   9.84006416e-03   2.67480676e-02   7.27087861e-02]
#  [  3.61995731e-03   5.37249300e-01   1.97642972e-01   7.27087861e-02]]

print(softmax(scores2D))
# [[ 0.09003057  0.00242826  0.01587624  0.33333333]
#  [ 0.24472847  0.01794253  0.11731043  0.33333333]
#  [ 0.66524096  0.97962921  0.86681333  0.33333333]]

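The results are different: the second one is indeed identical to the one expected in the Udacity quiz, where every column sums to 1, which is not the case with the first (wrong) result.

So all the fuss was actually about an implementation detail, the axis argument. According to the numpy.sum documentation:

The default, axis=None, will sum all of the elements of the input array

whereas here we want to sum over the rows (that is, down each column), hence axis=0. For a 1-D array, the sum of the (only) row and the sum of all the elements happen to be identical, hence your identical results in that case...

The axis issue aside, your implementation (i.e. your choice to subtract the max first) is actually better than the suggested solution! In fact, it is the recommended way of implementing the softmax function; the justification is numerical stability, as some other answers here also point out.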
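To make the axis behaviour concrete, here is a small sketch of how np.sum treats a 2-D array under the three settings discussed. The array `a` is an arbitrary example chosen for illustration.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

total = a.sum()            # axis=None: a single scalar over all elements (21)
col_sums = a.sum(axis=0)   # collapses the rows: one sum per column (5, 7, 9)
row_sums = a.sum(axis=1)   # collapses the columns: one sum per row (6, 15)

print(total)
print(col_sums)
print(row_sums)
```

Dividing by `total` normalizes the whole matrix at once, while dividing by `col_sums` normalizes each column separately, which is what the quiz expects.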

Answer 3

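So, this is really a comment on desertnaut's answer, but I can't comment on it yet because of my reputation. As he pointed out, your version is only correct if your input consists of a single sample. If your input consists of several samples, it is wrong. However, desertnaut's solution is also wrong. The problem is that he first takes a 1-dimensional input and then a 2-dimensional input. Let me show you.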

import numpy as np

# your solution:
def your_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

# desertnaut solution (copied from his answer): 
def desertnaut_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0) # only difference

# my (correct) solution:
def softmax(z):
    assert len(z.shape) == 2
    s = np.max(z, axis=1)
    s = s[:, np.newaxis] # necessary step to do broadcasting
    e_x = np.exp(z - s)
    div = np.sum(e_x, axis=1)
    div = div[:, np.newaxis] # ditto
    return e_x / div

Let's illustrate with an example:

x1 = np.array([[1, 2, 3, 6]]) # notice that we put the data into 2 dimensions(!)

This is the output:

your_softmax(x1)
array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037047]])

desertnaut_softmax(x1)
array([[ 1.,  1.,  1.,  1.]])

softmax(x1)
array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037047]])

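You can see that desertnaut's version would fail in this situation. (It would not if the input were just one-dimensional, like np.array([1, 2, 3, 6]).)

Let's now use 3 samples, since that is the reason we use a 2-dimensional input in the first place. The following x2 is not the same as the one in desertnaut's example.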

x2 = np.array([[1, 2, 3, 6],  # sample 1
               [2, 4, 5, 6],  # sample 2
               [1, 2, 3, 6]]) # sample 1 again(!)

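This input consists of a batch with 3 samples, but samples one and three are essentially the same. We now expect 3 rows of softmax activations, where the first should be the same as the third and also the same as our activation of x1!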

your_softmax(x2)
array([[ 0.00183535,  0.00498899,  0.01356148,  0.27238963],
       [ 0.00498899,  0.03686393,  0.10020655,  0.27238963],
       [ 0.00183535,  0.00498899,  0.01356148,  0.27238963]])


desertnaut_softmax(x2)
array([[ 0.21194156,  0.10650698,  0.10650698,  0.33333333],
       [ 0.57611688,  0.78698604,  0.78698604,  0.33333333],
       [ 0.21194156,  0.10650698,  0.10650698,  0.33333333]])

softmax(x2)
array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037047],
       [ 0.01203764,  0.08894682,  0.24178252,  0.65723302],
       [ 0.00626879,  0.01704033,  0.04632042,  0.93037047]])

I hope you can see that this is only the case with my solution.

softmax(x1) == softmax(x2)[0]
array([[ True,  True,  True,  True]], dtype=bool)

softmax(x1) == softmax(x2)[2]
array([[ True,  True,  True,  True]], dtype=bool)
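The property being tested here (each sample's output should depend only on that sample) can also be checked programmatically. This is a rough sketch: `is_batch_consistent`, `softmax_rows` and `softmax_global` are hypothetical helper names, and rows are assumed to be samples.

```python
import numpy as np

def softmax_rows(z):
    """Row-wise softmax for a 2-D array of shape (n_samples, n_classes)."""
    z = np.asarray(z, dtype=float)
    e_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return e_z / np.sum(e_z, axis=1, keepdims=True)

def softmax_global(z):
    """Normalizes over ALL elements, like the version from the question."""
    e_z = np.exp(z - np.max(z))
    return e_z / e_z.sum()

def is_batch_consistent(fn, batch):
    """True if row i of fn(batch) matches fn applied to row i on its own."""
    out = fn(batch)
    return all(np.allclose(out[i], fn(batch[i:i + 1])[0])
               for i in range(len(batch)))

x2 = np.array([[1, 2, 3, 6],
               [2, 4, 5, 6],
               [1, 2, 3, 6]])

rows_ok = is_batch_consistent(softmax_rows, x2)
global_ok = is_batch_consistent(softmax_global, x2)
print(rows_ok, global_ok)
```

np.allclose is used instead of == so that tiny floating-point differences do not count as a mismatch.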

Additionally, here are the results of TensorFlow's softmax implementation:

import tensorflow as tf
import numpy as np
batch = np.asarray([[1,2,3,6],[2,4,5,6],[1,2,3,6]])
x = tf.placeholder(tf.float32, shape=[None, 4])
y = tf.nn.softmax(x)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(y, feed_dict={x: batch})

And the result:

array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037045],
       [ 0.01203764,  0.08894681,  0.24178252,  0.657233  ],
       [ 0.00626879,  0.01704033,  0.04632042,  0.93037045]], dtype=float32)

Answer 4

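I would say that while both are mathematically correct, implementation-wise the first one is better. When computing softmax, the intermediate values may become very large. Dividing two large numbers can be numerically unstable. These notes (from Stanford) mention a normalization trick which is essentially what you are doing.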

Answer 5

sklearn also offers an implementation of softmax

from sklearn.utils.extmath import softmax
import numpy as np

x = np.array([[ 0.50839931,  0.49767588,  0.51260159]])
softmax(x)

# output
array([[ 0.3340521 ,  0.33048906,  0.33545884]])

Answer 6

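From a mathematical point of view, both sides are equal.

You can easily prove this. Let m = max(x). Now your function softmax returns a vector whose i-th coordinate is equal to

e^(x_i - m) / sum_j e^(x_j - m) = (e^(x_i) / e^m) / (sum_j e^(x_j) / e^m) = e^(x_i) / sum_j e^(x_j)

Note that this works for any m, because e^m != 0 for all (even complex) numbers.

From a computational complexity point of view they are also equivalent: both run in O(n) time, where n is the size of the vector.
From a numerical stability point of view, the first solution is preferred, because e^x grows very fast and overflows even for fairly small values of x. Subtracting the maximum value gets rid of this overflow. To experience this in practice, try feeding x = np.array([1000, 5]) into both of your functions. One will return the correct probabilities; the other will overflow and return nan.
Unrelated to the question, but your solution works only for vectors (the Udacity quiz wants you to calculate it for matrices as well). To fix it you need to use sum(axis=0).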
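The experiment suggested above can be run as follows. This is a minimal sketch: `softmax_naive` and `softmax_stable` are just labels for the two versions from the question, and np.errstate is used only to silence the warnings NumPy would otherwise print.

```python
import numpy as np

def softmax_naive(x):
    return np.exp(x) / np.sum(np.exp(x), axis=0)

def softmax_stable(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

x = np.array([1000.0, 5.0])

# errstate hides the overflow/invalid warnings; the computed values are unchanged.
with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(x)   # exp(1000) overflows to inf, and inf / inf is nan

stable = softmax_stable(x)     # exp(0) = 1, while exp(-995) underflows to 0

print(naive)
print(stable)
```

The stable version returns a probability of 1 for the first entry; the naive one returns nan there because it divides infinity by infinity.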

Answer 7

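Here you can find out why they used - max.

From there:

"When you're writing code for computing the Softmax function in practice, the intermediate terms may be very large due to the exponentials. Dividing large numbers can be numerically unstable, so it is important to use a normalization trick."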

Answer 8

EDIT. As of version 1.2.0, scipy includes softmax as a special function:

https://scipy.github.io/devdocs/generated/scipy.special.softmax.html

I wrote a function that applies the softmax over any axis:

def softmax(X, theta = 1.0, axis = None):
    """
    Compute the softmax of each element along an axis of X.

    Parameters
    ----------
    X: ND-Array. Probably should be floats. 
    theta (optional): float parameter, used as a multiplier
        prior to exponentiation. Default = 1.0
    axis (optional): axis to compute values along. Default is the 
        first non-singleton axis.

    Returns an array the same size as X. The result will sum to 1
    along the specified axis.
    """

    # make X at least 2d
    y = np.atleast_2d(X)

    # find axis
    if axis is None:
        axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)

    # multiply y against the theta parameter, 
    y = y * float(theta)

    # subtract the max for numerical stability
    y = y - np.expand_dims(np.max(y, axis = axis), axis)

    # exponentiate y
    y = np.exp(y)

    # take the sum along the specified axis
    ax_sum = np.expand_dims(np.sum(y, axis = axis), axis)

    # finally: divide elementwise
    p = y / ax_sum

    # flatten if X was 1D
    if len(X.shape) == 1: p = p.flatten()

    return p

Subtracting the max, as other users described, is good practice. I wrote a detailed post about it here.

Answer 9

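To offer an alternative solution, consider the cases where your arguments are so large in magnitude that exp(x) would underflow (in the negative case) or overflow (in the positive case). Here you want to remain in log space as long as possible, exponentiating only at the end, where you can trust that the result will be well-behaved.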

import scipy.special as sc
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    return np.exp(x - sc.logsumexp(x))
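Note that the function above reduces over the whole array, so a 2-D batch is treated as one long vector. A per-axis variant might look like the sketch below. To avoid assuming anything about the installed SciPy, it computes the log-sum-exp by hand with NumPy; `log_softmax` and `softmax_logspace` are hypothetical names.

```python
import numpy as np

def log_softmax(x, axis=-1):
    """log(softmax(x)) along `axis`, computed without leaving log space."""
    x = np.asarray(x, dtype=float)
    m = np.max(x, axis=axis, keepdims=True)
    # log-sum-exp identity: log(sum(exp(x))) = m + log(sum(exp(x - m)))
    lse = m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))
    return x - lse

def softmax_logspace(x, axis=-1):
    """Exponentiate only at the very end."""
    return np.exp(log_softmax(x, axis=axis))

batch = np.array([[1000.0, 1001.0, 1002.0],
                  [-1000.0, -1001.0, -1002.0]])
probs = softmax_logspace(batch, axis=1)
print(probs.sum(axis=1))  # each row sums to 1 up to rounding
```

Both rows stay finite even though exp(1000) would overflow and exp(-1000) would underflow if computed directly.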

Answer 10

A more concise version is:

def softmax(x):
    return np.exp(x) / np.exp(x).sum(axis=0)

Answer 11

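To maintain numerical stability, max(x) should be subtracted. The following is the code for the softmax function:

def softmax(x):
    # note: the in-place "x -= ..." shifts the caller's array as well
    if len(x.shape) > 1:
        # 2-D input: normalize each row separately
        tmp = np.max(x, axis = 1)
        x -= tmp.reshape((x.shape[0], 1))
        x = np.exp(x)
        tmp = np.sum(x, axis = 1)
        x /= tmp.reshape((x.shape[0], 1))
    else:
        # 1-D input: normalize the whole vector
        tmp = np.max(x)
        x -= tmp
        x = np.exp(x)
        tmp = np.sum(x)
        x /= tmp

    return x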

Answer 12

I would suggest this:

def softmax(z):
    z_norm=np.exp(z-np.max(z,axis=0,keepdims=True))
    return(np.divide(z_norm,np.sum(z_norm,axis=0,keepdims=True)))

It will work for a single (stochastic) sample as well as for a batch.
For more detail see: https://medium.com/@ravish1729/analysis-of-softmax-function-ad058d6a564d

Answer 13

Everybody seems to post their solution, so I'll post mine:

def softmax(x):
    e_x = np.exp(x.T - np.max(x, axis = -1))
    return (e_x / e_x.sum(axis=0)).T

I get exactly the same results as the one imported from sklearn:

from sklearn.utils.extmath import softmax

Answer 14

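I needed something compatible with the output of a dense layer from Tensorflow.

The solution from @desertnaut does not work in this case because I have batches of data. Therefore, I came up with another solution that should work in both cases: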

def softmax(x, axis=-1):
    e_x = np.exp(x - np.max(x)) # same code
    return e_x / e_x.sum(axis=axis, keepdims=True)

Results:

logits = np.asarray([
    [-0.0052024,  -0.00770216,  0.01360943, -0.008921], # 1
    [-0.0052024,  -0.00770216,  0.01360943, -0.008921]  # 2
])

print(softmax(logits))

#[[0.2492037  0.24858153 0.25393605 0.24827873]
# [0.2492037  0.24858153 0.25393605 0.24827873]]

Ref: Tensorflow softmax

Answer 15

Based on all the responses and the CS231n notes, allow me to summarise:

def softmax(x, axis):
    x -= np.max(x, axis=axis, keepdims=True)
    return np.exp(x) / np.exp(x).sum(axis=axis, keepdims=True)

Usage:

x = np.array([[1, 0, 2,-1],
              [2, 4, 6, 8], 
              [3, 2, 1, 0]])
softmax(x, axis=1).round(2)

Output:

array([[0.24, 0.09, 0.64, 0.03],
       [0.  , 0.02, 0.12, 0.86],
       [0.64, 0.24, 0.09, 0.03]])

Answer 16

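I would like to add a little more understanding of the problem. Subtracting the max of the array is correct here. But if you run the code in the other post, you will find that it does not give you the right answer when the array is 2-D or of higher dimension.

Here are some suggestions:

To get the max, take it along the x-axis; you will get a 1-D array.
Reshape your max array to match the original shape.
Use np.exp to get the exponential values.
Use np.sum along the axis.
Get the final results.

Follow these steps and you will get the correct answer through vectorization. Since this is related to college homework, I cannot post the exact code here, but I am happy to give more suggestions if you don't understand.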
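For readers who are not bound by the homework constraint, the steps above can be sketched roughly as follows. This is one possible reading rather than the answerer's code: it assumes a 2-D input with one sample per row (so "along the axis" is taken to mean axis=1), and `softmax_steps` is a hypothetical name.

```python
import numpy as np

def softmax_steps(x):
    """Row-wise softmax of a 2-D array, following the listed steps."""
    x = np.asarray(x, dtype=float)
    row_max = np.max(x, axis=1)                    # 1. max along the axis -> 1-D array
    row_max = row_max.reshape(-1, 1)               # 2. reshape so it broadcasts against x
    e_x = np.exp(x - row_max)                      # 3. exponentiate the shifted scores
    row_sum = np.sum(e_x, axis=1).reshape(-1, 1)   # 4. sum along the same axis
    return e_x / row_sum                           # 5. final result

x = np.array([[1, 2, 3, 6],
              [2, 4, 5, 6]])
result = softmax_steps(x)
print(result)
```

Passing keepdims=True to np.max and np.sum would make the two reshape calls unnecessary.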

Answer 17

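This has already been answered in much detail above: the max is subtracted to avoid overflow. I am adding one more implementation here, in python3.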

import numpy as np
def softmax(x):
    mx = np.amax(x,axis=1,keepdims = True)
    x_exp = np.exp(x - mx)
    x_sum = np.sum(x_exp, axis = 1, keepdims = True)
    res = x_exp / x_sum
    return res

x = np.array([[3,2,4],[4,5,6]])
print(softmax(x))

Answer 18

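The purpose of the softmax function is to preserve the ratio of the vectors, as opposed to squashing the end-points with a sigmoid as the values saturate (i.e. tend to +/- 1 (tanh) or from 0 to 1 (logistic)). This is because it preserves more information about the rate of change at the end-points and is thus more applicable to neural nets with 1-of-N output encoding (i.e. if we squashed the end-points it would be harder to differentiate the 1-of-N output class, because we can't tell which one is the "biggest" or "smallest" once they have been squashed). It also makes the total output sum to 1: the clear winner will be closer to 1, while other numbers that are close to each other will sum to 1/p, where p is the number of output neurons with similar values.

The purpose of subtracting the max value from the vector is that when you compute the e^y exponents you may get a very high value that clips the float at its maximum, leading to a tie, which is not the case in this example. This becomes a BIG problem if you subtract the max value to make a negative number: then you have a negative exponent that rapidly shrinks the values, altering the ratio, which is what occurred in the poster's question and yielded the incorrect answer.

The answer supplied by Udacity is HORRIBLY inefficient. The first thing we need to do is calculate e^y_j for all vector components, KEEP THOSE VALUES, then sum them up, and divide. Where Udacity messed up is that they calculate e^y_j TWICE!!! Here is the correct answer: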

def softmax(y):
    e_to_the_y_j = np.exp(y)
    return e_to_the_y_j / np.sum(e_to_the_y_j, axis=0)

Answer 19

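I was able to achieve similar results using Numpy and Tensorflow. The only change from the original answer is the axis parameter of the np.sum API.

Initial approach: axis=0 - This, however, does not give the intended results when the input has N dimensions.

Modified approach: axis=len(e_x.shape)-1 - Always sum over the last dimension. This gives results similar to tensorflow's softmax function.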

def softmax_fn(input_array):
    """
    | **@author**: Prathyush SP
    |
    | Calculate Softmax for a given array
    :param input_array: Input Array
    :return: Softmax Score
    """
    e_x = np.exp(input_array - np.max(input_array))
    # keepdims=True keeps the summed axis so the division broadcasts per row
    return e_x / e_x.sum(axis=len(e_x.shape)-1, keepdims=True)
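One detail worth checking with the last-axis approach is the shape of the sum: reducing over the last axis drops that axis unless keepdims=True is passed, and the division then no longer lines up row by row. A small sketch with an arbitrary (2, 4) example:

```python
import numpy as np

scores = np.array([[1.0, 2.0, 3.0, 6.0],
                   [2.0, 4.0, 5.0, 6.0]])   # shape (2, 4)
e_x = np.exp(scores - np.max(scores))

print(e_x.sum(axis=-1).shape)                  # (2,)
print(e_x.sum(axis=-1, keepdims=True).shape)   # (2, 1)

# With keepdims=True each row is divided by its own sum.
probs = e_x / e_x.sum(axis=-1, keepdims=True)
print(probs.sum(axis=-1))

# Without keepdims, a (2, 4) array cannot be broadcast against a (2,) array.
try:
    e_x / e_x.sum(axis=-1)
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)
```

For a square input the division without keepdims would not raise at all; it would silently divide by the wrong sums, which is harder to notice.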

Answer 20

Here is a generalized solution using numpy, with a comparison against tensorflow and scipy for correctness:

Data preparation:

import numpy as np

np.random.seed(2019)

batch_size = 1
n_items = 3
n_classes = 2
logits_np = np.random.rand(batch_size,n_items,n_classes).astype(np.float32)
print('logits_np.shape', logits_np.shape)
print('logits_np:')
print(logits_np)

Output:

logits_np.shape (1, 3, 2)
logits_np:
[[[0.9034822  0.3930805 ]
  [0.62397    0.6378774 ]
  [0.88049906 0.299172  ]]]

Softmax using tensorflow:

import tensorflow as tf

logits_tf = tf.convert_to_tensor(logits_np, np.float32)
scores_tf = tf.nn.softmax(logits_np, axis=-1)

print('logits_tf.shape', logits_tf.shape)
print('scores_tf.shape', scores_tf.shape)

with tf.Session() as sess:
    scores_np = sess.run(scores_tf)

print('scores_np.shape', scores_np.shape)
print('scores_np:')
print(scores_np)

print('np.sum(scores_np, axis=-1).shape', np.sum(scores_np,axis=-1).shape)
print('np.sum(scores_np, axis=-1):')
print(np.sum(scores_np, axis=-1))

Output:

logits_tf.shape (1, 3, 2)
scores_tf.shape (1, 3, 2)
scores_np.shape (1, 3, 2)
scores_np:
[[[0.62490064 0.37509936]
  [0.4965232  0.5034768 ]
  [0.64137274 0.3586273 ]]]
np.sum(scores_np, axis=-1).shape (1, 3)
np.sum(scores_np, axis=-1):
[[1. 1. 1.]]

Softmax using scipy:

from scipy.special import softmax

scores_np = softmax(logits_np, axis=-1)

print('scores_np.shape', scores_np.shape)
print('scores_np:')
print(scores_np)

print('np.sum(scores_np, axis=-1).shape', np.sum(scores_np, axis=-1).shape)
print('np.sum(scores_np, axis=-1):')
print(np.sum(scores_np, axis=-1))

Output:

scores_np.shape (1, 3, 2)
scores_np:
[[[0.62490064 0.37509936]
  [0.4965232  0.5034768 ]
  [0.6413727  0.35862732]]]
np.sum(scores_np, axis=-1).shape (1, 3)
np.sum(scores_np, axis=-1):
[[1. 1. 1.]]

Softmax using numpy (https://nolanbconaway.github.io/blog/2017/softmax-numpy):

def softmax(X, theta = 1.0, axis = None):
    """
    Compute the softmax of each element along an axis of X.

    Parameters
    ----------
    X: ND-Array. Probably should be floats.
    theta (optional): float parameter, used as a multiplier
        prior to exponentiation. Default = 1.0
    axis (optional): axis to compute values along. Default is the
        first non-singleton axis.

    Returns an array the same size as X. The result will sum to 1
    along the specified axis.
    """

    # make X at least 2d
    y = np.atleast_2d(X)

    # find axis
    if axis is None:
        axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)

    # multiply y against the theta parameter,
    y = y * float(theta)

    # subtract the max for numerical stability
    y = y - np.expand_dims(np.max(y, axis = axis), axis)

    # exponentiate y
    y = np.exp(y)

    # take the sum along the specified axis
    ax_sum = np.expand_dims(np.sum(y, axis = axis), axis)

    # finally: divide elementwise
    p = y / ax_sum

    # flatten if X was 1D
    if len(X.shape) == 1: p = p.flatten()

    return p


scores_np = softmax(logits_np, axis=-1)

print('scores_np.shape', scores_np.shape)
print('scores_np:')
print(scores_np)

print('np.sum(scores_np, axis=-1).shape', np.sum(scores_np, axis=-1).shape)
print('np.sum(scores_np, axis=-1):')
print(np.sum(scores_np, axis=-1))

Output:

scores_np.shape (1, 3, 2)
scores_np:
[[[0.62490064 0.37509936]
  [0.49652317 0.5034768 ]
  [0.64137274 0.3586273 ]]]
np.sum(scores_np, axis=-1).shape (1, 3)
np.sum(scores_np, axis=-1):
[[1. 1. 1.]]

Answer 21

import tensorflow as tf
import numpy as np

def softmax(x):
    return (np.exp(x).T / np.exp(x).sum(axis=-1)).T

logits = np.array([[1, 2, 3], [3, 10, 1], [1, 2, 5], [4, 6.5, 1.2], [3, 6, 1]])

sess = tf.Session()
print(softmax(logits))
print(sess.run(tf.nn.softmax(logits)))
sess.close()

Answer 22

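The softmax function is an activation function that turns numbers into probabilities that sum to one. It outputs a vector that represents the probability distribution over a list of outcomes. It is also a core element used in deep learning classification tasks.

The softmax function is used when we have multiple classes.

It is useful for finding the class with the maximum probability.

The softmax function is ideally used in the output layer, where we are actually trying to obtain the probabilities that define the class of each input.

It ranges from 0 to 1.

The softmax function turns logits [2.0, 1.0, 0.1] into probabilities [0.7, 0.2, 0.1], and the probabilities sum to 1. Logits are the raw scores output by the last layer of a neural network, before activation takes place. To understand the softmax function, we must look at the output of the (n-1)th layer.

The softmax function is, in effect, a smooth approximation of the arg max function. It does not return the largest value from the input; instead, it concentrates the probability on the position of the largest value.

For example:

Before softmax

X = [13, 31, 5]

After softmax

array([1.52299795e-08, 9.99999985e-01, 5.10908895e-12])

Code: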

import numpy as np

# your solution:
def your_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

# correct solution:
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0) # only difference
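The relationship between softmax and arg max described in this answer can be checked on the example X = [13, 31, 5]. This is a small sketch that reuses the 1-D softmax shown above:

```python
import numpy as np

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

X = np.array([13.0, 31.0, 5.0])
probs = softmax(X)

print(probs)             # a full probability vector, not an index
print(np.argmax(X))      # position of the largest input
print(np.argmax(probs))  # same position: softmax preserves the ordering
```

Softmax keeps the ordering of the inputs, so the arg max of the output always matches the arg max of the input; the output itself is still a vector of probabilities.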

Answer 23

This also works with np.reshape.

def softmax(scores):
    """
    Compute softmax scores given the raw output from the model

    :param scores: raw scores from the model (N, num_classes)
    :return:
        prob: softmax probabilities (N, num_classes)
    """
    # subtract the largest number in each row, see
    # https://jamesmccaffrey.wordpress.com/2016/03/04/the-max-trick-when-computing-softmax/
    exponential = np.exp(scores - np.max(scores, axis=1).reshape(-1, 1))
    prob = exponential / exponential.sum(axis=1).reshape(-1, 1)
    return prob

Answer 24

This generalizes and assumes you are normalizing the trailing dimension.

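A minimal sketch of what normalizing over the trailing dimension could look like is given below. It is an assumption about the intent, not necessarily the answer's original code, and `softmax_last_axis` is a hypothetical name.

```python
import numpy as np

def softmax_last_axis(x):
    """Softmax over the trailing dimension of an array of any rank."""
    x = np.asarray(x, dtype=float)
    e_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e_x / np.sum(e_x, axis=-1, keepdims=True)

logits = np.arange(24, dtype=float).reshape(2, 3, 4)
probs = softmax_last_axis(logits)
print(probs.shape)          # same shape as the input
print(probs.sum(axis=-1))   # every entry is 1 up to rounding
```

Because both reductions use keepdims=True, the same function handles a single 1-D vector, a 2-D batch, or higher-rank inputs without any reshaping.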

Answer 25

I was curious to see the performance difference between these

import numpy as np

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)

def softmaxv2(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

def softmaxv3(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / np.sum(e_x, axis=0)

def softmaxv4(x):
    """Compute softmax values for each sets of scores in x."""
    return np.exp(x - np.max(x)) / np.sum(np.exp(x - np.max(x)), axis=0)



x=[10,10,18,9,15,3,1,2,1,10,10,10,8,15]

Using

print("----- softmax")
%timeit  a=softmax(x)
print("----- softmaxv2")
%timeit  a=softmaxv2(x)
print("----- softmaxv3")
%timeit  a=softmaxv3(x)
print("----- softmaxv4")
%timeit  a=softmaxv4(x)

Increasing the values inside x (+100 +200 +500...), I get consistently better results with the original numpy version (here is just one test)

----- softmax
The slowest run took 8.07 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 17.8 µs per loop
----- softmaxv2
The slowest run took 4.30 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 23 µs per loop
----- softmaxv3
The slowest run took 4.06 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 23 µs per loop
----- softmaxv4
10000 loops, best of 3: 23 µs per loop

Until... the values inside x reach ~800, then I get

----- softmax
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:4: RuntimeWarning: overflow encountered in exp
  after removing the cwd from sys.path.
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:4: RuntimeWarning: invalid value encountered in true_divide
  after removing the cwd from sys.path.
The slowest run took 18.41 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 23.6 µs per loop
----- softmaxv2
The slowest run took 4.18 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 22.8 µs per loop
----- softmaxv3
The slowest run took 19.44 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 23.6 µs per loop
----- softmaxv4
The slowest run took 16.82 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 22.7 µs per loop

As some have said, your version is more numerically stable "for large numbers". For small numbers it could be the other way around.
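The point at which the unshifted version starts to fail follows from the range of float64, which is what np.exp uses for a list of Python numbers. As a quick check (the exact threshold below is a property of float64, not something measured in this benchmark):

```python
import numpy as np

# Largest x for which exp(x) is still a finite float64 (roughly 709.78).
limit = np.log(np.finfo(np.float64).max)
print(limit)

with np.errstate(over="ignore"):   # silence the overflow warning
    below = np.exp(709.0)          # very large but still finite
    above = np.exp(710.0)          # overflows to inf

print(below, above)
```

This is consistent with the warnings appearing once the values in x had been pushed to around 800.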

How to implement the Softmax function in Python

25 answers: