Question

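I want to map a function f over an array of strings. I built a vectorized version of f and applied it to my array, but the first element of the array gets passed twice: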

import numpy as np

def f(string):
    print('called with', string)

a = np.array(['110', '012'])

fv = np.vectorize(f)
np.apply_along_axis(fv, axis=0, arr=a)


called with 110
called with 110
called with 012

Why is that? I would not expect 110 to be passed to f twice, and I don't see why it is.

What am I misunderstanding about np.vectorize or np.apply_along_axis?

Answer 1

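From the docs:

The data type of the output of vectorized is determined by calling the function with the first element of the input. This can be avoided by specifying the otypes argument.

The extra call is made to determine the output dtype.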
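To make the extra call visible without relying on printed output, here is a minimal sketch that counts calls. The `calls` list and the `len`-returning body of `f` are illustrative assumptions, not part of the question's code.

```python
import numpy as np

calls = []  # hypothetical helper: records every argument f receives

def f(s):
    calls.append(str(s))
    return len(s)

a = np.array(['110', '012'])

# No otypes: vectorize first calls f on the first element to infer the dtype.
np.vectorize(f)(a)
auto_calls = list(calls)          # ['110', '110', '012']

# otypes given: no trial call is needed.
calls.clear()
np.vectorize(f, otypes=[int])(a)
explicit_calls = list(calls)      # ['110', '012']
```

With two elements, the first version calls f three times and the second calls it twice.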

Answer 2

In [145]: def f(string):
     ...:     print('called with', string)
     ...: 
     ...: a = np.array(['110', '012'])
     ...: 
     ...: fv = np.vectorize(f)
     ...: 
In [146]: fv(a)
called with 110
called with 110
called with 012
Out[146]: array([None, None], dtype=object)

A function that only prints returns None. vectorize called it once to determine the return dtype; in this case it deduced object.

If we specify an otypes such as int, we get an error:

In [147]: fv = np.vectorize(f, otypes=[int])
In [148]: fv(a)
called with 110
called with 012
---------------------------------------------------------------------------
...    
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

That otypes was not compatible with the returned object (None). With object it works:

In [149]: fv = np.vectorize(f, otypes=[object])
In [150]: fv(a)
called with 110
called with 012
Out[150]: array([None, None], dtype=object)

A better, more meaningful function:

In [151]: def f(string):
     ...:     print('called with', string)
     ...:     return len(string)
     ...: 
     ...: 
In [152]: fv = np.vectorize(f, otypes=[int])
In [153]: fv(a)
called with 110
called with 012
Out[153]: array([3, 3])

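Remember that vectorize passes scalar values to your function. In effect, it evaluates each element of the input arrays and returns an array with a matching shape: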

In [154]: fv(np.array([a,a,a]))
called with 110
called with 012
called with 110
called with 012
called with 110
called with 012
Out[154]: 
array([[3, 3],
       [3, 3],
       [3, 3]])

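It is slower than plain iteration, e.g. np.array([f(i) for i in a]), but it is more convenient if the input array can have multiple dimensions, and better still if several arrays need to be broadcast against each other.

For a simple array like a, np.vectorize is overkill.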
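The broadcasting case is where vectorize earns its keep. A small sketch, assuming a hypothetical two-argument function `take_prefix` (not from the question) applied to a (2,) array of strings and a (3, 1) array of lengths:

```python
import numpy as np

def take_prefix(s, n):
    # Receives one scalar from each input per call.
    return s[:n]

strings = np.array(['110', '012'])      # shape (2,)
lengths = np.array([[1], [2], [3]])     # shape (3, 1)

# otypes=[object] keeps the results as Python strings and skips the trial call.
prefix = np.vectorize(take_prefix, otypes=[object])
result = prefix(strings, lengths)       # inputs broadcast to shape (3, 2)
```

Writing the same thing as a list comprehension would need nested loops and a manual reshape.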

vectorize has another parameter, cache, that avoids this double call while still allowing automatic dtype detection:

In [156]: fv = np.vectorize(f, cache=True)
In [157]: fv(a)
called with 110
called with 012
Out[157]: array([3, 3])
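The same check can be done with a call counter instead of print. This is a sketch under the assumption that f returns `len(s)` as above; `calls` is an illustrative helper.

```python
import numpy as np

calls = []  # hypothetical helper: records every argument f receives

def f(s):
    calls.append(str(s))
    return len(s)

a = np.array(['110', '012'])

# cache=True reuses the result of the trial call for the first element.
cached = np.vectorize(f, cache=True)
result = cached(a)
n_calls = len(calls)   # one call per element
```

The dtype is still inferred from the first result, so no otypes is needed here.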

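Automatic dtype detection has sometimes caused bugs, for example when the trial calculation returns a different dtype than the later calls: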

In [160]: def foo(var):
     ...:     if var<0:
     ...:         return -var
     ...:     elif var>0:
     ...:         return var
     ...:     else:
     ...:         return 0  

In [161]: np.vectorize(foo)([0,1.2, -1.2])
Out[161]: array([0, 1, 1])           # int dtype
In [162]: np.vectorize(foo)([0.1,1.2, -1.2])
Out[162]: array([0.1, 1.2, 1.2])     # float dtype
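One way to sidestep this, sketched here with the same foo, is to state the output dtype explicitly via otypes so the integer 0 returned by the trial call no longer decides the result type:

```python
import numpy as np

def foo(var):
    if var < 0:
        return -var
    elif var > 0:
        return var
    else:
        return 0   # Python int: this is what misleads the dtype inference

data = [0, 1.2, -1.2]

inferred = np.vectorize(foo)(data)                   # dtype inferred from foo(0) -> int
explicit = np.vectorize(foo, otypes=[float])(data)   # dtype forced to float
```

Here `inferred` is truncated to integers, while `explicit` keeps the fractional values.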

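apply_along_axis takes a function that accepts a 1d array. It iterates over all the other dimensions, passing a set of 1d slices to your function. For a 1d array like a this does nothing useful, and even if your a were nd it wouldn't help much. Your fv doesn't need 1d input.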

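It also does a trial calculation to determine the shape and dtype of the return array, but it automatically caches that result, so the first slice is not evaluated twice.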

Like vectorize, apply_along_axis is a convenience tool, not a performance tool.

Compare
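To see what apply_along_axis actually hands to the function, here is a small sketch that records the shape of each slice. The numeric array `m`, the function `slice_sum`, and the `seen` list are illustrative assumptions.

```python
import numpy as np

seen = []  # hypothetical helper: shape of every slice passed in

def slice_sum(v):
    seen.append(v.shape)
    return v.sum()

m = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# axis=0: the function receives each column, a 1d slice of length 2.
by_col = np.apply_along_axis(slice_sum, axis=0, arr=m)
col_shapes = list(seen)

# axis=1: the function receives each row, a 1d slice of length 3.
seen.clear()
by_row = np.apply_along_axis(slice_sum, axis=1, arr=m)
row_shapes = list(seen)
```

The number of recorded shapes equals the number of slices, which shows that the trial call is reused rather than repeated.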

np.apply_along_axis(fv, axis=0, arr=[a,a,a])
np.apply_along_axis(fv, axis=1, arr=[a,a,a])

to see how apply_along_axis affects the order of evaluation.

Or do something with a whole row (or column):

np.apply_along_axis(lambda x: fv(x).mean(), axis=0, arr=[a,a,a])
