Question

Consider this code snippet:

import numpy as np    
a = np.arange(20).reshape(2,10)

# the result is right if there is only 1 key
func = lambda x: dict(k1=len(x))
print np.apply_along_axis(func, -1, a)
out[1]: [[{'k1': 10}]
         [{'k1': 10}]]

# but if there are more than 1 key in the returned dict
# the results are duplicated
func = lambda x: dict(k1=1, k2=len(x))
print np.apply_along_axis(func, -1, a)
out[2]: [[{'k2': 10, 'k1': 1} {'k2': 10, 'k1': 1}]
          [{'k2': 10, 'k1': 1} {'k2': 10, 'k1': 1}]]

func = lambda x: dict(k1=1, k2=2, k3=len(x))
print np.apply_along_axis(func, -1, a)
out[3]: [[{'k3': 10, 'k2': 2, 'k1': 1} {'k3': 10, 'k2': 2, 'k1': 1} {'k3': 10, 'k2': 2, 'k1': 1}]
         [{'k3': 10, 'k2': 2, 'k1': 1} {'k3': 10, 'k2': 2, 'k1': 1} {'k3': 10, 'k2': 2, 'k1': 1}]]

The problem is described in the code comments, and the results are shown alongside.

Answer 1

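It seems that np.apply_along_axis tries to work out the shape of the result from what the call to func returns. If your input array has shape (n, m) and func returns something of length k, then np.apply_along_axis(func, -1, a) returns an array of shape (n, k). This holds even if your function returns something other than a list or an array. If the function returns a scalar, the resulting shape is (n,).

Examples: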

# np.diff(a[0]) has length 9.
>>> np.apply_along_axis(lambda x: np.diff(x), -1, a).shape
(2, 9)
# sorted(a[0]) has length 10
>>> np.apply_along_axis(lambda x: sorted(x), -1, a).shape
(2, 10)
# len(a[0]) is a scalar
>>> np.apply_along_axis(lambda x: len(x), -1, a).shape
(2,)
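The shape rule above can be mimicked with a few lines of plain Python. This is only a simplified sketch for a 2-D input along the last axis, not NumPy's actual source; the function name apply_last_axis_sketch is hypothetical. It also suggests where the duplicates come from: assigning a single dict to a whole row of an object array broadcasts that dict into every slot of the row.

```python
import numpy as np

def apply_last_axis_sketch(func, arr):
    """Simplified, hypothetical model of the shape inference for a 2-D input."""
    n = arr.shape[0]
    first = func(arr[0])
    if np.isscalar(first):
        # Scalar result: one value per row, so the output shape is (n,)
        out = np.zeros(n, dtype=np.asarray(first).dtype)
    else:
        # Anything with a length: the output shape is (n, len(result))
        out = np.zeros((n, len(first)), dtype=np.asarray(first).dtype)
    for i in range(n):
        # For a dict result, out[i] is a row of object slots; the dict is
        # treated as a single object and broadcast into every slot.
        out[i] = func(arr[i])
    return out

a = np.arange(20).reshape(2, 10)
dup = apply_last_axis_sketch(lambda x: dict(k1=1, k2=len(x)), a)
print(dup.shape)  # (2, 2)
```

With a three-key dict the same sketch yields shape (2, 3), matching the pattern in the question.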

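Now, in your case, since you return a dict of length 2, the resulting shape is (2, 2). A simple workaround would be to wrap the dict in a scalar. But apparently numpy does not like custom scalars. So if you try a custom DictWrap class like this: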

class DictWrap(object):
    def __init__(self, *args, **kwargs):
        self._d = dict(*args, **kwargs)

... it doesn't work:

>>> np.apply_along_axis(lambda x: DictWrap(k1=1, k2=len(x)), -1, a)
...
TypeError: object of type 'DictWrap' has no len()

So we either need to add a custom __len__() method to DictWrap that returns 1, or we can wrap the dict in a list:

>>> np.apply_along_axis(lambda x: [dict(k1=1, k2=len(x))], -1, a)
array([[{'k2': 10, 'k1': 1}],
       [{'k2': 10, 'k1': 1}]], dtype=object)

The shape is (2, 1). You can call squeeze() on it to get a one-dimensional array:

>>> r = np.apply_along_axis(lambda x: [dict(k1=1, k2=len(x))], -1, a)
>>> r.squeeze()
array([{'k2': 10, 'k1': 1}, {'k2': 10, 'k1': 1}], dtype=object)
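The list-wrapping workaround can be packaged as a small helper. The following is a minimal sketch, assuming Python 3 and a 2-D input; the name apply_dict_func is hypothetical and not part of NumPy.

```python
import numpy as np

def apply_dict_func(func, arr):
    """Apply func to each row of a 2-D array and return a 1-D object array."""
    # Wrapping each result in a one-element list gives it length 1, so the
    # output has shape (n, 1) no matter how many keys the dict has.
    wrapped = np.apply_along_axis(lambda row: [func(row)], -1, arr)
    # Drop only the trailing length-1 axis.
    return wrapped.squeeze(axis=-1)

a = np.arange(20).reshape(2, 10)
result = apply_dict_func(lambda x: dict(k1=1, k2=len(x)), a)
print(result.shape)  # (2,)
print(result[0])
```

Passing axis=-1 to squeeze is a deliberate choice here: a bare squeeze() would also collapse the first axis if the input happened to have a single row.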

Another, and perhaps the simplest, way is to get rid of the extra dimension yourself:

>>> r = np.apply_along_axis(lambda x: dict(k1=1, k2=len(x)), -1, a)
>>> r[:, 0]
array([{'k2': 10, 'k1': 1}, {'k2': 10, 'k1': 1}], dtype=object)

To see how numpy handles the various cases, see the documentation of apply_along_axis (in particular, the part starting at if isscalar(res):).

Duplicated output from np.apply_along_axis when returning a dict
