Question

I have a function that I'd like to apply to an array of tuples, and I'm wondering if there's a clean way to do it.

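Normally I could use np.vectorize to apply the function to each item in the array. In this case, however, each "item" is a tuple, so numpy interprets the array as a 3d array and applies the function to each item inside the tuple.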

So I can assume that the incoming array is one of the following:

a tuple
a 1d array of tuples
a 2d array of tuples

I could probably write some looping logic, but it seems likely that numpy has something more efficient, and I don't want to reinvent the wheel.

Here is an example. I'm trying to apply the tuple_converter function to each tuple in the array.

import numpy as np

array_of_tuples1 = np.array([
        [(1,2,3),(2,3,4),(5,6,7)],
        [(7,2,3),(2,6,4),(5,6,6)],
        [(8,2,3),(2,5,4),(7,6,7)],
    ])

array_of_tuples2 = np.array([
        (1,2,3),(2,3,4),(5,6,7),
    ])

plain_tuple = (1,2,3)



# Convert a single tuple into a single number
def tuple_converter(tup):
    return tup[0]**2 + tup[1] + tup[2]

# Vectorizing applies the formula to each integer rather than each tuple
tuple_converter_vectorized = np.vectorize(tuple_converter)

print(tuple_converter_vectorized(array_of_tuples1))
print(tuple_converter_vectorized(array_of_tuples2))
print(tuple_converter_vectorized(plain_tuple))

Desired output for array_of_tuples1:

[[ 6 11 38]
 [54 14 37]
 [69 13 62]]

Desired output for array_of_tuples2:

[ 6 11 38]

Desired output for plain_tuple:

6

But the code above produces this error (because it tries to apply the function to integers rather than to tuples):

<ipython-input-209-fdf78c6f4b13> in tuple_converter(tup)
     10 
     11 def tuple_converter(tup):
---> 12     return tup[0]**2 + tup[1] + tup[2]
     13 
     14 

IndexError: invalid index to scalar variable.
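The element-by-element behaviour behind this error is np.vectorize's default. As far as I know, np.vectorize also accepts a `signature` argument (NumPy 1.12 and later) declaring that the function consumes a 1-d "core" dimension and returns a scalar, in which case each innermost row is passed whole. A minimal sketch, assuming every tuple has the same length; it is not taken from the answers below:

```python
import numpy as np

def tuple_converter(tup):
    return tup[0]**2 + tup[1] + tup[2]

# '(n)->()' means: take a 1-d core dimension of length n, return a scalar.
# Any leading dimensions are looped over, so inputs of rank >= 1 all work.
tuple_converter_vectorized = np.vectorize(tuple_converter, signature='(n)->()')

array_of_tuples1 = np.array([
    [(1, 2, 3), (2, 3, 4), (5, 6, 7)],
    [(7, 2, 3), (2, 6, 4), (5, 6, 6)],
    [(8, 2, 3), (2, 5, 4), (7, 6, 7)],
])
array_of_tuples2 = np.array([(1, 2, 3), (2, 3, 4), (5, 6, 7)])
plain_tuple = (1, 2, 3)

out1 = tuple_converter_vectorized(array_of_tuples1)  # shape (3, 3)
out2 = tuple_converter_vectorized(array_of_tuples2)  # shape (3,)
out3 = tuple_converter_vectorized(plain_tuple)       # 0-d result
print(out1)
print(out2)
print(out3)
```

Like plain np.vectorize, this is a convenience wrapper around a Python-level loop, not a speed-up.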

Answer 1

array_of_tuples1 and array_of_tuples2 are not actually arrays of tuples, just 3-dimensional and 2-dimensional arrays of integers:

In [1]: array_of_tuples1 = np.array([
   ...:         [(1,2,3),(2,3,4),(5,6,7)],
   ...:         [(7,2,3),(2,6,4),(5,6,6)],
   ...:         [(8,2,3),(2,5,4),(7,6,7)],
   ...:     ])

In [2]: array_of_tuples1
Out[2]: 
array([[[1, 2, 3],
        [2, 3, 4],
        [5, 6, 7]],

       [[7, 2, 3],
        [2, 6, 4],
        [5, 6, 6]],

       [[8, 2, 3],
        [2, 5, 4],
        [7, 6, 7]]])

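So, rather than vectorizing your function, which would basically loop over the elements of the array (the integers), you should apply it along the suitable axis (the axis of the "tuples") and not care about the type of the sequence: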

In [6]: np.apply_along_axis(tuple_converter, 2, array_of_tuples1)
Out[6]: 
array([[ 6, 11, 38],
       [54, 14, 37],
       [69, 13, 62]])

In [9]: np.apply_along_axis(tuple_converter, 1, array_of_tuples2)
Out[9]: array([ 6, 11, 38])
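The two calls above differ only in the axis number (2 vs 1). Since the tuples always sit on the last axis, passing `axis=-1` lets one call cover all three input shapes, including a plain tuple once it is converted with np.asarray. A small sketch; `apply_to_tuples` is a hypothetical helper name, not a NumPy function:

```python
import numpy as np

def tuple_converter(tup):
    return tup[0]**2 + tup[1] + tup[2]

def apply_to_tuples(func, data):
    # Hypothetical helper. axis=-1 always refers to the innermost axis,
    # i.e. the "tuple" axis, however many leading axes there are.
    return np.apply_along_axis(func, -1, np.asarray(data))

array_of_tuples1 = np.array([
    [(1, 2, 3), (2, 3, 4), (5, 6, 7)],
    [(7, 2, 3), (2, 6, 4), (5, 6, 6)],
    [(8, 2, 3), (2, 5, 4), (7, 6, 7)],
])
array_of_tuples2 = np.array([(1, 2, 3), (2, 3, 4), (5, 6, 7)])
plain_tuple = (1, 2, 3)

res1 = apply_to_tuples(tuple_converter, array_of_tuples1)  # shape (3, 3)
res2 = apply_to_tuples(tuple_converter, array_of_tuples2)  # shape (3,)
res3 = apply_to_tuples(tuple_converter, plain_tuple)       # single value
print(res1, res2, res3, sep="\n")
```

np.apply_along_axis still calls the function once per tuple in Python, so it is about as fast as an explicit loop; its advantage is brevity.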

Answer 2

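The other answer above is definitely correct, and probably exactly what you're looking for. But I noticed that you put the word "clean" in your question, so I wanted to add this answer as well.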

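If we can assume that all the tuples are 3-element tuples (or at least that they all have the same constant number of elements), then there's a nice little trick you can use so that the same piece of code handles a single tuple, a 1d array of tuples, or a 2d array of tuples, with no if/else for the 1d/2d cases. I'd argue that avoiding switches is always cleaner (though I suppose that's debatable).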
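One way to get this behaviour is to index the last axis with an Ellipsis (`...`), so that `a[..., 0]` means "element 0 of every tuple" regardless of how many leading axes exist. A minimal sketch, assuming the input is first converted with np.asarray, because a plain Python tuple does not accept `...` as an index:

```python
import numpy as np

def tuple_converter(tup):
    # Convert first: a plain Python tuple does not support `...` indexing.
    a = np.asarray(tup)
    # a[..., k] selects element k of every tuple and keeps all leading axes,
    # so one expression covers a single tuple and 1d/2d arrays of tuples.
    return a[..., 0]**2 + a[..., 1] + a[..., 2]

array_of_tuples1 = np.array([
    [(1, 2, 3), (2, 3, 4), (5, 6, 7)],
    [(7, 2, 3), (2, 6, 4), (5, 6, 6)],
    [(8, 2, 3), (2, 5, 4), (7, 6, 7)],
])
array_of_tuples2 = np.array([(1, 2, 3), (2, 3, 4), (5, 6, 7)])
plain_tuple = (1, 2, 3)

r1 = tuple_converter(array_of_tuples1)
r2 = tuple_converter(array_of_tuples2)
r3 = tuple_converter(plain_tuple)
print(r1, r2, r3, sep="\n")
```

Unlike np.vectorize or np.apply_along_axis, this works on whole arrays at once, so there is no Python-level loop over the tuples.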


For your inputs, this produces the desired outputs listed in the question.

Answer 3

If you're serious about the "tuples" part, you can define a structured dtype.
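Briefly, a structured dtype describes each array element as a record with named fields. A small sketch of what the `'int,int,int'` shorthand gives you, next to an equivalent dtype with explicit field names (`a`, `b`, `c` are made up for illustration):

```python
import numpy as np

# 'int,int,int' is shorthand for three integer fields with default names.
dt = np.dtype('int,int,int')
print(dt.names)

# The same layout with explicit (illustrative) field names.
dt_named = np.dtype([('a', int), ('b', int), ('c', int)])
rec = np.array((1, 2, 3), dtype=dt_named)  # a 0-d structured array: one record
print(rec['a'], rec['b'], rec['c'])
```

The default names `f0`, `f1`, `f2` are the ones the functions below rely on.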

In [535]: dt=np.dtype('int,int,int')

In [536]: x1 = np.array([
        [(1,2,3),(2,3,4),(5,6,7)],
        [(7,2,3),(2,6,4),(5,6,6)],
        [(8,2,3),(2,5,4),(7,6,7)],
    ], dtype=dt)

In [537]: x1
Out[537]: 
array([[(1, 2, 3), (2, 3, 4), (5, 6, 7)],
       [(7, 2, 3), (2, 6, 4), (5, 6, 6)],
       [(8, 2, 3), (2, 5, 4), (7, 6, 7)]], 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])

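Note that the display uses tuples. x1 is a 3x3 array of type dt; its elements, or records, are displayed as tuples. This is more useful when the tuple elements differ in type: floats, integers, strings, and so on.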
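To illustrate the mixed-type case, a sketch with a hypothetical record layout combining a string, a float and an integer field; each field can still be pulled out as an ordinary array by name:

```python
import numpy as np

# Hypothetical layout: a string, a float and an integer per record.
dt_mixed = np.dtype([('name', 'U10'), ('weight', float), ('count', int)])
records = np.array([('apple', 0.2, 3), ('pear', 0.3, 5)], dtype=dt_mixed)

print(records)          # each element is displayed as a tuple
print(records['name'])  # one field across all records, as a plain array
total = (records['weight'] * records['count']).sum()
print(total)
```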

Now define a function that works with the fields of such an array:

In [538]: def foo(tup):
    return tup['f0']**2 + tup['f1'] + tup['f2']

It works on x1:

In [539]: foo(x1)
Out[539]: 
array([[ 6, 11, 38],
       [54, 14, 37],
       [69, 13, 62]])

It also works on a 1d array of the same dtype:

In [540]: x2=np.array([(1,2,3),(2,3,4),(5,6,7) ],dtype=dt)

In [541]: foo(x2)
Out[541]: array([ 6, 11, 38])

And on a 0d array of the matching dtype:

In [542]: foo(np.array(plain_tuple,dtype=dt))
Out[542]: 6

But foo(plain_tuple) won't work, because the function is written to use named fields rather than indexed ones.

If needed, the function can be modified to convert its input to the right dtype:

In [545]: def foo1(tup):
    temp = np.asarray(tup, dtype=dt)
    return temp['f0']**2 + temp['f1'] + temp['f2']

In [548]: plain_tuple
Out[548]: (1, 2, 3)

In [549]: foo1(plain_tuple) 
Out[549]: 6

In [554]: foo1([(1,2,3),(2,3,4),(5,6,7)])  # list of tuples
Out[554]: array([ 6, 11, 38])
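Going the other way can also be handy: if a structured array needs to be handed to code that indexes positionally (like the axis-based approaches), `numpy.lib.recfunctions.structured_to_unstructured` (NumPy 1.16 and later, as far as I know) turns the fields into a trailing axis. A sketch under that assumption:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dt = np.dtype('int,int,int')
x2 = np.array([(1, 2, 3), (2, 3, 4), (5, 6, 7)], dtype=dt)

# The three fields become a trailing axis: a plain (3, 3) integer array.
plain = rfn.structured_to_unstructured(x2)
print(plain.shape)

# Positional indexing on the last axis now replaces the field names.
result = plain[..., 0]**2 + plain[..., 1] + plain[..., 2]
print(result)
```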

Apply function to array of tuples
