Question

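I am trying to apply a function to every row of a numpy array. It works when the lists in the rows all have the same size, but it fails as soon as one of them has a different size.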

The function to apply:

from math import sin, cos
import operator



def parseRPN(expression, roundtointeger=False):
    """Parse and evaluate an RPN expression.

    Takes a list of tokens such as ['2', '2', '*']
    and returns the result (4.0 for that example).
    """

    def safe_divide(darg1, darg2):
        ERROR_VALUE = 1.
        # Division by zero returns ERROR_VALUE; asymptotes could be penalized
        # here instead (PENALIZE_ASYMPITOTES in the original code).

        try:
            return darg1 / darg2
        except ZeroDivisionError:
            return ERROR_VALUE

    function_twoargs = {'*': operator.mul, '/': safe_divide, '+': operator.add, '-': operator.sub}
    function_onearg = {'sin': sin, 'cos': cos}
    stack = []
    for val in expression:
        result = None
        if val in function_twoargs:
            arg2 = stack.pop()
            arg1 = stack.pop()
            result = function_twoargs[val](arg1, arg2)
        elif val in function_onearg:
            arg = stack.pop()
            result = function_onearg[val](arg)
        else:
            result = float(val)
        stack.append(result)

    # The final value left on the stack is the result of the expression.
    result = stack.pop()
    if roundtointeger:
        result = round(result)
    return result

This does not work:

import numpy as np

# The first row is longer than the others.
dat=np.array([['4','5','*','6','+','3','/'],['4','4','*','6','*'],['4','5','*','6','+'],['4','5','*','6','+']])
lout=np.apply_along_axis(parseRPN,0,dat)

print(dat)
print(lout)

This works:

dat=np.array([['4','5','*','6','+'],['4','4','*','6','*'],['4','5','*','6','+'],['4','5','*','6','+']])
lout=np.apply_along_axis(parseRPN,1,dat)  # axis=1 passes each row to parseRPN

print(dat)
print(lout)

Am I using the right tool? The idea is to vectorize the calculation over a series of lists.

Thanks

Answer 1

If you just use map or a list comprehension, the code works fine.

list(map(parseRPN, dat))  # on Python 3, map returns an iterator, so wrap it in list()

I wouldn't worry about getting numpy's apply functions to work here until you really need the extra performance.

Answer 2

With complex "row" processing like this, you might as well treat the array as a list.

With rows of equal length, dat is a 2d character array:

In [138]: dat=np.array([['4','5','*','6','+'],['4','4','*','6','*'],
     ...:               ['4','5','*','6','+'],['4','5','*','6','+']])
In [139]: dat
Out[139]: 
array([['4', '5', '*', '6', '+'],
       ['4', '4', '*', '6', '*'],
       ['4', '5', '*', '6', '+'],
       ['4', '5', '*', '6', '+']],
      dtype='<U1')

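If the lengths differ, the array is a 1d object dtype array containing lists (recent NumPy versions only build it when dtype=object is passed explicitly; without it np.array raises a ValueError):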

In [140]: dat1=np.array([['4','5','*','6','+','3','/'],['4','4','*','6','*'],
     ...:                ['4','5','*','6','+'],['4','5','*','6','+']], dtype=object)
In [141]: dat1
Out[141]: 
array([list(['4', '5', '*', '6', '+', '3', '/']),
       list(['4', '4', '*', '6', '*']), 
       list(['4', '5', '*', '6', '+']),
       list(['4', '5', '*', '6', '+'])], dtype=object)

In either case a simple row iteration works fine (map works too, but in Py3 you have to use list(map(...))).

In [142]: [parseRPN(row) for row in dat]
Out[142]: [26.0, 96.0, 26.0, 26.0]
In [143]: [parseRPN(row) for row in dat1]
Out[143]: [8.666666666666666, 96.0, 26.0, 26.0]

apply_along_axis also uses this kind of iteration. It is convenient when the array is 3d or higher, but for row iteration over a 1d or 2d array it is overkill.
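As a minimal sketch of that equivalence: the helper rpn below is a condensed, hypothetical stand-in for the question's parseRPN (binary operators only), not the original code. With axis=1, apply_along_axis hands each row of a 2d array to the function, which is what the plain loop does as well.

```python
import operator
import numpy as np

# Hypothetical, condensed stand-in for parseRPN (binary operators only).
OPS = {'*': operator.mul, '/': operator.truediv,
       '+': operator.add, '-': operator.sub}

def rpn(tokens):
    stack = []
    for tok in tokens:
        if tok in OPS:
            b = stack.pop()
            a = stack.pop()
            stack.append(OPS[tok](a, b))
        else:
            stack.append(float(tok))
    return stack.pop()

dat = np.array([['4', '5', '*', '6', '+'],
                ['4', '4', '*', '6', '*']])

by_axis = np.apply_along_axis(rpn, 1, dat)  # axis=1: each 1-D slice is a row
by_loop = [rpn(row) for row in dat]         # plain iteration over the rows
print(by_axis, by_loop)
```

Both calls evaluate the same two expressions; the loop simply skips the extra machinery.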

For an object array like dat1, frompyfunc may have a modest speed advantage:

In [144]: np.frompyfunc(parseRPN,1,1)(dat1)
Out[144]: array([8.666666666666666, 96.0, 26.0, 26.0], dtype=object)
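Note that frompyfunc returns an object dtype array. If a numeric array is wanted, the result can be cast afterwards. A small sketch, again assuming a condensed, hypothetical stand-in (rpn) for parseRPN and building the ragged array with an explicit dtype=object:

```python
import operator
import numpy as np

# Hypothetical, condensed stand-in for parseRPN (binary operators only).
OPS = {'*': operator.mul, '/': operator.truediv,
       '+': operator.add, '-': operator.sub}

def rpn(tokens):
    stack = []
    for tok in tokens:
        if tok in OPS:
            b = stack.pop()
            a = stack.pop()
            stack.append(OPS[tok](a, b))
        else:
            stack.append(float(tok))
    return stack.pop()

rows = [['4', '5', '*', '6', '+', '3', '/'],
        ['4', '4', '*', '6', '*']]
dat1 = np.array(rows, dtype=object)    # 1d object array holding two lists

obj_result = np.frompyfunc(rpn, 1, 1)(dat1)  # object dtype result
float_result = obj_result.astype(float)      # cast to float64 if needed
print(obj_result.dtype, float_result.dtype)
```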

np.vectorize is slower, but it also works with the object array:

In [145]: np.vectorize(parseRPN)(dat1)
Out[145]: array([  8.66666667,  96.        ,  26.        ,  26.        ])

But applying it to the 2d character array requires its signature parameter, which is slower still and trickier to use.
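For illustration, this is roughly what the signature form looks like. The sketch assumes a condensed, hypothetical stand-in (rpn) for parseRPN; the signature '(n)->()' tells np.vectorize that the function consumes one 1d slice of length n and returns a scalar.

```python
import operator
import numpy as np

# Hypothetical, condensed stand-in for parseRPN (binary operators only).
OPS = {'*': operator.mul, '/': operator.truediv,
       '+': operator.add, '-': operator.sub}

def rpn(tokens):
    stack = []
    for tok in tokens:
        if tok in OPS:
            b = stack.pop()
            a = stack.pop()
            stack.append(OPS[tok](a, b))
        else:
            stack.append(float(tok))
    return stack.pop()

dat = np.array([['4', '5', '*', '6', '+'],
                ['4', '4', '*', '6', '*']])

# '(n)->()': each call receives one row (core dimension n), returns a scalar.
vec_rpn = np.vectorize(rpn, signature='(n)->()')
out = vec_rpn(dat)
print(out)
```

This only applies to the equal-length case, since a ragged input is not a 2d array to begin with.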

numpy doesn't help with this problem. It is really a list-of-lists problem:

In [148]: dat=[['4','5','*','6','+'],['4','4','*','6','*'],['4','5','*','6','+']
     ...: ,['4','5','*','6','+']]
In [149]: [parseRPN(row) for row in dat]
Out[149]: [26.0, 96.0, 26.0, 26.0]
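The same approach carries over to rows of different lengths, with no array needed on the input side. A short sketch, assuming a condensed, hypothetical stand-in (rpn) for parseRPN; only the results are collected into a numpy array at the end, in case later steps need one.

```python
import operator
import numpy as np

# Hypothetical, condensed stand-in for parseRPN (binary operators only).
OPS = {'*': operator.mul, '/': operator.truediv,
       '+': operator.add, '-': operator.sub}

def rpn(tokens):
    stack = []
    for tok in tokens:
        if tok in OPS:
            b = stack.pop()
            a = stack.pop()
            stack.append(OPS[tok](a, b))
        else:
            stack.append(float(tok))
    return stack.pop()

# Ragged input stays a plain list of lists.
rows = [['4', '5', '*', '6', '+', '3', '/'],
        ['4', '4', '*', '6', '*'],
        ['4', '5', '*', '6', '+']]

results = np.array([rpn(row) for row in rows])  # 1d float array of results
print(results)
```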

numpy apply_along_axis with arrays of different sizes
