Question

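I have a list of vectors (in Python) that I want to normalize, while at the same time removing the vectors that originally had small norms.

The input list is, for example: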

a = [(1,1),(1,2),(2,2),(3,4)]

I need the output to be (x*n, y*n) with n = (x**2+y**2)**-0.5

If I just needed the norms, for example, that would be easy with a list comprehension:

an = [ (x**2+y**2)**0.5 for x,y in a ]

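It would be just as easy to store only a normalized x, for example, but what I want is a temporary variable "n" that I can use in two calculations and then throw away.

I can't just use a lambda function either, because I also need n to filter the list. So what is the best way?

Right now I am using this nested list comprehension (with a generator expression as the inner sequence):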

a = [(1,1),(1,2),(2,2),(3,4)]

[(x*n,y*n) for (n,x,y) in (( (x**2.+y**2.)**-0.5 ,x,y) for x,y in a) if n < 0.4]

# Out[14]: 
# [(0.70710678118654757, 0.70710678118654757),
#  (0.60000000000000009, 0.80000000000000004)]

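The inner expression generates tuples with an extra value (n), and I then use these values for both the calculation and the filtering. Is this really the best way? Are there any terrible inefficiencies I should be aware of?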

Answer 1

Is this really the best way?

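Well, it does work, and if you really, really want to write one-liners then it's the best you can do.

On the other hand, a simple 4-line function does the same thing much more clearly: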

def normfilter(vecs, min_norm):
    for x,y in vecs:
        n = (x**2.+y**2.)**-0.5
        if min_norm < n:
            yield (x*n,y*n)

normalized = list(normfilter(vectors, 0.4))

By the way, your code or your description has a bug: you say you filter out short vectors, but your code does the opposite :p
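Because n here is the reciprocal of the norm, a threshold on n runs in the opposite direction from a threshold on the norm (n < 0.4 corresponds to a norm above 2.5), which is easy to misread. A minimal sketch that compares the norm itself instead; the name normalize_long and the use of math.hypot are illustrative choices, not part of the original posts:

```python
import math

def normalize_long(vecs, min_norm):
    """Yield unit vectors for the (x, y) pairs whose norm is at least min_norm."""
    for x, y in vecs:
        norm = math.hypot(x, y)  # Euclidean length of (x, y)
        if norm >= min_norm:     # also skips zero vectors when min_norm > 0
            yield (x / norm, y / norm)

vectors = [(1, 1), (1, 2), (2, 2), (3, 4)]
normalized = list(normalize_long(vectors, 2.5))
```

With the sample input this keeps only (2, 2) and (3, 4), the two vectors whose norm is above 2.5.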

Answer 2

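This suggests that using a for loop may be the fastest approach. Be sure to check the timeit results on your own machine, since they can vary with many factors (hardware, OS, Python version, the length of a, etc.).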

a = [(1,1),(1,2),(2,2),(3,4)]

def two_lcs(a):
    an = [ ((x**2+y**2)**-0.5, x,y) for x,y in a ]
    an = [ (x*n,y*n) for n,x,y in an if n < 0.4 ]
    return an

def using_forloop(a):
    result=[]
    for x,y in a:
        n=(x**2+y**2)**-0.5
        if n<0.4:
            result.append((x*n,y*n))
    return result

def using_lc(a):    
    return [(x*n,y*n)
            for (n,x,y) in (( (x**2.+y**2.)**-0.5 ,x,y) for x,y in a) if n < 0.4]

This yields the following timeit results:

% python -mtimeit -s'import test' 'test.using_forloop(test.a)'
100000 loops, best of 3: 3.29 usec per loop
% python -mtimeit -s'import test' 'test.two_lcs(test.a)'
100000 loops, best of 3: 4.52 usec per loop
% python -mtimeit -s'import test' 'test.using_lc(test.a)'
100000 loops, best of 3: 6.97 usec per loop
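To repeat this kind of comparison on your own machine without a separate test module, the timeit module can also be called from Python. The following is only a sketch: best_time is a hypothetical helper, the two functions are restated here so the snippet is self-contained, and the lambda wrapper adds a small constant overhead to every measurement.

```python
import timeit

a = [(1, 1), (1, 2), (2, 2), (3, 4)]

def using_forloop(a):
    result = []
    for x, y in a:
        n = (x**2 + y**2) ** -0.5
        if n < 0.4:
            result.append((x * n, y * n))
    return result

def using_lc(a):
    return [(x * n, y * n)
            for (n, x, y) in (((x**2 + y**2) ** -0.5, x, y) for x, y in a)
            if n < 0.4]

def best_time(func, data, number=10000, repeat=3):
    """Return the best per-call time in seconds over `repeat` runs."""
    timings = timeit.repeat(lambda: func(data), number=number, repeat=repeat)
    return min(timings) / number

for func in (using_forloop, using_lc):
    print(f"{func.__name__}: {best_time(func, a) * 1e6:.2f} usec per loop")
```

Taking the minimum of several repeats mirrors the "best of 3" figure that the command-line tool reports.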

Answer 3

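Stealing the code from unutbu, here is a bigger test that includes a numpy version and the iterator version. Note that converting the list to a numpy array can take some time.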

import numpy

# a = [(1,1),(1,2),(2,2),(3,4)]
a=[]
for k in range(1,10):
    for j in range(1,10):
        a.append( (float(k),float(j)) )

npa = numpy.array(a)

def two_lcs(a):
    an = [ ((x**2+y**2)**-0.5, x,y) for x,y in a ]
    an = [ (x*n,y*n) for n,x,y in an if n < 5.0 ]
    return an

def using_iterator(a):
    def normfilter(vecs, min_norm):
        for x,y in vecs:
            n = (x**2.+y**2.)**-0.5
            if n < min_norm:
                yield (x*n,y*n)

    return list(normfilter(a, 5.0))

def using_forloop(a):
    result=[]
    for x,y in a:
        n=(x**2+y**2)**-0.5
        if n<5.0:
            result.append((x*n,y*n))
    return result

def using_lc(a):    
    return [(x*n,y*n)
            for (n,x,y) in (( (x**2.+y**2.)**-0.5 ,x,y) for x,y in a) if n < 5.0]


def using_numpy(npa):
    n = (npa[:,0]**2+npa[:,1]**2)**-0.5
    where = n<5.0
    npa = npa[where]
    n = n[where]
    npa[:,0]=npa[:,0]*n
    npa[:,1]=npa[:,1]*n
    return( npa )

And the results...

nlw@pathfinder:~$ python -mtimeit -s'import test' 'test.two_lcs(test.a)'
10000 loops, best of 3: 65.8 usec per loop
nlw@pathfinder:~$ python -mtimeit -s'import test' 'test.using_lc(test.a)'
10000 loops, best of 3: 65.6 usec per loop
nlw@pathfinder:~$ python -mtimeit -s'import test' 'test.using_forloop(test.a)'
10000 loops, best of 3: 64.1 usec per loop
nlw@pathfinder:~$ python -mtimeit -s'import test' 'test.using_iterator(test.a)'
10000 loops, best of 3: 59.6 usec per loop
nlw@pathfinder:~$ python -mtimeit -s'import test' 'test.using_numpy(test.npa)'
10000 loops, best of 3: 48.7 usec per loop
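The numpy timing above starts from an array that was built ahead of time. As a rough sketch of a variant that performs the list-to-array conversion inside the function, so that timing it would include that cost, consider the following. The name normalize_numpy and its max_inv_norm parameter are illustrative, and the sketch assumes the input contains no zero vectors.

```python
import numpy as np

def normalize_numpy(vectors, max_inv_norm):
    """Scale rows to unit length, keeping rows whose 1/norm is below max_inv_norm."""
    arr = np.asarray(vectors, dtype=float)        # list -> array conversion happens here
    inv_norm = 1.0 / np.linalg.norm(arr, axis=1)  # reciprocal norm of each row
    keep = inv_norm < max_inv_norm                # boolean mask of rows to keep
    return arr[keep] * inv_norm[keep, np.newaxis]

a = [(1, 1), (1, 2), (2, 2), (3, 4)]
result = normalize_numpy(a, 0.4)
```

Unlike the in-place column assignments above, this version builds the result with a single broadcasted multiplication and leaves the input untouched.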

Answer 4

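Starting with Python 3.8 and the introduction of assignment expressions (PEP 572) (the := operator), it is possible to use a local variable within a list comprehension in order to avoid evaluating the same expression multiple times.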

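In our case, we can bind the value of (x**2.+y**2.)**-.5 to a variable n while using the result of the expression to filter the list (keeping items where n is below 0.4), and then reuse n to produce the mapped value.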
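A sketch of what such a comprehension could look like, assuming Python 3.8 or later and the sample input from the question (the variable names vectors and normalized are illustrative):

```python
vectors = [(1, 1), (1, 2), (2, 2), (3, 4)]

# n is bound inside the filter clause and reused in the output expression
normalized = [(x * n, y * n)
              for x, y in vectors
              if (n := (x**2 + y**2) ** -0.5) < 0.4]
```

The parentheses around the assignment expression are needed so that := binds only the reciprocal norm and not the result of the comparison.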

Intermediate variable in a list comprehension for simultaneous filtering and transformation
