Question

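I ran into the following problem when multiplying numpy arrays. In the example below (slightly simplified from the real version I am working with), I start with a nearly empty array A and a full array C. I then fill in A using a recursive algorithm.

Below, I run this algorithm in two different ways. The first method involves the operations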

n_array = np.arange(0,c-1)
temp_vec= C[c-n_array] * A[n_array]
A[c] += temp_vec.sum(axis=0)

while the second method involves a for loop

for m in range(0, c - 1):
    B[c] +=  C[c-m] * B[m]

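Note that arrays A and B start out identical, but they are filled in using the two different methods.

In the example below, I measure how long each method takes to perform the computation. I find that, for example, with n_pix=2 and max_counts = 400, the first method is much faster than the second (that is, time_np is much smaller than time_for). However, when I switch to n_pix=1000 and max_counts = 400, I find that method 2 is much faster (time_for is much smaller than time_np). I would have expected method 1 to always be faster, since method 2 explicitly runs a Python loop while method 1 uses np.multiply.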

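So, I have two questions:

Why does the timing behave this way as a function of n_pix for a fixed max_counts?
What is the best way to write this code so that it is fast for all n_pix?

In other words, can anyone suggest a method 3? In my project, it is very important that this piece of code runs fast over a wide range of n_pix.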

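import numpy as np
import time

def return_timing(n_pix,max_counts):
    # A is filled by method 1 (array indexing); only rows 0 and 1 start non-zero
    A=np.zeros((max_counts+1,n_pix))
    A[0]=np.random.random(n_pix)*1.8
    A[1]=np.random.random(n_pix)*2.3

    # B is filled by method 2 (for loop) from the same starting rows
    B=np.zeros((max_counts+1,n_pix))
    B[0]=A[0]
    B[1]=A[1]

    # C is fully populated up front
    C=np.outer(np.random.random(max_counts+1),np.random.random(n_pix))*3.24

    # accumulated wall-clock time of each method over all values of c
    time_np=0
    time_for=0
    for c in range(2, max_counts + 1):

        # method 1: index with an integer array, multiply, then sum over rows
        t0 = time.time()
        n_array = np.arange(0,c-1)
        temp_vec= C[c-n_array] * A[n_array]
        A[c] += temp_vec.sum(axis=0)
        time_np += time.time()-t0

        # method 2: accumulate one row product at a time
        t0 = time.time()
        for m in range(0, c - 1):
            B[c] +=  C[c-m] * B[m]
        time_for += time.time()-t0
    return time_np, time_for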

Answer 1

First, you can easily replace:

n_array = np.arange(0,c-1)
temp_vec= C[c-n_array] * A[n_array]
A[c] += temp_vec.sum(axis=0)

with:

A[c] += (C[c:1:-1] * A[:c-1]).sum(0)

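This is much faster, because indexing with an array is much slower than slicing. But temp_vec is still hidden in there: the full product array is created before the sum is taken. That leads to the idea of using einsum, which is the fastest because it does not create the temporary array.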

A[c] = np.einsum('ij,ij->j', C[c:1:-1], A[:c-1])
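
As a sanity check, the three formulations above can be compared on a small case. This is only a sketch: the array sizes, the seed, and the variable names fancy, sliced and summed are illustrative assumptions, not part of the answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
max_counts, n_pix, c = 6, 4, 5  # small illustrative sizes (assumed)
A = rng.random((max_counts + 1, n_pix))
C = rng.random((max_counts + 1, n_pix))

# 1) Original: integer-array ("fancy") indexing, which copies the selected rows.
n_array = np.arange(0, c - 1)
fancy = (C[c - n_array] * A[n_array]).sum(axis=0)

# 2) Slicing: C[c:1:-1] is rows c, c-1, ..., 2 and A[:c-1] is rows 0, ..., c-2.
sliced = (C[c:1:-1] * A[:c - 1]).sum(axis=0)

# 3) einsum: multiply elementwise and sum over the row index i.
summed = np.einsum('ij,ij->j', C[c:1:-1], A[:c - 1])

# Slices are views onto A's memory; fancy indexing allocates a new array.
slice_is_view = np.shares_memory(A, A[:c - 1])
fancy_is_view = np.shares_memory(A, A[n_array])

print(np.allclose(fancy, sliced), np.allclose(fancy, summed))
print(slice_is_view, fancy_is_view)
```

Note that the einsum line assigns to A[c] instead of accumulating with +=. In the question's setup this gives the same result, because every row A[c] with c >= 2 starts out as zeros.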

Timings. For small arrays:

>>> return_timing(10,10)
numpy OP    0.000525951385498
loop OP     0.000250101089478
numpy slice 0.000246047973633
einsum      0.000170946121216

For large ones:

>>> return_timing(1000,100)
numpy OP    0.185983896255
loop OP     0.0458009243011
numpy slice 0.038364648819
einsum      0.0167834758759
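
The modified return_timing that printed these four rows is not shown in the answer. A minimal sketch of what such a harness might look like follows; the function name time_variants, the seed argument, and the choice to time each variant over the whole recursion (rather than interleaved per step, as in the question) are assumptions. Absolute numbers will differ by machine and NumPy version.

```python
import time
import numpy as np

def time_variants(n_pix, max_counts, seed=0):
    """Fill the same recursion four ways; return (timings, results)."""
    rng = np.random.default_rng(seed)
    C = np.outer(rng.random(max_counts + 1), rng.random(n_pix)) * 3.24
    init = np.zeros((max_counts + 1, n_pix))
    init[0] = rng.random(n_pix) * 1.8
    init[1] = rng.random(n_pix) * 2.3

    def fancy(A, c):
        # the question's method 1: integer-array indexing
        n_array = np.arange(0, c - 1)
        A[c] += (C[c - n_array] * A[n_array]).sum(axis=0)

    def loop(A, c):
        # the question's method 2: explicit Python loop
        for m in range(0, c - 1):
            A[c] += C[c - m] * A[m]

    def sliced(A, c):
        # slicing instead of integer-array indexing
        A[c] += (C[c:1:-1] * A[:c - 1]).sum(axis=0)

    def einsum(A, c):
        # fused multiply-and-sum, no intermediate product array
        A[c] = np.einsum('ij,ij->j', C[c:1:-1], A[:c - 1])

    variants = {'numpy OP': fancy, 'loop OP': loop,
                'numpy slice': sliced, 'einsum': einsum}
    timings, results = {}, {}
    for name, step in variants.items():
        A = init.copy()  # every variant starts from the same rows 0 and 1
        t0 = time.perf_counter()
        for c in range(2, max_counts + 1):
            step(A, c)
        timings[name] = time.perf_counter() - t0
        results[name] = A
    return timings, results

timings, results = time_variants(n_pix=10, max_counts=10)
for name, seconds in timings.items():
    print(f"{name:<12}{seconds:.6f}")
```

Returning the filled arrays alongside the timings makes it easy to confirm that all four variants compute the same values before comparing their speed.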

Answer 2

Probably because your numpy-only version needs to create/allocate new ndarrays (n_array and temp_vec), whereas your other method does not.

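Creating a new ndarray is slow. If you can modify your code so that it no longer has to create them on every iteration, I would expect you to get better performance from that method.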
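
One way to follow this advice, sketched under assumptions: allocate the scratch arrays once and let NumPy write into them through the out= argument. The function name fill_preallocated and the buffer names are hypothetical, and whether this actually beats the other variants has not been measured here.

```python
import numpy as np

def fill_preallocated(A, C):
    """Fill rows 2.. of A in place, reusing two scratch buffers."""
    n_rows, n_pix = A.shape
    # The largest product needed has c - 1 = n_rows - 2 rows.
    scratch = np.empty((max(n_rows - 2, 0), n_pix), dtype=A.dtype)
    row = np.empty(n_pix, dtype=A.dtype)
    for c in range(2, n_rows):
        prod = scratch[:c - 1]                       # view, no allocation
        np.multiply(C[c:1:-1], A[:c - 1], out=prod)  # product written in place
        np.sum(prod, axis=0, out=row)                # row sum written in place
        A[c] += row
    return A

# Usage on a small case with the same layout as in the question.
rng = np.random.default_rng(1)
C = rng.random((8, 5))
A = np.zeros((8, 5))
A[0] = rng.random(5)
A[1] = rng.random(5)

# Reference result from the plain loop (the question's method 2).
ref = A.copy()
for c in range(2, 8):
    for m in range(0, c - 1):
        ref[c] += C[c - m] * ref[m]

out = fill_preallocated(A.copy(), C)
print(np.allclose(out, ref))
```

The slices C[c:1:-1] and A[:c - 1] are views, so the only per-iteration work is the multiplication and the reduction themselves.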
