Question

I have a loop that accumulates sums:

for t in reversed(range(len(inputs))):
  dy = np.copy(ps[t])
  dy[targets[t]] -= 1 
  dWhy += np.dot(dy, hs[t].T)
  dby += dy

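The input is very large, so I need to parallelize it. I moved the loop body into a separate function and tried ThreadPoolExecutor, but it runs much slower than the sequential version.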

Here is my minimal working example:

import numpy as np
import concurrent.futures
import time, random 

from concurrent.futures import ThreadPoolExecutor
import threading

#parameters
dWhy = np.random.sample(300)
dby = np.random.sample(300)

def Func(ps, targets, hs,  t):
  global dWhy, dby
  dy = np.copy(ps[t])
  dWhy += np.dot(dy, hs[t].T)
  dby += dy

  return dWhy, dby

if __name__ == '__main__':    

    ps = np.random.sample(100000)
    targets = np.random.sample(100000)
    hs = np.random.sample(100000)

    start = time.time()

    for t in range(100000):
        dy = np.copy(ps[t])
        dWhy += np.dot(dy, hs[t].T)
        dby += dy

    finish = time.time()
    print("One thread: ")
    print(finish-start)

    dWhy = np.random.sample(300)
    dby = np.random.sample(300)
    start = time.time()

    with concurrent.futures.ThreadPoolExecutor() as executor:
        args = ((ps, targets, hs,  t) for t in range(100000))
        for out1, out2  in executor.map(lambda p: Func(*p), args):
            dWhy, dby = out1, out2

    finish = time.time()
    print("Multithreads time: ")
    print(finish-start)

On my PC, the single-threaded version takes about 3 seconds and the multithreaded version takes about 1 minute.

Answer 1

Convert the lambda into a named function.
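As a rough sketch of what that could look like, the version below passes a named function straight to executor.map. It goes a little beyond the answer: the worker name partial_sums, the chunking into index ranges, and the value of n_chunks are all assumptions made for illustration. Each task returns partial results for a whole range of indices instead of updating globals for a single index, so the pool handles a handful of large tasks rather than 100000 tiny ones.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)
n = 100_000
ps = rng.random(n)
hs = rng.random(n)
dWhy = rng.random(300)
dby = rng.random(300)
# Keep the starting values so the result can be checked afterwards.
dWhy_start, dby_start = dWhy.copy(), dby.copy()

def partial_sums(bounds):
    """Named worker (hypothetical): sum the contributions of one index range."""
    start, stop = bounds
    return np.dot(ps[start:stop], hs[start:stop]), ps[start:stop].sum()

# A few large chunks instead of one task per index.
n_chunks = 8  # assumed value; tune for your machine
edges = np.linspace(0, n, n_chunks + 1, dtype=int)
chunks = list(zip(edges[:-1], edges[1:]))

with ThreadPoolExecutor() as executor:
    # The named function is passed directly; no lambda wrapper is needed.
    for why_part, by_part in executor.map(partial_sums, chunks):
        dWhy += why_part  # accumulate in the main thread only
        dby += by_part
```

Whether threads give any speed-up here is not guaranteed; for arrays of this size a single vectorized call may already be fast enough.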

Answer 2

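Consider implementing it with NumPy's vectorized operations instead of a Python loop (the loop below only repeats the computation 20000 times in order to time it):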

timer = time.time()
for i in range(20000):
    dWhy += np.dot(ps,hs)
    dby += np.sum(ps)
print(time.time()-timer)
>>3.2034592628479004
print((time.time()-timer)/20000)
>>0.00016017296314239503

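This runs roughly 20,000 times faster: about 0.00016 seconds per vectorized pass, compared with the roughly 3 seconds reported for the loop.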
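To check that the vectorized form computes the same thing as the loop in the question's minimal example, a small comparison can help. This is a sketch under the same assumptions as that example (ps and hs are 1-D arrays of scalars, so each iteration adds the same scalar to every element of dWhy and dby). The names dWhy_loop, dWhy_vec and so on are introduced here only for illustration, and n is reduced so the reference loop runs quickly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # smaller than the question's 100000 so the loop check runs quickly

ps = rng.random(n)
hs = rng.random(n)
dWhy0 = rng.random(300)
dby0 = rng.random(300)

# Reference: the sequential loop from the question's minimal example.
dWhy_loop = dWhy0.copy()
dby_loop = dby0.copy()
for t in range(n):
    dy = ps[t]
    dWhy_loop += dy * hs[t]  # np.dot of two scalars is a plain product
    dby_loop += dy

# Vectorized: one dot product and one sum replace the whole loop.
dWhy_vec = dWhy0 + np.dot(ps, hs)
dby_vec = dby0 + ps.sum()

# Compare up to floating-point rounding.
same_why = np.allclose(dWhy_loop, dWhy_vec)
same_by = np.allclose(dby_loop, dby_vec)
print(same_why, same_by)
```

The two results agree only up to rounding, because the loop and the vectorized calls add the terms in a different order.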

Parallel CPU sum in Python
