Why don't generator functions use idle time to prepare the next yield?

Time: 2017-04-06 04:51:25

Tags: python multithreading iterator generator multicore

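In today's programming world of multi-core, multi-threaded CPUs (the one in my notebook has two cores with two threads each), it makes more and more sense to write code that can use the hardware it runs on. Languages like Go were born to make it easier for programmers to speed up an application by spawning multiple independent processes and synchronizing them again later.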

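Coming into contact with generator functions in Python in this context, I expected that such functions would use the idle time that passes between subsequent item requests to prepare the next yield for immediate delivery. That does not seem to be the case, at least according to my interpretation of the results I got from running the code provided below.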

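What confuses me even more is that the caller of a generator function has to wait until the function finishes processing all remaining instructions, even if the generator has already delivered all of its items.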

  

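Is there any clear reason, which I currently cannot see, why a generator function does not use the idle time between yield requests to run the code past the requested yield until it reaches the next yield instruction, and why it even makes the caller wait after all items have already been delivered?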

Here is the code I used:

import time
startTime = time.time()
time.sleep(1)
def generatorFunctionF():
    print("# here: generatorFunctionF() lineNo #1", time.time()-startTime)
    for i in range(1,4):
        print("# now: time.sleep(1)", time.time()-startTime)
        time.sleep(1)
        print("# before yield", i, time.time()-startTime)
        yield i # yield i
        print("# after  yield", i, time.time()-startTime)
    print("# now: time.sleep(5)", time.time()-startTime)
    time.sleep(5)
    print("# end followed by 'return'", time.time()-startTime)
    return
#:def

def standardFunctionF():
    print("*** before: 'gFF = generatorFunctionF()'", time.time()-startTime) 
    gFF = generatorFunctionF()
    print("*** after:  'gFF = generatorFunctionF()'", time.time()-startTime) 
    print("*** before print(next(gFF)", time.time()-startTime)
    print(next(gFF))
    print("*** after  print(next(gFF)", time.time()-startTime)
    print("*** before time.sleep(3)", time.time()-startTime)
    time.sleep(3)
    print("*** after  time.sleep(3)", time.time()-startTime)
    print("*** before print(next(gFF)", time.time()-startTime)
    print(next(gFF))
    print("*** after  print(next(gFF)", time.time()-startTime)
    print("*** before list(gFF)", time.time()-startTime)
    print("*** list(gFF): ", list(gFF), time.time()-startTime)
    print("*** after:  list(gFF)", time.time()-startTime)
    print("*** before time.sleep(3)", time.time()-startTime)
    time.sleep(3)
    print("*** after  time.sleep(3)", time.time()-startTime)
    return "*** endOf standardFunctionF"

print()
print(standardFunctionF)
print(standardFunctionF())

This gives:

>python3.6 -u "aboutIteratorsAndGenerators.py"

<function standardFunctionF at 0x7f97800361e0>
*** before: 'gFF = generatorFunctionF()' 1.001169204711914
*** after:  'gFF = generatorFunctionF()' 1.0011975765228271
*** before print(next(gFF) 1.0012099742889404
# here: generatorFunctionF() lineNo #1 1.0012233257293701
# now: time.sleep(1) 1.0012412071228027
# before yield 1 2.0023491382598877
1
*** after  print(next(gFF) 2.002397298812866
*** before time.sleep(3) 2.0024073123931885
*** after  time.sleep(3) 5.005511283874512
*** before print(next(gFF) 5.005547761917114
# after  yield 1 5.005556106567383
# now: time.sleep(1) 5.005565881729126
# before yield 2 6.006666898727417
2
*** after  print(next(gFF) 6.006711006164551
*** before list(gFF) 6.0067174434661865
# after  yield 2 6.006726026535034
# now: time.sleep(1) 6.006732702255249
# before yield 3 7.0077736377716064
# after  yield 3 7.0078125
# now: time.sleep(5) 7.007838010787964
# end followed by 'return' 12.011908054351807
*** list(gFF):  [3] 12.011950254440308
*** after:  list(gFF) 12.011966466903687
*** before time.sleep(3) 12.011971473693848
*** after  time.sleep(3) 15.015069007873535
*** endOf standardFunctionF
>Exit code: 0

3 answers:

Answer 0 (score: 2)

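Generators were designed as a lighter-weight, simpler, easier-to-understand syntax for writing iterators. That is their use case. People who want to make iterators shorter and easier to understand do not want to introduce the headache of thread synchronization into every iterator they write. That would be the opposite of the design goal.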

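As such, generators are based on the concepts of coroutines and cooperative multitasking, not threads. The design tradeoffs are different: generators sacrifice parallel execution in exchange for semantics that are much easier to reason about.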
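To make the cooperative model concrete, here is a minimal sketch of a round-robin scheduler built on plain generators. The names `countdown` and `run_round_robin` are made up for this illustration. Each task runs only when the scheduler resumes it and hands control back at `yield`, so the interleaving is fully deterministic and needs no locks.

```python
from collections import deque

def countdown(name, n, log):
    """A cooperative task: records one step, then yields control."""
    while n > 0:
        log.append(f"{name}:{n}")
        n -= 1
        yield  # hand control back to the scheduler

def run_round_robin(tasks):
    """Resume each task in turn until all of them are exhausted."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task up to its next yield
        except StopIteration:
            continue            # task finished, do not reschedule it
        ready.append(task)

log = []
run_round_robin([countdown("a", 2, log), countdown("b", 3, log)])
print(log)  # ['a:2', 'b:3', 'a:1', 'b:2', 'b:1']
```

Nothing in a task body executes between two `next()` calls, which is exactly the behavior the question observed.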

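Also, using a separate thread for every generator would be really inefficient, and deciding when to parallelize is a hard problem. For example, the Go implementation itself defaulted to GOMAXPROCS=1 until Go 1.5. Most generators are not actually worth executing in another thread. Heck, they would not be worth executing in another thread even in GIL-less implementations of Python, such as Jython or Grumpy.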

If you want things to run in parallel, that is already handled by starting a thread or process and communicating with it through queues.
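As a rough sketch of that approach, the wrapper below runs any iterable in a background thread and passes its items through a `queue.Queue`. The names `prefetch`, `slow_items`, and `_SENTINEL` are assumptions made for this example, not part of any library. Because of the GIL, a thread only helps when the producer spends its time waiting (I/O, `time.sleep`) rather than computing.

```python
import queue
import threading
import time

_SENTINEL = object()  # marks the end of the stream

def prefetch(iterable, maxsize=1):
    """Yield items from `iterable`, producing them in a background thread.

    Sketch only: exceptions raised by the producer are not forwarded,
    and an abandoned consumer leaves the daemon thread blocked on put().
    """
    buffer = queue.Queue(maxsize=maxsize)

    def producer():
        for item in iterable:
            buffer.put(item)      # blocks while the buffer is full
        buffer.put(_SENTINEL)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()       # blocks until an item is ready
        if item is _SENTINEL:
            return
        yield item

def slow_items(delay=0.1):
    """Hypothetical slow source: each item takes `delay` seconds."""
    for i in range(1, 4):
        time.sleep(delay)
        yield i

results = []
for item in prefetch(slow_items()):
    time.sleep(0.1)               # simulated work on the item
    results.append(item)
print(results)
```

While the caller works on one item, the background thread is already waiting on the next one, which is the overlap the question hoped to get for free.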

Answer 1 (score: 1)

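Because the code between yields may have side effects. You advance a generator not only when you "want the next value", but also when you want to move the generator forward by continuing to run its code.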
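A small illustration of why running ahead would change program behavior. The generator `batches` and its log messages are invented for this sketch. The caller relies on the code after the first `yield` not running until it asks for the second item.

```python
def batches(events):
    """Generator whose code between yields has visible side effects."""
    events.append("open resource")
    yield 1
    events.append("commit first batch")   # must not run before the caller is done with 1
    yield 2
    events.append("close resource")

events = []
gen = batches(events)
first = next(gen)
events.append("caller handles 1")
second = next(gen)
print(events)
# ['open resource', 'caller handles 1', 'commit first batch']
```

If the generator eagerly ran to the next `yield` in the background, "commit first batch" could be recorded before "caller handles 1", silently reordering the side effects.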

Answer 2 (score: -2)

A question about the expected behavior of generator functions in Python should be viewed from the perspective of the broader topic of

  

implicit parallelism

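Here is an excerpt from Wikipedia: "In computer science, implicit parallelism is a characteristic of a programming language that allows a compiler or interpreter to automatically exploit the parallelism inherent to the computations expressed by some of the language's constructs."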

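The essence of the question "Is there any significant reason why generator functions do not prefetch the next item in the idle time between yields?" is actually asking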

  

"Does Python as a programming language support implicit parallelism?"

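Although (to quote the opinion expressed by the author of the question) "there is no reason why a generator function should not provide this kind of 'smart' behavior", in the context of Python as a programming language the actual correct answer to the question (already given in the comments, but without clearly exposing the core of the problem) is: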

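The significant reason why Python generator functions should not intelligently prefetch the next item in the background for later immediate delivery is that Python as a programming language does not support implicit parallelism.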

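That said, it is certainly interesting to explore whether the expected feature can be provided in Python in an explicit way. Yes, it can. Let us demonstrate a generator function that is able to implicitly prefetch the next items in the background, by explicitly programming this feature into the function: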

from multiprocessing import Process
import time

def generatorFetchingItemsOnDemand():
    for i in range(1, 4):
        time.sleep(2)
        print("# ...ItemsOnDemand spends 2 seconds for delivery of item")
        yield i

def generatorPrefetchingItemsForImmediateDelivery():
    with open('tmpFile','w') as tmpFile:
        tmpFile.write('')
        tmpFile.flush()

    def itemPrefetcher():
        for i in range(1, 4):
            time.sleep(2)
            print("### itemPrefetcher spends 2 seconds for prefetching an item")
            with open('tmpFile','a') as tmpFile:
                tmpFile.write(str(i)+'\n')
                tmpFile.flush()

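    # Note: a nested function as the Process target relies on the 'fork'
    # start method; under 'spawn' it cannot be pickled.
    p = Process(target=itemPrefetcher)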
    p.start()

    for i in range(1, 4):
        with open('tmpFile','r') as tmpFile:
            lstFileLines = tmpFile.readlines()
        if len(lstFileLines) < i: 
            while len(lstFileLines) < i:
                time.sleep(0.1)
                with open('tmpFile','r') as tmpFile:
                    lstFileLines = tmpFile.readlines()

        yield int(lstFileLines[i-1])
#:def

def workOnAllItems(intValue):
    startTime = time.time()
    time.sleep(2)
    print("workOn(", intValue, "): took", (time.time()-startTime), "seconds")
    return intValue

print("===============================")        
genPrefetch = generatorPrefetchingItemsForImmediateDelivery()
startTime = time.time()
for item in genPrefetch:
    workOnAllItems(item)
print("using genPrefetch workOnAllItems took", (time.time()-startTime), "seconds")
print("-------------------------------")        
print()
print("===============================")        
genOnDemand = generatorFetchingItemsOnDemand()
startTime = time.time()
for item in genOnDemand:
    workOnAllItems(item)
print("using genOnDemand workOnAllItems took", (time.time()-startTime), "seconds")
print("-------------------------------")        

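The provided code uses the file system for inter-process communication, so if you want to reuse this concept in your own programs, replace it with one of the other, faster inter-process communication mechanisms available. A generator function implemented the way demonstrated here does what the author of the question expected a generator function to do, and it helps to speed up the application (here from 12 to 8 seconds):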

>python3.6 -u "generatorPrefetchingItemsForImmediateDelivery.py"
===============================
### itemPrefetcher spends 2 seconds for prefetching an item
### itemPrefetcher spends 2 seconds for prefetching an item
workOn( 1 ): took 2.0009119510650635 seconds
### itemPrefetcher spends 2 seconds for prefetching an item
workOn( 2 ): took 2.0010197162628174 seconds
workOn( 3 ): took 2.00161075592041 seconds
using genPrefetch workOnAllItems took 8.013896942138672 seconds
-------------------------------

===============================
# ...ItemsOnDemand spends 2 seconds for delivery of item
workOn( 1 ): took 2.0011563301086426 seconds
# ...ItemsOnDemand spends 2 seconds for delivery of item
workOn( 2 ): took 2.001920461654663 seconds
# ...ItemsOnDemand spends 2 seconds for delivery of item
workOn( 3 ): took 2.0002224445343018 seconds
using genOnDemand workOnAllItems took 12.007976293563843 seconds
-------------------------------
>Exit code: 0
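The 12 and 8 second totals above follow from simple pipeline arithmetic. The sketch below models it under two assumptions that match the demo: the prefetcher runs independently of the consumer with an unbounded buffer, and communication overhead is negligible. The function names are invented for this illustration.

```python
def sequential_time(n_items, fetch, work):
    """On-demand generator: every item costs fetch + work, back to back."""
    return n_items * (fetch + work)

def pipelined_time(n_items, fetch, work):
    """Prefetching generator: fetching overlaps with work on earlier items."""
    fetched_at = 0.0       # time at which the current item becomes available
    worker_free_at = 0.0   # time at which the consumer finishes its current item
    for _ in range(n_items):
        fetched_at += fetch
        # Work starts once the item is fetched AND the consumer is free.
        worker_free_at = max(fetched_at, worker_free_at) + work
    return worker_free_at

print(sequential_time(3, 2, 2))  # 12
print(pipelined_time(3, 2, 2))   # 8.0
```

With equal fetch and work times, the saving approaches half of the total as the number of items grows; if one stage is much slower than the other, the slower stage dominates and prefetching gains little.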