如何在python中对stdout进行原子写入?

时间:2014-07-27 09:12:30

标签: python multiprocessing output stdout atomic

我在某些消息来源中读到打印命令不是线程安全的,而且解决方法是使用 sys.stdout.write 命令,但是它仍然不适合我,写 STDOUT 并不是原子。

这是一个简短的例子(称为此文件parallelExperiment.py):

   import os
   import sys
   from multiprocessing import Pool

   def output(msg):
    msg = '%s%s' % (msg, os.linesep)
    sys.stdout.write(msg)

   def func(input):
    output(u'pid:%d got input \"%s\"' % (os.getpid(), str(input)))

   def executeFunctionInParallel(funcName, inputsList, maxParallelism):
       output(u'Executing function %s on input of size %d with maximum parallelism of %d' % (
           funcName.__name__, len(inputsList), maxParallelism))
       parallelismPool = Pool(processes=maxParallelism)
       executeBooleanResultsList = parallelismPool.map(funcName, inputsList)
       parallelismPool.close()
       output(u'Function %s executed on input of size %d  with maximum parallelism of %d' % (
           funcName.__name__, len(inputsList), maxParallelism))
       # if all parallel executions executed well - the boolean results list should all be True
       return all(executeBooleanResultsList)

   if __name__ == "__main__":
    inputsList=[str(i) for i in range(20)]
    executeFunctionInParallel(func, inputsList, 4)

查看输出:

我。调用 python parallelExperiment.py 的输出(注意单词" pid"在某些行中搞砸了):

Executing function func on input of size 20 with maximum parallelism of 4
ppid:2240 got input "0"
id:4960 got input "2"
pid:4716 got input "4"
pid:4324 got input "6"
ppid:2240 got input "1"
id:4960 got input "3"
pid:4716 got input "5"
pid:4324 got input "7"
ppid:4960 got input "8"
id:2240 got input "10"
pid:4716 got input "12"
pid:4324 got input "14"
ppid:4960 got input "9"
id:2240 got input "11"
pid:4716 got input "13"
pid:4324 got input "15"
ppid:4960 got input "16"
id:2240 got input "18"
ppid:2240 got input "19"
id:4960 got input "17"
Function func executed on input of size 20  with maximum parallelism of 4

II。调用 python parallelExperiment.py>的输出parallelExperiment.log ,意思是将 stdout 重定向到 parallelExperiment.log 文件(注意行的顺序不好,因为之前和在调用并行调用 func executeFunctionInParallel 后,应打印一条消息):

pid:3244 got input "4"
pid:3244 got input "5"
pid:3244 got input "12"
pid:3244 got input "13"
pid:240 got input "0"
pid:240 got input "1"
pid:240 got input "8"
pid:240 got input "9"
pid:240 got input "16"
pid:240 got input "17"
pid:1268 got input "2"
pid:1268 got input "3"
pid:1268 got input "10"
pid:1268 got input "11"
pid:1268 got input "18"
pid:1268 got input "19"
pid:3332 got input "6"
pid:3332 got input "7"
pid:3332 got input "14"
pid:3332 got input "15"
Executing function func on input of size 20 with maximum parallelism of 4
Function func executed on input of size 20  with maximum parallelism of 4

2 个答案:

答案 0 :(得分:7)

因为multiprocessing.Pool实际上使用子进程而不是线程而发生这种情况。 您需要在进程之间使用显式synchronization。请注意,链接上的示例,它解决了您的问题。

import os
import sys
from multiprocessing import Pool, Lock

lock = Lock()

def output(msg):
    msg = '%s%s' % (msg, os.linesep)
    with lock:
        sys.stdout.write(msg)

def func(input):
    output(u'pid:%d got input \"%s\"' % (os.getpid(), str(input)))

def executeFunctionInParallel(funcName, inputsList, maxParallelism):
    output(u'Executing function %s on input of size %d with maximum parallelism of %d' % (
      funcName.__name__, len(inputsList), maxParallelism))
    parallelismPool = Pool(processes=maxParallelism)
    executeBooleanResultsList = parallelismPool.map(funcName, inputsList)
    parallelismPool.close()
    parallelismPool.join()
    output(u'Function %s executed on input of size %d  with maximum parallelism of %d' % (
       funcName.__name__, len(inputsList), maxParallelism))
    # if all parallel executions executed well - the boolean results list should all be True
    return all(executeBooleanResultsList)

if __name__ == "__main__":
    inputsList=[str(i) for i in range(20)]
    executeFunctionInParallel(func, inputsList, 4)

答案 1 :(得分:2)

如果您想避免锁定并且很乐意使用较低级别的界面,您可以使用os.openos.write获得POSIX O_APPEND行为(如果您的系统支持);并参见Is file append atomic in UNIX?