Running parallel Stata do files in Python using multiprocessing and subprocess

Time: 2017-02-01 19:23:13

Tags: python subprocess multiprocessing stata

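I have a Stata do file, pyexample3.do, that runs a regression using its argument as the regressor. The F statistic of the regression is appended to a text file. The code is as follows: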

    clear all
    set more off
    local y `1'
    display `"first parameter: `y'"'

    sysuse auto
    regress price `y'
    local f=e(F)
    display "`f'"
    file open myhandle using test_result.txt, write append
    file write myhandle "`f'" _n
    file close myhandle
    exit, STATA clear
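Because every run appends one line to test_result.txt, the collected values can be read back from Python once all runs have finished. The following is a minimal sketch; the function name read_f_statistics is hypothetical. Note that when the runs happen in parallel, the lines follow the order in which the Stata processes finished, not the order of the regressors, so the file alone does not tell you which variable produced which F statistic.

```python
from pathlib import Path


def read_f_statistics(path="test_result.txt"):
    """Parse the file written by pyexample3.do: one F statistic per line.

    Blank lines are ignored. With parallel runs, the lines appear in the order
    the Stata processes finished, which need not match the order of my_list.
    """
    values = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:
            values.append(float(line))
    return values
```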

Now I am trying to run the Stata do file in parallel from Python and write all the F statistics to one text file. My CPU has 4 cores.

    import multiprocessing
    import subprocess

    def work(staname):
        dofile = "pyexample3.do"
        cmd = ["StataMP-64.exe","/e", "do", dofile,staname]
        return subprocess.call(cmd, shell=False)

    if __name__ == '__main__':

        my_list =[ "mpg","rep78","headroom","trunk","weight","length","turn","displacement","gear_ratio" ]

        my_list.sort()

        print my_list

        # Get the number of processors available
        num_processes = multiprocessing.cpu_count()

        threads = []

        len_stas = len(my_list)

        print "+++ Number of stations to process: %s" % (len_stas)

        # run until all the threads are done, and there is no data left

        for list_item in my_list:

            # if we aren't using all the processors AND there is still data left to
            # compute, then spawn another thread

            if( len(threads) < num_processes ):

                p = multiprocessing.Process(target=work,args=[list_item])

                p.start()

                print p, p.is_alive()

                threads.append(p)

            else:
                for thread in threads:

                    if not thread.is_alive():

                        threads.remove(thread)

Although the do file should run 9 times, because there are 9 strings in my_list, it only runs 4 times. What went wrong?

1 Answer:

Answer 0 (score: 2)

for list_item in my_list循环中,在前4个流程启动后,它会进入else

    for thread in threads:
        if not thread.is_alive():
            threads.remove(thread)

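As you can see, thread.is_alive() does not block, so this loop runs immediately, long before any of those 4 processes has finished its task. Because the else branch never starts a process for the current list_item, the remaining 5 items are consumed by the for loop without ever being run. As a result, only the first 4 processes are executed in total.

You can simply use a while loop that keeps checking the process status at a short interval: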
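To make the reasoning concrete, here is a small model of the question's loop that starts no real processes. It is a sketch under a stated assumption: the hypothetical parameter finished names items whose process already reports is_alive() == False whenever the else branch checks; by default nothing has finished, which is what happens in practice because the loop takes microseconds while each Stata run takes seconds.

```python
def items_started(items, num_processes, finished=frozenset()):
    """Model of the question's scheduling loop (no real processes are started).

    `finished` is a hypothetical set of items whose process already reports
    is_alive() == False when the else branch checks it.
    """
    started = []
    running = []
    for item in items:
        if len(running) < num_processes:
            # Mirrors p.start(): this item really gets a process.
            started.append(item)
            running.append(item)
        else:
            # Mirrors the else branch: finished jobs are dropped, but nothing
            # is started for `item`, so it is silently skipped.
            running = [job for job in running if job not in finished]
    return started
```

With the sorted my_list and 4 processes, only displacement, gear_ratio, headroom and length are started. Even if the displacement run had already finished, mpg would still be skipped, because the iteration that noticed the free slot did not start anything; only rep78 would get the freed slot.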

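    import time

    # Block until at least one running process has finished, then start the
    # process for the current list_item (the else branch must not skip it).
    keep_checking = True

    while keep_checking:
        # Iterate over a copy so that removing items does not skip elements.
        for thread in threads[:]:
            if not thread.is_alive():
                threads.remove(thread)
                keep_checking = False

        time.sleep(0.5)  # wait 0.5 s before checking again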
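For reference, here is a hedged end-to-end sketch of the corrected scheduling; it is not the answerer's code. Since work() only launches an external program, it swaps multiprocessing.Process for subprocess.Popen objects managed directly and polled with Popen.poll(). The helper name run_in_parallel and its parameters are hypothetical. The key difference from the question's loop is that each command waits for a free slot instead of falling through to an else branch, so no item is skipped.

```python
import subprocess
import time


def run_in_parallel(commands, max_running=4, poll_interval=0.5):
    """Run each command (a list of argument strings), at most max_running at once.

    Returns the exit codes in the same order as `commands`.
    """
    exit_codes = [None] * len(commands)
    running = {}  # index in `commands` -> subprocess.Popen

    def reap():
        # Record the exit code of every finished process and free its slot.
        for i, proc in list(running.items()):
            code = proc.poll()  # None while the process is still running
            if code is not None:
                exit_codes[i] = code
                del running[i]

    for i, cmd in enumerate(commands):
        # Block until a slot is free, so the current command is never skipped.
        while len(running) >= max_running:
            reap()
            if len(running) >= max_running:
                time.sleep(poll_interval)
        running[i] = subprocess.Popen(cmd)

    # Wait for whatever is still running after the last launch.
    for i, proc in running.items():
        exit_codes[i] = proc.wait()
    return exit_codes
```

Usage, assuming StataMP-64.exe is on the PATH and pyexample3.do is in the working directory: build one command per variable, for example ["StataMP-64.exe", "/e", "do", "pyexample3.do", v] for each v in sorted(my_list), and pass that list to run_in_parallel(commands, max_running=multiprocessing.cpu_count()).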