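I have a Stata do-file, pyexample3.do, which runs a regression using its argument as the regressor. The F statistic of the regression is saved to a text file. The code is as follows: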
clear all
set more off
local y `1'
display `"first parameter: `y'"'
sysuse auto
regress price `y'
local f=e(F)
display "`f'"
file open myhandle using test_result.txt, write append
file write myhandle "`f'" _n
file close myhandle
exit, STATA clear
Now I am trying to run the Stata do-file in parallel from Python and write all the F statistics to a single text file. My CPU has 4 cores.
import multiprocessing
import subprocess

def work(staname):
    dofile = "pyexample3.do"
    cmd = ["StataMP-64.exe", "/e", "do", dofile, staname]
    return subprocess.call(cmd, shell=False)

if __name__ == '__main__':
    my_list = ["mpg", "rep78", "headroom", "trunk", "weight", "length", "turn", "displacement", "gear_ratio"]
    my_list.sort()
    print my_list
    # Get the number of processors available
    num_processes = multiprocessing.cpu_count()
    threads = []
    len_stas = len(my_list)
    print "+++ Number of stations to process: %s" % (len_stas)
    # run until all the threads are done, and there is no data left
    for list_item in my_list:
        # if we aren't using all the processors AND there is still data left to
        # compute, then spawn another thread
        if len(threads) < num_processes:
            p = multiprocessing.Process(target=work, args=[list_item])
            p.start()
            print p, p.is_alive()
            threads.append(p)
        else:
            for thread in threads:
                if not thread.is_alive():
                    threads.remove(thread)
The do-file should run 9 times, since my_list contains 9 strings, but it only runs 4 times. What is going wrong?
Answer 0 (score: 2)
In the for list_item in my_list loop, once the first 4 processes have been started, execution goes into the else branch:
for thread in threads:
    if not thread.is_alive():
        threads.remove(thread)
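As you can see, thread.is_alive() does not block, so this loop completes immediately, before any of the 4 processes has finished its task. Nothing is removed from threads, so the list stays full and every remaining iteration takes the else branch again, which never starts a process. As a result, only the first 4 processes are ever run.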
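The behaviour described above can be reproduced without launching Stata. The following is only a sketch: FakeProcess and simulate are hypothetical names introduced for illustration, and FakeProcess is a stand-in whose is_alive() returns immediately, as Process.is_alive() does. The loop body is copied from the question.

```python
class FakeProcess:
    """Hypothetical stand-in for multiprocessing.Process; it launches nothing."""

    def __init__(self, name, finished):
        self.name = name
        self._finished = finished

    def is_alive(self):
        # Like Process.is_alive(), this answers at once and never waits.
        return not self._finished


def simulate(items, num_processes, finished):
    """Replay the question's loop and return the items that got started."""
    threads = []
    started = []
    for item in items:
        if len(threads) < num_processes:
            threads.append(FakeProcess(item, finished))
            started.append(item)
        else:
            # Same cleanup as in the question. Whatever it removes,
            # the current item is never started in this iteration.
            for thread in threads:
                if not thread.is_alive():
                    threads.remove(thread)
    return started


if __name__ == "__main__":
    regressors = sorted(["mpg", "rep78", "headroom", "trunk", "weight",
                         "length", "turn", "displacement", "gear_ratio"])
    # Workers that are still running at every check, as with real Stata runs.
    print(len(simulate(regressors, 4, finished=False)))  # prints 4
```

Note that even with workers that finish instantly (finished=True), fewer than 9 items are started, because any item that lands in the else branch is skipped rather than retried.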
You can simply use a while loop to keep checking the process status at short intervals:
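import time

keep_checking = True
while keep_checking:
    # iterate over a copy so that removing entries does not skip any
    for thread in threads[:]:
        if not thread.is_alive():
            threads.remove(thread)
            keep_checking = False  # a slot is free, stop waiting
    if keep_checking:
        time.sleep(0.5)  # wait 0.5s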
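For completeness, here is one way the polling idea could be turned into a full scheduler. This is a sketch under assumptions, not the asker's code: it swaps multiprocessing.Process for subprocess.Popen, since each Stata run is already a separate process, and uses Popen.poll(), which like is_alive() returns without waiting. The name run_limited and its parameters are hypothetical, and the demo uses the Python interpreter as a stand-in command so that it runs without Stata.

```python
import subprocess
import sys
import time


def run_limited(commands, max_parallel, poll_interval=0.5):
    """Run every command, keeping at most max_parallel alive at once.

    Returns the exit codes in the same order as commands.
    """
    pending = list(enumerate(commands))
    running = {}  # position in commands -> Popen object
    exit_codes = [None] * len(commands)

    while pending or running:
        # Fill every free slot, so no command is ever skipped.
        while pending and len(running) < max_parallel:
            index, cmd = pending.pop(0)
            running[index] = subprocess.Popen(cmd)

        # poll() returns None while the child is still running.
        for index, proc in list(running.items()):
            code = proc.poll()
            if code is not None:
                exit_codes[index] = code
                del running[index]

        if running:
            time.sleep(poll_interval)  # wait a little before checking again

    return exit_codes


if __name__ == "__main__":
    regressors = ["mpg", "weight", "length"]
    # Stand-in commands. With Stata each one would be, as in the question,
    # ["StataMP-64.exe", "/e", "do", "pyexample3.do", name].
    commands = [[sys.executable, "-c", "print(%r)" % name] for name in regressors]
    print(run_limited(commands, max_parallel=2, poll_interval=0.1))
```

The outer loop only ends when nothing is pending and nothing is running, so all 9 regressors would be processed while never exceeding the chosen number of simultaneous Stata instances.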