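I want to create a process with Python's multiprocessing module, as part of a larger script. (I want a lot of other things from it too, but for now I'll settle for this.)
I copied the most basic code from the multiprocessing docs and modified it slightly.
However, every time p.join() is called, everything outside the if __name__ == '__main__': statement gets repeated.
This is my code: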
from multiprocessing import Process

data = 'The Data'
print(data)

# worker function definition
def f(p_num):
    print('Doing Process: {}'.format(p_num))

print('start of name == main ')

if __name__ == '__main__':
    print('Creating process')
    p = Process(target=f, args=(data,))
    print('Process made')
    p.start()
    print('process started')
    p.join()
    print('process joined')

print('script finished')
This is what I expected:
The Data
start of name == main
Creating process
Process made
process started
Doing Process: The Data
process joined
script finished
Process finished with exit code 0
This is what actually happens:
The Data
start of name == main
Creating process
Process made
process started
The Data <- wrongly repeated line
start of name == main <- wrongly repeated line
script finished <- wrongly executed early line
Doing Process: The Data
process joined
script finished
Process finished with exit code 0
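I'm not sure whether this is caused by the if statement, by p.join(), or by something else, or why it happens. Can someone explain what causes this, and why?
For clarity, since some people cannot reproduce my problem but I can: I am using Windows Server 2012 R2 Datacenter and Python 3.5.3.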
Answer 0 (score: 5)
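Multiprocessing in Python works by having each child process import the parent script. (This is how the spawn start method behaves, which is the only one available on Windows.) In Python, when a script is imported, everything that is not defined inside a function is executed. As far as I understand, __name__ changes when the script is imported (check this SO answer for a better understanding), which differs from running the script directly on the command line, where __name__ == '__main__' holds. Because of this import, __name__ is not equal to '__main__' in the child, which is why the code inside if __name__ == '__main__': is not executed for your child process. The repeated lines are therefore printed while the child starts up after p.start(); p.join() only waits for it to finish.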
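To see the effect of the changed __name__ without starting any processes, the sketch below runs a tiny script twice through runpy.run_path, once as '__main__' and once as '__mp_main__'. The latter is the name CPython's spawn start method gives to the re-imported main module. The script text and the helper run_as are made up for illustration, and this only imitates what a child does on startup; it is not how multiprocessing itself is implemented.

```python
import contextlib
import io
import os
import runpy
import tempfile

# Hypothetical two-line script: one top-level print, one guarded print.
SCRIPT = '''
print("top level runs")
if __name__ == "__main__":
    print("guarded block runs")
'''


def run_as(run_name):
    """Execute SCRIPT with __name__ set to run_name; return the printed lines."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_script.py")
        with open(path, "w") as fh:
            fh.write(SCRIPT)
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            runpy.run_path(path, run_name=run_name)
    return buf.getvalue().splitlines()


if __name__ == "__main__":
    # Run directly: both prints execute.
    print(run_as("__main__"))
    # Run the way a spawned child imports it: only the top-level print executes.
    print(run_as("__mp_main__"))
```

The top-level print appears in both runs, while the guarded print appears only when the name is '__main__', which matches the duplicated lines in the question's output.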
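Anything you do not want executed in the child processes should be moved into the if __name__ == '__main__': section of your code, because that section runs only in the parent process, that is, the script you originally launched.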
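Applied to the script in the question, that could look like the sketch below. The module-level prints move into a main() function (a name chosen here, not in the original) that is called from the guard, so a child that re-imports the file only defines f and main and prints nothing extra. Returning the message from f is an addition made purely so the function is easy to check.

```python
from multiprocessing import Process


def f(p_num):
    """Worker: runs in the child process."""
    message = 'Doing Process: {}'.format(p_num)
    print(message)
    return message


def main():
    """Everything that should run once, in the parent only."""
    data = 'The Data'
    print(data)
    print('Creating process')
    p = Process(target=f, args=(data,))
    print('Process made')
    p.start()
    print('process started')
    p.join()
    print('process joined')
    print('script finished')


if __name__ == '__main__':
    # Only the originally launched script reaches this call;
    # a child that re-imports the module skips it.
    main()
```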
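Hope this helps. If you look around, there are more resources on Google that explain this better. I linked the official Python documentation for the multiprocessing module, and I recommend reading through it.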
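On why some readers could not reproduce the problem: the behaviour depends on the start method. Windows only offers spawn, whereas Unix systems have traditionally defaulted to fork, which copies the parent instead of re-importing the script. A small check, with the helper name start_method_summary and the wording of the notes being my own:

```python
import multiprocessing as mp


def start_method_summary():
    """Return the active start method and a short note on what it implies."""
    method = mp.get_start_method()
    if method == "fork":
        note = "children copy the parent's memory; the script is not re-imported"
    else:
        note = "children do not inherit the parent's state; the main script is imported again"
    return method, note


if __name__ == "__main__":
    print(*start_method_summary(), sep=": ")
```

Whatever the start method, code written with the guard in place behaves the same everywhere, which is why it is the portable choice.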
Answer 1 (score: 0)
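While exploring this topic I ran into the problem of modules being loaded multiple times. To make it work as described above, I had to:
The example module below runs several classification methods in parallel on the same data set: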
print("I am being run so often because: https://stackoverflow.com/questions/45591987/multi-processing-code-repeatedly-runs")
def initializer():
    from sklearn import datasets
    iris = datasets.load_iris()
    x = iris.data
    y = iris.target
    from sklearn.preprocessing import StandardScaler as StandardScaler
    from sklearn.metrics import accuracy_score
    from sklearn.linear_model import Perceptron
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    import multiprocessing as mp
    from multiprocessing import Manager
    results = [] # for some reason it needs to be defined before the if __name__ == '__main__'
    return x, y, StandardScaler, accuracy_score, Perceptron, LogisticRegression, Pipeline, mp, Manager, results
def perceptron(x,y,results, StandardScaler, accuracy_score, Perceptron, LogisticRegression, Pipeline):
    scaler = StandardScaler()
    # max_iter is used here because newer scikit-learn releases no longer accept n_iter
    estimator = ["Perceptron", Perceptron(max_iter=40, eta0=0.1, random_state=1)]
    pipe = Pipeline([('Scaler', scaler),
                     ('Estimator', estimator[1])])
    pipe.fit(x,y)
    y_pred_pipe = pipe.predict(x)
    accuracy = accuracy_score(y, y_pred_pipe)
    result = [estimator[0], estimator[1], pipe, y_pred_pipe, accuracy]
    results.append(result)
    print(estimator[0], "Accuracy: ",accuracy)
    return results
def logistic(x,y,results,StandardScaler, accuracy_score, Perceptron, LogisticRegression, Pipeline):
    scaler = StandardScaler()
    estimator = ["LogisticRegression", LogisticRegression(C=100.0, random_state=1)]
    pipe = Pipeline([('Scaler', scaler),
                     ('Estimator', estimator[1])])
    pipe.fit(x,y)
    y_pred_pipe = pipe.predict(x)
    accuracy = accuracy_score(y, y_pred_pipe)
    result = [estimator[0], estimator[1], pipe, y_pred_pipe, accuracy]
    #results = []
    results.append(result)
    print(estimator[0], "Accuracy: ",accuracy)
    return results
def parallel(x,y,results,StandardScaler, accuracy_score, Perceptron, LogisticRegression, Pipeline):
    with Manager() as manager:
        tasks = [perceptron, logistic,]
        results = manager.list()
        procs = []
        for task in tasks:
            proc = mp.Process(name=task.__name__, target=task, args=(x,y,results,StandardScaler, accuracy_score, Perceptron, LogisticRegression, Pipeline))
            procs.append(proc)
            print("done with check 1")
            proc.start()
            print("done with check 2")
        for proc in procs:
            print("done with check 3")
            proc.join()
            print("done with check 4")
        results = list(results)
        print("Within WITH")
        print(results)
    print("Within def")
    print(results)
    return results
if __name__ == '__main__':
    __spec__ = "ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>)"
    x, y, StandardScaler, accuracy_score, Perceptron, LogisticRegression, Pipeline, mp, Manager, results = initializer()
    results = parallel(x,y,results,StandardScaler, accuracy_score, Perceptron, LogisticRegression, Pipeline)
    print("Outside of def")
    print(type(results))
    print(len(results))
    print(results[1]) # must be within IF as otherwise does not work ?!?!?!?
    cpu_count = mp.cpu_count()
    print("CPUs: ", cpu_count)
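For comparison, a leaner layout seems sufficient in many cases: ordinary module-level imports (re-importing them in a child is harmless), worker functions defined at top level, and only the process creation placed under the guard. The sketch below keeps the Manager list from the module above but swaps the scikit-learn pipelines for two toy tasks, so that it runs without extra dependencies. The names square_sum, plain_sum and run_parallel are illustrative and not part of the answer.

```python
import multiprocessing as mp


def square_sum(values, results):
    """Toy task: append the sum of squares to the shared results list."""
    results.append(("square_sum", sum(v * v for v in values)))


def plain_sum(values, results):
    """Toy task: append the plain sum to the shared results list."""
    results.append(("plain_sum", sum(values)))


def run_parallel(values, tasks):
    """Run each task in its own process and collect what they appended."""
    with mp.Manager() as manager:
        results = manager.list()
        procs = [mp.Process(name=task.__name__, target=task, args=(values, results))
                 for task in tasks]
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()
        # Copy into a plain list before the manager shuts down.
        return sorted(list(results))


if __name__ == "__main__":
    # Process creation happens only here, so re-importing this module
    # in a child defines the functions and does nothing else.
    print(run_parallel([1, 2, 3], [square_sum, plain_sum]))
```

Because the tasks only rely on append, they can also be called with an ordinary list, which makes them easy to test without starting any processes.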