I'm new to Python. I tried to use multiprocessing to speed up my work. First I tried an example, and everything worked fine. Here is the code:
from multiprocessing import Process
import time

def f(name, n, m):
    if name == 'bob':
        time.sleep(2)
    print 'hello', name, ' ', n, m

def h():
    g(1, 2, 3)

def g(a, s, d):
    p = Process(target=f, args=('bob', a, s,))
    t = Process(target=f, args=('helen', s, d,))
    p.start()
    t.start()
    t.join()
    p.join()
    print("END")

if __name__ == '__main__':
    print("Start")
    h()
After that I used the same technique on my own code and got an error. Here is the relevant part of the problematic code:
if __name__ == "__main__":
    night_crawler_steam()

def night_crawler_steam():
    .
    .
    .
    multi_processing(max_pages, url, dirname)
    .
    .
    .

def multi_processing(max_pages, url, dirname):
    page = 1
    while page <= max_pages:
        my_url = str(url) + str(page)
        soup = my_soup(my_url)
        fgt = Process(target=find_game_titles, args=(soup, page, dirname,))
        fl = Process(target=find_links, args=(soup, page, dirname,))
        fgt.start()  # <----------- Here is the problem
        fl.start()
        fgt.join()
        fl.join()
        page += 1

def find_links(soup, page, dirname):
    .
    .
    .

def find_game_titles(soup, page, dirname):
    .
    .
    .
When the interpreter reaches fgt.start(), a pile of errors appears:
Traceback (most recent call last):
  File "C:/Users/��������/Desktop/MY PyWORK/NightCrawler/NightCrawler.py", line 120, in <module>
    night_crawler_steam()
  File "C:/Users/��������/Desktop/MY PyWORK/NightCrawler/NightCrawler.py", line 23, in night_crawler_steam
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    multi_processing(max_pages, url, dirname)
  File "C:/Users/��������/Desktop/MY PyWORK/NightCrawler/NightCrawler.py", line 47, in multi_processing
    fgt.start()
  File "C:\Python27\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Python27\lib\multiprocessing\forking.py", line 277, in __init__
  File "C:\Python27\lib\multiprocessing\forking.py", line 381, in main
    dump(process_obj, to_child, HIGHEST_PROTOCOL)
  File "C:\Python27\lib\multiprocessing\forking.py", line 199, in dump
    self = load(from_parent)
  File "C:\Python27\lib\pickle.py", line 1384, in load
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Python27\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 425, in save_reduce
    return Unpickler(file).load()
  File "C:\Python27\lib\pickle.py", line 864, in load
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 655, in save_dict
    dispatch[key](self)
  File "C:\Python27\lib\pickle.py", line 886, in load_eof
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 687, in _batch_setitems
    raise EOFError
    save(v)
EOFError
This goes on until it finally ends with RuntimeError: maximum recursion depth exceeded.
Any ideas would be helpful!
Answer 0 (score: 0)
There seems to be a problem with pickling soup (see the Programming Guidelines): on Windows, multiprocessing has to pickle each Process's arguments to send them to the child process, and a parsed soup tree is deeply self-referential, which can drive pickle into exactly this kind of recursion error. A simple fix is therefore to move the my_soup(my_url) call into the target functions, like this:
from multiprocessing import Pool

def multi_processing(max_pages, url, dirname):
    p = Pool()  # using a pool is not necessary to fix your problem
    for page in xrange(1, max_pages + 1):
        my_url = str(url) + str(page)
        p.apply_async(find_game_titles, (my_url, page, dirname))
        p.apply_async(find_links, (my_url, page, dirname))
    p.close()
    p.join()

def find_links(url, page, dirname):
    soup = my_soup(url)
    # function body from before

def find_game_titles(url, page, dirname):
    soup = my_soup(url)
    # function body from before
(Of course you could also pass the soup in some picklable form, but whether that is possible or worth it depends on what my_soup actually does.)
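For instance, one picklable form would be the raw HTML string, which pickles without trouble. A minimal sketch, assuming my_soup is built on BeautifulSoup over downloaded HTML; fetch_html is a hypothetical helper, not part of the original code:

import urllib2
from bs4 import BeautifulSoup  # assumption: my_soup wraps BeautifulSoup

def fetch_html(url):
    # hypothetical helper: the raw page source is a plain string,
    # so it can be passed to a child process without pickling issues
    return urllib2.urlopen(url).read()

def find_links(html, page, dirname):
    soup = BeautifulSoup(html)  # rebuild the soup inside the worker
    # function body from before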
Although not strictly necessary, it is conventional to put the if __name__ == "__main__": part at the end of the file, as sketched below.
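That is, a layout along these lines (a sketch of the ordering only; the function bodies stay as they were):

def multi_processing(max_pages, url, dirname):
    pass  # body as before

def night_crawler_steam():
    pass  # body as before

if __name__ == "__main__":
    night_crawler_steam()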
You may also want to look at the other methods of multiprocessing.Pool, since depending on your functions some of them may be a better fit.
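For example, if each page can be handled independently, Pool.map is a natural fit. A minimal sketch, assuming a hypothetical worker process_page that combines the title and link lookups for one URL (map passes a single argument, so extras like dirname would have to be bundled into each item of the iterable):

from multiprocessing import Pool

def process_page(my_url):
    soup = my_soup(my_url)  # parse inside the worker, as above
    # ... run the find_game_titles / find_links logic here ...

def multi_processing(max_pages, url, dirname):
    urls = [str(url) + str(page) for page in xrange(1, max_pages + 1)]
    pool = Pool()
    pool.map(process_page, urls)  # blocks until every page is processed
    pool.close()
    pool.join()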