I'm looking for a way to fire off a large number of asynchronous web requests without waiting for the responses.
This is my code so far:
import mechanize
from mechanize._opener import urlopen
from mechanize._form import ParseResponse
from multiprocessing import Pool

brow = mechanize.Browser()
brow.open('https://website.com')
# Login
brow.select_form(nr=0)
brow.form['username'] = 'user'
brow.form['password'] = 'password'
brow.submit()
while True:
    # async open the browser until some state is fulfilled
    brow.open('https://website.com/needthiswebsite')
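The problem with the code above is that if I try to open the page with two browsers, bro2 has to wait for bro1 to finish before it can start (the calls block):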
bro1.open('https://website.com/needthiswebsite')
bro2.open('https://website.com/needthiswebsite')
Attempted solution:
# PSEUDO-CODE
# Global flag: keep polling while this is True
state = True

def openWebsite(browser):
    result = browser.open('https://website.com/needthiswebsite')
    if result.something() == WHATIWANT:
        return True
    return False

def updateState(found):
    # Callback: stop polling once one of the calls gets a positive answer
    global state
    if found:
        state = False

pool = Pool(processes=1)
while state:
    # async open the browser until some state is fulfilled
    # I spam this function until I get a positive answer from one of the calls
    result = pool.apply_async(openWebsite, [brow1], callback=updateState)
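I tried to implement a solution similar to the one in the Stack Overflow question "Asynchronous method call in Python?".
The problem is that I get an error when I try to use pool.apply_async(brow.open())
Error message:
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
I have tried a lot of things to fix the PicklingError, but nothing seems to work.
Any help would be greatly appreciated :)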
Answer 0 (score: 1)
mechanize.Browser objects are not picklable, so they cannot be passed to pool.apply_async (or to any other method that needs to send the object to a child process):
>>> b = mechanize.Browser()
>>> import pickle
>>> pickle.dumps(b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/pickle.py", line 1374, in dumps
Pickler(file, protocol).dump(obj)
File "/usr/lib/python2.7/pickle.py", line 224, in dump
self.save(obj)
File "/usr/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python2.7/pickle.py", line 725, in save_inst
save(stuff)
File "/usr/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python2.7/pickle.py", line 649, in save_dict
self._batch_setitems(obj.iteritems())
File "/usr/lib/python2.7/pickle.py", line 663, in _batch_setitems
save(v)
File "/usr/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python2.7/pickle.py", line 600, in save_list
self._batch_appends(iter(obj))
File "/usr/lib/python2.7/pickle.py", line 615, in _batch_appends
save(x)
File "/usr/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python2.7/pickle.py", line 725, in save_inst
save(stuff)
File "/usr/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python2.7/pickle.py", line 649, in save_dict
self._batch_setitems(obj.iteritems())
File "/usr/lib/python2.7/pickle.py", line 663, in _batch_setitems
save(v)
File "/usr/lib/python2.7/pickle.py", line 306, in save
rv = reduce(self.proto)
File "/usr/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle instancemethod objects
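The traceback shows pickle descending through nested attributes until it reaches something it cannot serialize. A minimal sketch of how one might find the offending top-level attributes before handing an object to a pool is shown here. `Holder`, `is_picklable` and `unpicklable_attributes` are hypothetical names used for illustration only; they are not part of mechanize or multiprocessing.

```python
import pickle


def is_picklable(obj):
    # True if pickle can serialize obj, False otherwise.
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


def unpicklable_attributes(obj):
    # Names of the instance attributes that pickle rejects.
    return [name for name, value in sorted(vars(obj).items())
            if not is_picklable(value)]


class Holder(object):
    # Hypothetical stand-in for an object that carries a callable around.
    def __init__(self):
        self.url = 'https://website.com/needthiswebsite'
        self.callback = lambda: None  # lambdas cannot be pickled
```

Calling `unpicklable_attributes(Holder())` reports which attributes would have to be removed or rebuilt before the object could cross a process boundary.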
The simplest approach is to create the Browser instance inside each child process rather than in the parent:
def openWebsite(url):
    brow = mechanize.Browser()
    brow.open('https://website.com')
    # Login
    brow.select_form(nr=0)
    brow.form['username'] = 'user'
    brow.form['password'] = 'password'
    brow.submit()
    result = brow.open(url)
    if result.something() == WHATIWANT:
        return True
    return False
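A worker like this can then be driven from a pool, because only plain values (a URL string and an attempt number) are sent to the child processes. The following is a rough sketch under some assumptions: `fetch_after_login` is a hypothetical stand-in for the login and `brow.open(url)` steps so that the example runs without network access, and `open_website`, `poll_until_found` and the `'WANTED'` marker are illustrative names, not anything from the question.

```python
from multiprocessing import Pool

TARGET_URL = 'https://website.com/needthiswebsite'


def fetch_after_login(url, attempt):
    # Hypothetical stand-in for creating a Browser, logging in and opening
    # url inside the worker. It pretends that attempt 3 gets the wanted page.
    return 'WANTED' if attempt == 3 else 'not yet'


def open_website(args):
    # Runs in a child process. The argument and the return value are plain
    # picklable values, so nothing like a Browser has to be pickled.
    url, attempt = args
    return attempt, fetch_after_login(url, attempt) == 'WANTED'


def poll_until_found(url, attempts=10, processes=4):
    # Submit all attempts, consume results as they complete, and stop at
    # the first positive answer. Returns that attempt number, or None.
    pool = Pool(processes=processes)
    try:
        jobs = [(url, n) for n in range(attempts)]
        for attempt, found in pool.imap_unordered(open_website, jobs):
            if found:
                return attempt
        return None
    finally:
        # Drop any requests that are still pending or running.
        pool.terminate()
        pool.join()
```

`poll_until_found(TARGET_URL)` should be called from under an `if __name__ == '__main__':` guard, which multiprocessing needs on platforms that start child processes by spawning rather than forking. The trade-off of this design is that every worker call logs in again, which is the cost of not sharing the Browser.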
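Ideally you would log in once using a Browser object in the parent process and then make the parallel requests from multiple processes, but it would probably take a lot of effort to make the object picklable (if that is possible at all). Even if you manage to remove the instancemethod object that causes the current error, there may well be more unpicklable objects inside Browser besides that one.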
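To give a sense of what "making the object picklable" involves, here is the generic `__getstate__` / `__setstate__` pattern applied to a toy class. `Session` is a hypothetical stand-in, not mechanize.Browser, and nothing here shows that Browser can be handled this way; each unpicklable member would need the same kind of treatment.

```python
import pickle
import threading


class Session(object):
    # Hypothetical stand-in for an object that mixes plain data (cookies)
    # with a member that pickle rejects (a lock).
    def __init__(self, cookies):
        self.cookies = dict(cookies)
        self._lock = threading.Lock()

    def __getstate__(self):
        # Copy the instance state and leave out the unpicklable member.
        state = self.__dict__.copy()
        del state['_lock']
        return state

    def __setstate__(self, state):
        # Restore the plain data and rebuild the lock in the new process.
        self.__dict__.update(state)
        self._lock = threading.Lock()


def round_trip(obj):
    # Serialize and deserialize, as a pool does when passing arguments.
    return pickle.loads(pickle.dumps(obj))
```

A call such as `round_trip(Session({'sid': 'abc'}))` returns a copy with the same cookies and a fresh lock. For an object with many nested handlers, every such member has to be found and rebuilt, which is why creating the object inside the worker is usually less work.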