Connection refused when trying to restart threads of a web scraper

Time: 2014-11-16 18:30:14

Tags: python multithreading python-2.7 webkit

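I'm using DryScrape to scrape a JavaScript page, and it occasionally kills the process when it hits an error. I've tried to prevent that by catching the exceptions listed in the docs, but I haven't figured it out: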

        try:
            sess.visit('url')
        except webkit_server.EndOfStreamError:
            continue
        except webkit_server.NoResponseError:
            continue
        except webkit_server.InvalidResponseError:
            continue
        except webkit_server.NoX11Error:
            continue

So I have a setup like this to restart the threads if they crash:

    class Checker():
        def check_if_thread_is_alive(self):
            a = ThreadClass()
            a.start()

            b = ThreadClass()
            b.start()

            c = ThreadClass()
            c.start()

            d = ThreadClass()
            d.start()

            while True:
                if not a.is_alive():
                    print "Restarting A"
                    a = ThreadClass()
                    a.start()
                if not b.is_alive():
                    print "Restarting B"
                    b = ThreadClass()
                    b.start()
                if not c.is_alive():
                    print "Restarting C"
                    c = ThreadClass()
                    c.start()
                if not d.is_alive():
                    print "Restarting D"
                    d = ThreadClass()
                    d.start()
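
For comparison, the same restart loop can be sketched more compactly by keeping the threads in a dictionary and sleeping briefly between polls, so the supervisor does not spin in a busy loop. This is only an illustration: `Worker`, `supervise`, and the `rounds` and `interval` parameters are hypothetical names, the worker exits immediately instead of scraping, and the loop is bounded so the example terminates.

```python
import threading
import time


class Worker(threading.Thread):
    """Hypothetical stand-in for ThreadClass; a real worker would scrape here."""

    def run(self):
        pass  # returns at once, so the supervisor will normally find it dead


def supervise(names, factory, rounds, interval=0.01):
    """Start one thread per name and restart any that has died.

    Polls `rounds` times and returns how often each name was restarted.
    """
    threads = {}
    restarts = dict((name, 0) for name in names)
    for name in names:
        threads[name] = factory()
        threads[name].start()

    for _ in range(rounds):
        time.sleep(interval)  # pause between polls instead of busy-waiting
        for name in names:
            if not threads[name].is_alive():
                print("Restarting %s" % name)
                restarts[name] += 1
                # A finished Thread object cannot be started again,
                # so a fresh instance is created for every restart.
                threads[name] = factory()
                threads[name].start()

    for thread in threads.values():
        thread.join()
    return restarts


restart_counts = supervise(["A", "B", "C", "D"], Worker, rounds=3)
```

A restart of this kind only creates a new thread object. Any resource the threads share at process level is not recreated by it.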

However, whenever I try to restart a thread, I end up with this error:

Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "Scrapper.py", line 30, in run
    sess = dryscrape.Session(base_url = 'url')
  File "/usr/local/lib/python2.7/dist-packages/dryscrape/session.py", line 18, in __init__
    self.driver = driver or DefaultDriver()
  File "/usr/local/lib/python2.7/dist-packages/dryscrape/driver/webkit.py", line 30, in __init__
    super(Driver, self).__init__(**kw)
  File "/usr/local/lib/python2.7/dist-packages/webkit_server.py", line 225, in __init__
    self.conn = connection or ServerConnection()
  File "/usr/local/lib/python2.7/dist-packages/webkit_server.py", line 444, in __init__
    self._sock = (server or get_default_server()).connect()
  File "/usr/local/lib/python2.7/dist-packages/webkit_server.py", line 414, in connect
    sock.connect(("127.0.0.1", self._port))
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused

Is there a better way to approach this, or is there something I'm missing?

2 Answers:

Answer 0: (score: 4)

Cause: you are trying to connect to yourself.

You need to change the target URL.

If you do want to connect to yourself, start the service first.

File "/usr/local/lib/python2.7/dist-packages/webkit_server.py", line 414, in connect
    sock.connect(("127.0.0.1", self._port))
  File "/usr/lib/python2.7/socket.py", line 224, in meth <<<--- you're trying to connect to yourself.
    return getattr(self._sock,name)(*args)

Answer 1: (score: 1)

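If you want to skip exceptions, you can always use a catch-all exception handler like this. It is generally considered very bad practice, but if the errors only happen occasionally, it will keep your scraper running: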

try:
    sess.visit(url)
except Exception as e:
    # Print the exception for debugging here
    continue
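
If you do go the catch-all route, it helps to log the full traceback and cap the number of retries so failures stay visible. Here is a minimal sketch of that idea. `visit_with_retries` and `flaky_visit` are made-up names for illustration, and `flaky_visit` only simulates a call like `sess.visit` that fails twice before succeeding:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scraper")


def visit_with_retries(visit, url, attempts=3):
    """Call visit(url) up to `attempts` times; return True on success."""
    for attempt in range(1, attempts + 1):
        try:
            visit(url)
            return True
        except Exception:
            # log.exception records the traceback, so errors are not silently lost
            log.exception("visit failed (attempt %d of %d): %s",
                          attempt, attempts, url)
    return False


# Hypothetical stand-in for sess.visit: fails twice, then succeeds.
calls = []


def flaky_visit(url):
    calls.append(url)
    if len(calls) < 3:
        raise RuntimeError("simulated scraper error")


ok = visit_with_retries(flaky_visit, "http://example.com/")
```

Because the handler catches `Exception` rather than using a bare `except:`, Ctrl-C (`KeyboardInterrupt`) still stops the script.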

Are you starting a local server for testing? From the traceback:

File "/usr/local/lib/python2.7/dist-packages/webkit_server.py", line 414, in connect
sock.connect(("127.0.0.1", self._port))

You are actually connecting to localhost. If you start your own server, check the server logs to see why it stopped responding to connection requests.
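
To illustrate what Errno 111 means on its own: on Linux it is `ECONNREFUSED`, which you get when nothing is listening on the port you connect to. The sketch below reproduces that with the standard library only. `free_local_port` and `try_connect` are hypothetical helper names and nothing here touches dryscrape, so treat it as a demonstration of the error rather than a diagnosis of your setup:

```python
import errno
import socket


def free_local_port():
    """Ask the OS for an unused port, then release it so nothing listens there."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    probe.bind(("127.0.0.1", 0))
    port = probe.getsockname()[1]
    probe.close()
    return port


def try_connect(port):
    """Return None if the connection succeeds, otherwise the errno of the failure."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.connect(("127.0.0.1", port))
        return None
    except socket.error as exc:
        return exc.errno
    finally:
        sock.close()


result = try_connect(free_local_port())
refused = (result == errno.ECONNREFUSED)
```

If the same check against the port from your traceback also reports a refused connection, then whatever was supposed to be listening on 127.0.0.1 is no longer accepting connections.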

I just spotted another error in your script:

sess.visit('url')
# it should be something like:
url = "http://www.google.com/"
sess.visit(url)