dryscrape - "Connection refused" after the first session

Date: 2015-12-11 14:46:27

Tags: python session web-scraping capybara-webkit xvfb

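I use dryscrape to scrape HTML data from several pages. It is all part of a Django application, but the same problem shows up in a plain Python shell: the second connection fails. I am using: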

Python 2.7.6 (default, Mar  4 2014, 13:14:52) 
dryscrape Version: 0.9
webkit-server Version: 1.0
xvfbwrapper Version: 0.2.5

Below you can see how I want to use it:

Python 2.7.6 (default, Mar  4 2014, 13:14:52) 
Type "copyright", "credits" or "license" for more information.

IPython 2.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import dryscrape

In [2]: from xvfbwrapper import Xvfb

In [3]: x = Xvfb()

In [4]: x.start()

In [5]: session = dryscrape.Session(base_url='http://google.com')

In [6]: session.visit('')

In [7]: session.url()
Out[7]: u'http://www.google.pl/?gfe_rd=cr&ei=d95qVvLfFc2v8wfamoG4Aw'

In [8]: x.stop()

So far everything works fine. But if I try to continue with another session:

...
In [8]: x.stop()

In [9]: x = Xvfb()

In [10]: x.start()

In [11]: session = dryscrape.Session(base_url='http://google.com')
---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
<ipython-input-11-6cbe39a8459d> in <module>()
----> 1 session = dryscrape.Session(base_url='http://google.com')

/home/mefioo/public_html/kariera_naukowa/env/lib/python2.7/site-packages/dryscrape/session.pyc in __init__(self, driver, base_url)
     16                driver = None,
     17                base_url = None):
---> 18     self.driver = driver or DefaultDriver()
     19     self.base_url = base_url
     20 

/home/mefioo/public_html/kariera_naukowa/env/lib/python2.7/site-packages/dryscrape/driver/webkit.pyc in __init__(self, **kw)
     28   def __init__(self, **kw):
     29     kw.setdefault('node_factory_class', NodeFactory)
---> 30     super(Driver, self).__init__(**kw)

/home/mefioo/public_html/kariera_naukowa/env/lib/python2.7/site-packages/webkit_server.pyc in __init__(self, connection, node_factory_class)
    228                node_factory_class = NodeFactory):
    229     super(Client, self).__init__()
--> 230     self.conn = connection or ServerConnection()
    231     self._node_factory = node_factory_class(self)
    232 

/home/mefioo/public_html/kariera_naukowa/env/lib/python2.7/site-packages/webkit_server.pyc in __init__(self, server)
    505   def __init__(self, server = None):
    506     super(ServerConnection, self).__init__()
--> 507     self._sock = (server or get_default_server()).connect()
    508     self.buf = SocketBuffer(self._sock)
    509     self.issue_command("IgnoreSslErrors")

/home/mefioo/public_html/kariera_naukowa/env/lib/python2.7/site-packages/webkit_server.pyc in connect(self)
    438     """ Returns a new socket connection to this server. """
    439     sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
--> 440     sock.connect(("127.0.0.1", self._port))
    441     return sock
    442 

/usr/local/lib/python2.7/socket.pyc in meth(name, self, *args)
    222 
    223 def meth(name,self,*args):
--> 224     return getattr(self._sock,name)(*args)
    225 
    226 for _m in _socketmethods:

error: [Errno 111] Connection refused

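I did this in the shell only as an example. In my Django application the code is part of the view logic, and requesting that view a second time causes this error. Restarting the Django server or the Python shell fixes it, but only for the first connection, so it is useless for a live web page. Am I missing some "cleanup" or "restart" of the X session or of webkit-server (capybara-webkit) between the two sessions?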
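For reference, `[Errno 111] Connection refused` on Linux means that nothing was listening on the target port when `sock.connect` was called. The sketch below reproduces that condition with the standard library only. It does not use dryscrape or webkit-server, and whether the webkit-server process really stops listening after `x.stop()` is an assumption, not something the traceback proves. The helper `try_connect` is a hypothetical name used for illustration.

```python
import errno
import socket

# Open a listener on a free local port (port 0 lets the OS pick one).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]


def try_connect(port):
    """Return None if the connection succeeds, else the errno of the failure."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.connect(("127.0.0.1", port))
        return None
    except socket.error as exc:
        return exc.errno
    finally:
        sock.close()


first = try_connect(port)    # the listener is still up
listener.close()             # stands in for the server going away (assumption)
second = try_connect(port)   # nothing is listening on the port any more

print(first, second == errno.ECONNREFUSED)
```

If something similar happens here, the second `dryscrape.Session(...)` would be connecting to the port of a server that is no longer running.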

1 Answer:

Answer 0 (score: 0)

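Well, this is not a "real" answer, because I still don't know what went wrong, but I found a way to make it work. I upgraded dryscrape to 1.0 and used the new function dryscrape.start_xvfb() instead of xvfbwrapper's Xvfb(). Everything works fine now.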
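A minimal sketch of how that workaround might look inside a long-running process such as a Django server, assuming dryscrape 1.0 or later is installed. The helper names `ensure_xvfb` and `scrape_final_url` and the module-level flag are hypothetical and only for illustration. The idea is to start the virtual display once per process and never stop it between requests, rather than calling `Xvfb().start()` and `stop()` around every session.

```python
_xvfb_started = False  # hypothetical module-level flag, one per process


def ensure_xvfb(start_display=None):
    """Start the virtual X display at most once; return True if it was started now."""
    global _xvfb_started
    if _xvfb_started:
        return False
    if start_display is None:
        import dryscrape  # third-party; version 1.0+ assumed
        start_display = dryscrape.start_xvfb
    start_display()
    _xvfb_started = True
    return True


def scrape_final_url(base_url, path=''):
    """Visit base_url + path and return the URL the session ends up on."""
    import dryscrape  # third-party; version 1.0+ assumed
    ensure_xvfb()
    session = dryscrape.Session(base_url=base_url)
    session.visit(path)
    return session.url()
```

A view could then call `scrape_final_url('http://google.com')` on every request. The `start_display` parameter exists only so the once-per-process logic can be exercised without a real X server.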