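I'm having trouble using lxml with Python 2.7. I tried installing lxml versions 3.4.0 and 3.4.2, but I get the same error with both and don't understand why. Here is my Python code: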

    # Imports this excerpt relies on (assumed; not shown in the original snippet)
    from flask import Flask
    import lxml.html
    from lxml.etree import XPath

    app = Flask(__name__)

    @app.route("/getInformation", methods=['GET'])
    def domain():
        urlList = []
        urlList.append("http://gbgfotboll.se/serier/?scr=table&ftid=57109")
        urlList.append("http://gbgfotboll.se/serier/?scr=table&ftid=57108")
        date = '2015-04-18'
        # use this in real mode: currentDate = (time.strftime("%Y-%m-%d"))
        homeScore = "0"
        awayScore = "0"
        homeTeam = ""
        awayTeam = ""
        time_xpath = XPath("td[1]/span/span//text()[2]")
        team_xpath = XPath("td[2]/a/text()")
        league_xpath = XPath("//*[@id='content-primary']/h1//text()")
        for url in urlList:
            test = 0 #remove this
            rows_xpath = XPath("//*[@id='content-primary']/table/tbody/tr[td[1]/span/span//text()='%s']" % (date))
            html = lxml.html.parse(url)
            ....

This is the error I get:

    2015-03-28 13:12:23,852 :Exception on /getInformation [GET]
    Traceback (most recent call last):
      File "/home/Timocin/mysite/env/local/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
        response = self.full_dispatch_request()
      File "/home/Timocin/mysite/env/local/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
        rv = self.handle_user_exception(e)
      File "/home/Timocin/mysite/env/local/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
        reraise(exc_type, exc_value, tb)
      File "/home/Timocin/mysite/env/local/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
        rv = self.dispatch_request()
      File "/home/Timocin/mysite/env/local/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
        return self.view_functions[rule.endpoint](**req.view_args)
      File "/home/Timocin/mysite/work.py", line 48, in domain
        html = lxml.html.parse(url)
      File "/home/Timocin/mysite/env/local/lib/python2.7/site-packages/lxml/html/__init__.py", line 786, in parse
        return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
      File "lxml.etree.pyx", line 3299, in lxml.etree.parse (src/lxml/lxml.etree.c:72655)
      File "parser.pxi", line 1791, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:106263)
      File "parser.pxi", line 1817, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:106564)
      File "parser.pxi", line 1721, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:105561)
      File "parser.pxi", line 1122, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:100456)
      File "parser.pxi", line 580, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:94543)
      File "parser.pxi", line 690, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:96003)
      File "parser.pxi", line 618, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:95015)
    IOError: Error reading file 'http://gbgfotboll.se/serier/?scr=table&ftid=57109': failed to load HTTP resource

I use the same code on another server and it works fine. But since I changed servers I get this error, and I don't know why. Any ideas?
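One way to narrow this down is to separate downloading from parsing: fetch the page with the standard library and hand the bytes to lxml, so that a network failure and a parse failure produce different errors. This is a minimal sketch that uses only the standard library and runs on both Python 2.7 (`urllib2`) and Python 3 (`urllib.request`); `fetch_html` is a hypothetical helper name, not part of the original code.

```python
# Minimal sketch: download first, parse second, so network problems and
# parsing problems raise different exceptions.
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.7


def fetch_html(url, timeout=10):
    """Return the raw response body for `url` as bytes (hypothetical helper).

    The default urlopen() opener honours the proxy environment variables
    (http_proxy, https_proxy, no_proxy) and can fetch https:// URLs, which
    (to my knowledge) libxml2's own network client cannot.
    """
    response = urlopen(url, timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()
```

In the loop, the direct call could then become `html = lxml.html.parse(BytesIO(fetch_html(url)), base_url=url)` after `from io import BytesIO`, since `lxml.html.parse` also accepts a file-like object. If `fetch_html` itself raises, the problem lies in the server's network access rather than in lxml or its version.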
EDIT1
I tried using urllib2 instead and got the following message:

    2015-03-28 15:15:05,087 :Exception on /getInformation [GET]
    Traceback (most recent call last):
      File "/home/Timocin/mysite/env/local/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
        response = self.full_dispatch_request()
      File "/home/Timocin/mysite/env/local/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
        rv = self.handle_user_exception(e)
      File "/home/Timocin/mysite/env/local/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
        reraise(exc_type, exc_value, tb)
      File "/home/Timocin/mysite/env/local/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
        rv = self.dispatch_request()
      File "/home/Timocin/mysite/env/local/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
        return self.view_functions[rule.endpoint](**req.view_args)
      File "/home/Timocin/mysite/work.py", line 57, in domain
        p = urlopen(url)
      File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
        return _opener.open(url, data, timeout)
      File "/usr/lib/python2.7/urllib2.py", line 410, in open
        response = meth(req, response)
      File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
        'http', request, response, code, msg, hdrs)
      File "/usr/lib/python2.7/urllib2.py", line 448, in error
        return self._call_chain(*args)
      File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    HTTPError: HTTP Error 403: Forbidden

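A 403 and a refused connection come from different layers: an `HTTPError` means some HTTP server (the site itself, or a proxy in between) answered with a status code, while a `URLError` without a status code means no HTTP response arrived at all. A minimal sketch for telling them apart, assuming Python 2.7 or 3; `describe_fetch_error` is a hypothetical helper name:

```python
# Minimal sketch: classify exceptions raised by urlopen() by the layer they
# come from. Python 2.7 and 3 compatible.
import errno

try:
    from urllib.error import HTTPError, URLError  # Python 3
except ImportError:
    from urllib2 import HTTPError, URLError  # Python 2.7


def describe_fetch_error(exc):
    """Return a (kind, detail) tuple for an exception raised by urlopen()."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, HTTPError):
        # A server (the site or a proxy) replied, but with an error status.
        return ("http_status", exc.code)
    if isinstance(exc, URLError):
        reason = exc.reason
        if getattr(reason, "errno", None) == errno.ECONNREFUSED:
            # The TCP connection was rejected before any HTTP exchange.
            return ("connection_refused", str(reason))
        # Other transport problems: DNS failures, timeouts, and so on.
        return ("network", str(reason))
    return ("other", repr(exc))
```

In the loop, the fetch can be wrapped as `try: p = urlopen(url)` / `except URLError as exc: print(describe_fetch_error(exc))`; catching `URLError` also catches `HTTPError`, which is why the helper checks the subclass first.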
EDIT2
I recently found out that you need a paid account to access external websites. I bought one, but it still doesn't work; the error messages have changed, though. For lxml:

    IOError: Error reading file 'https://gbgfotboll.se/serier/?scr=table&ftid=57109': failed to load external entity "https://gbgfotboll.se/serier/?scr=table&ftid=57109"

and for urllib2:

    URLError: <urlopen error [Errno 111] Connection refused>
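The `https://` URL in the new lxml message is worth noting: to my knowledge libxml2's built-in network client, which `lxml.html.parse(url)` relies on, handles plain HTTP and FTP but not HTTPS, so fetching with `urllib2` and parsing the bytes is the safer route either way. `Connection refused` means the TCP connection was rejected before any HTTP exchange; on hosts that restrict outbound access, one possible cause is a missing or wrong proxy setting. The sketch below assumes that is the situation here (the question does not say so) and shows how to inspect which proxies urllib picks up from the environment and how to set one explicitly. `PROXY_URL` and `opener_with_proxy` are hypothetical placeholders, not a real address or API of any host.

```python
# Minimal sketch: inspect and override the proxies urllib uses.
# Python 2.7 and 3 compatible. PROXY_URL is a hypothetical placeholder.
try:
    from urllib.request import ProxyHandler, build_opener, getproxies  # Python 3
except ImportError:
    from urllib import getproxies  # Python 2.7
    from urllib2 import ProxyHandler, build_opener

# Hypothetical proxy address; replace it with whatever the host documents.
PROXY_URL = "http://proxy.example.invalid:3128"


def opener_with_proxy(proxy_url=PROXY_URL):
    """Return an opener that sends http:// and https:// requests via proxy_url."""
    handler = ProxyHandler({"http": proxy_url, "https": proxy_url})
    return build_opener(handler)


if __name__ == "__main__":
    # Proxies urllib reads from the environment (http_proxy, https_proxy, ...).
    print(getproxies())
```

Comparing `getproxies()` on the old and the new server shows whether their proxy configuration differs. To fetch through an explicit proxy, `opener_with_proxy().open(url, timeout=10).read()` replaces `urlopen(url).read()`; `OpenerDirector.open` takes the same `timeout` argument as `urlopen`.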