urllib2 cannot open a website

Posted: 2013-12-02 10:56:27

Tags: python web-scraping beautifulsoup urllib2 urllib

When I try to open this link (http://-travka-.tokobagus.com/), urllib2 gives me this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 2] No such file or directory>

I think the leading hyphen/dash is the problem. How can I open a URL like this with urllib2?
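The suspicion is at least plausible. Under the RFC 952 / RFC 1123 host-name syntax, each label may contain only letters, digits and hyphens, and it may not begin or end with a hyphen. That makes "-travka-" an invalid host-name label. Whether a particular resolver actually refuses such a name is a separate question. The following minimal Python 3 sketch reports which labels of a URL's host break that rule. The helper name invalid_labels is hypothetical, and the sketch checks syntax only; it does not explain the errno in the traceback.

```python
import re
from urllib.parse import urlsplit

# One host-name label under the RFC 952 / RFC 1123 "LDH" rule:
# 1-63 letters, digits or hyphens, not starting or ending with '-'.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def invalid_labels(url):
    """Return the labels of the URL's host that violate the LDH host-name rule.

    Hypothetical helper: it only checks syntax and does no network access.
    """
    host = urlsplit(url).hostname or ""
    return [label for label in host.split(".") if not _LABEL.match(label)]
```

For example, invalid_labels('http://-travka-.tokobagus.com/') returns ['-travka-'], while invalid_labels('http://www.google.com') returns an empty list.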

Full code:

import urllib
import urllib2
from bs4 import BeautifulSoup 

url = 'http://-travka-.tokobagus.com/'
#url = 'http://www.google.com'
data = urllib2.urlopen(url)
#soup = BeautifulSoup(data)

As you can see, I also tried google.com and it works fine. Could this be a version-related bug?
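One way to compare the two URLs is to look at the low-level error instead of the traceback. In the traceback, do_open raises URLError(err) around the underlying socket error, and URLError keeps that original error in its reason attribute. The following minimal sketch is written for Python 3, where urllib2's functionality lives in urllib.request and urllib.error. The helper name fetch is hypothetical.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Return (body, None) on success, or (None, reason) on a connection-level failure.

    Hypothetical helper. reason is whatever URLError wrapped, usually an OSError
    such as socket.gaierror (name resolution) or ConnectionRefusedError.
    """
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.read(), None
    except HTTPError:
        # HTTPError subclasses URLError; the server did answer, so re-raise it
        # instead of reporting it as a connection problem.
        raise
    except URLError as exc:
        return None, exc.reason
```

To use it, call fetch() on both URLs and print the returned reason and its errno attribute, if it has one. A socket.gaierror reason points at name resolution rather than the HTTP exchange.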

My setup:

  • Python - 2.7.4
  • Ubuntu - 13.04

2 answers:

Answer 0 (score: 0)

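I'm posting this request for more information as an answer because it would be unreadable in a comment. @user3037901, could you add the traceback you get from the following commands?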

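import httplib
import urllib2
req = urllib2.Request('http://-travka-.tokobagus.com/')
# urllib2 itself calls get_type() before get_host(); do the same here
req.get_type()
# Talk to httplib directly, bypassing urllib2's handlers, so the traceback
# shows which low-level call fails
h = httplib.HTTPConnection(req.get_host())
h.request(req.get_method(), req.get_selector(), req.data, {})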
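In Python 3, the same idea can be pushed one step further. The sketch below only resolves the host name with socket.getaddrinfo and sends no HTTP request, so a failure here cannot come from urllib at all. It is a sketch under assumptions: resolve_host is a hypothetical helper name, and port 80 is assumed when the URL gives none.

```python
import socket
from urllib.parse import urlsplit


def resolve_host(url):
    """Resolve only the host part of url and return its IP addresses, sorted.

    Hypothetical helper: no HTTP request is made. Raises socket.gaierror
    (an OSError subclass) if the resolver rejects or cannot find the name.
    """
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError("URL has no host: %r" % url)
    port = parts.port or 80  # assumption: plain http default port
    infos = socket.getaddrinfo(parts.hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_host('http://-travka-.tokobagus.com/') on the affected machine either returns the addresses or raises socket.gaierror from the resolver. Either outcome helps decide whether the problem lies in name resolution or in urllib.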

Answer 1 (score: 0)

It works for me. Here is the result:

Python 2.7.5 (v2.7.5:ab05e7dd2788, May 13 2013, 13:18:45) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> stream = urllib2.urlopen("http://-travka-.tokobagus.com/")
>>> response = stream.read()
>>> soup = BeautifulSoup(response)
>>> soup.prettify()
u'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">\n<html xmlns="http://www.w3.org/1999/xhtml">\n <head>\n  <meta content="text/html;charset=utf-8" http-equiv="Content-Type"/>\n  <title>\n   HERRY FIRDAUS NST | TOKOBAGUS.COM\n  </title>\n  <link href="http://-travka-.tokobagus.com" rel="canonical"/>\n  <meta content="-travka- telah menjadi member Tokobagus sejak 01-05-2013. Lihat profil -travka- selengkapnya di Tokobagus." name="description"/>\n  <meta content="index,follow" name="robots"/>\n  <link href="http://as.tokobagus.biz/v6/global/images/favicon-13.ico" rel="shortcut icon" type="image/ico"/>\n  <link href="http://as.tokobagus.biz/v6/global/css/global.min.1.0.18.css" media="screen" rel="stylesheet" type="text/css"/>\n  <link href="http://as.tokobagus.biz/v6/skins/default/css/tbl.min.1.0.10.css" media="screen,print" rel="stylesheet" type="text/css"/>\n  <link href="http://as.tokobagus.biz/v6/skins/d