Python (urllib2.urlopen) + BeautifulSoup4: opening a link from a variable

Time: 2015-04-19 08:22:19

Tags: python python-2.7 beautifulsoup urllib2

I am using Python 2.7 + urllib2 + BeautifulSoup4.

When I pass the URL as a string literal:

soup = BeautifulSoup(urllib2.urlopen('http://www.some-website.com', 'html'))

it works perfectly, but when I move the URL into a variable, it stops working.

variable = 'http://www.some-website.com'
soup = BeautifulSoup(urllib2.urlopen(variable, 'html'))

Error:

Edit: the error output is:

  File "C:\Python27\lib\urllib2.py", line 285, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: api/Abc-Abc/def/7/179
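The value in the error message, api/Abc-Abc/def/7/179, has no scheme such as http://, which is what urllib2 reports as an unknown URL type. A minimal sketch of checking a string before passing it to urlopen follows; the helper name has_http_scheme is hypothetical and not part of urllib2, and the import fallback is only there so the snippet also runs on Python 3.

```python
try:
    from urlparse import urlparse          # Python 2.7, as used in the question
except ImportError:
    from urllib.parse import urlparse      # Python 3 fallback

def has_http_scheme(url):
    """Return True if url has an http/https scheme and a host part.

    Hypothetical helper for illustration only.
    """
    parts = urlparse(url)
    return parts.scheme in ('http', 'https') and bool(parts.netloc)

# The URL from the question passes; the value from the traceback does not.
print(has_http_scheme('http://www.some-website.com'))
print(has_http_scheme('api/Abc-Abc/def/7/179'))
```

Running a check like this over every link before opening it would show which entry triggers the ValueError.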

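Solved

The problem was that one of the links was only a reference into the server's database (a relative path such as api/Abc-Abc/def/7/179, with no http:// prefix), so urllib2 could not open it as a URL.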
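If relative links like this do need to be fetched, one possible approach is to resolve them against the page they came from before opening them. The sketch below uses urljoin from the standard library; BASE_URL, to_absolute, and the sample links list are assumptions made for illustration, and whether the resolved address actually serves a page depends on the site.

```python
try:
    from urlparse import urljoin           # Python 2.7, as used in the question
except ImportError:
    from urllib.parse import urljoin       # Python 3 fallback

# Assumed address of the page the links were scraped from.
BASE_URL = 'http://www.some-website.com/'

def to_absolute(href, base=BASE_URL):
    """Resolve href against base; absolute URLs are returned unchanged."""
    return urljoin(base, href)

# Sample input: one absolute link and the relative value from the traceback.
links = ['http://www.some-website.com/page', 'api/Abc-Abc/def/7/179']
absolute_links = [to_absolute(href) for href in links]

for url in absolute_links:
    print(url)
```

Each entry in absolute_links then carries a scheme and host, so it no longer raises the "unknown url type" error when passed to urllib2.urlopen, although the request itself may still fail if the path is not a real page.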

2 answers:

Answer 0: (score: 1)

# Note: use a live website such as http://vaibhavmule.com, not the placeholder http://some-website.com
variable = 'http://www.some-website.com'  # Do not forget the 'http://' prefix here

# 'html' is not a parser library name, so it is dropped here.
# (In the question it was also passed to urlopen(), not to BeautifulSoup.)
soup = BeautifulSoup(urllib2.urlopen(variable))

This should work.

See the Reference on using parser libraries.

Answer 1: (score: 1)

This should work. If it does not, please post the error you get afterwards.

 import urllib2
 from bs4 import BeautifulSoup  # BeautifulSoup4, as used in the question

 var = 'http://www.somesite.com'
 html = urllib2.urlopen(var).read()  # fetch the page and read the raw HTML
 soup = BeautifulSoup(html, 'html.parser')  # parse with Python's built-in parser