Question

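I am trying to get the links from an IMDB web page. The inner table contains the links, but I get the error below and I don't know how to extract them. I am a beginner, please help.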

from bs4 import BeautifulSoup
import urllib2

var_file = urllib2.urlopen("http://www.imdb.com/chart/top")

var_html  = var_file.read()

var_file.close()
soup = BeautifulSoup(var_html)
for item in soup.find_all(tbody={'class': 'lister-list'}):
    for link in item.find_all('a'):
        print(link.get('href'))

I get this error:

C:\Python27\lib\site-packages\bs4\__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "lxml")

  markup_type=markup_type))

Answer 1

This is only a warning, not an error: it says that you did not choose a parser, so BeautifulSoup picked one for you.

Instead of

soup = BeautifulSoup(var_html)

try:

soup = BeautifulSoup(var_html, "lxml")
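As a rough check that naming the parser is what silences the message, here is a minimal Python 3 sketch. It assumes a small inline HTML string (`sample_html`, a hypothetical stand-in for the downloaded page) and uses the built-in "html.parser" so that it runs without the lxml package; passing "lxml" works the same way when lxml is installed.

```python
import warnings

from bs4 import BeautifulSoup

# Hypothetical stand-in for the page content read from the network.
sample_html = "<html><body><p>Hello</p></body></html>"

# Record every warning raised while the soup is built.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The second argument names the parser explicitly.
    soup = BeautifulSoup(sample_html, "html.parser")

# Keep only the "no parser specified" warning, if it was raised at all.
parser_warnings = [
    w for w in caught
    if "No parser was explicitly specified" in str(w.message)
]
paragraph_text = soup.p.get_text()

print(len(parser_warnings), paragraph_text)
```

The warning only concerns which parser is used. It does not stop the script, so a loop that prints nothing has a separate cause.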

Answer 2

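The call find_all(tbody={'class': 'lister-list'}) treats tbody as an attribute name rather than a tag name, so it does not find the table body. Filter on the CSS class instead: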

soup.find_all(class_='lister-list')
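Putting the class filter together with the inner loop from the question might look like the following Python 3 sketch. The markup in `sample_html` is a simplified, hypothetical stand-in for the IMDB chart table (the real page may be structured differently), and "html.parser" is used so that nothing beyond bs4 needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the chart markup.
sample_html = """
<table>
  <tbody class="lister-list">
    <tr><td><a href="/title/tt0111161/">The Shawshank Redemption</a></td></tr>
    <tr><td><a href="/title/tt0068646/">The Godfather</a></td></tr>
  </tbody>
</table>
<a href="/outside/">a link outside the table</a>
"""

soup = BeautifulSoup(sample_html, "html.parser")

links = []
# Select <tbody> elements by tag name and CSS class.
for body in soup.find_all("tbody", class_="lister-list"):
    # Collect only anchors that actually carry an href attribute.
    for anchor in body.find_all("a", href=True):
        links.append(anchor["href"])

print(links)
```

Passing the tag name as the first argument keeps the search limited to the table body, so links elsewhere on the page are skipped.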

Extracting links from an IMDB table with BeautifulSoup
