Question

I get an error when I run this.

import requests
from bs4 import BeautifulSoup

url = "http://sport.citifmonline.com/"
url_page_2 = "url" + "2016/10/15/chelsea-3-0-leicester-city-dominant-blues-comfortable-against-champions-photos/"
r = requests.get(url)

soup = BeautifulSoup(r.content, "html5lib")

links = soup.find_all("a")

for link in links:
    print "<a href='%s'>%s</a>" %(link.get("href"), link.text)

g_data = soup.find_all("div", {"class": "wrapper"})

for item in g_data:
    articles = item.content[0].find_all("a", {"class": "cat-box-content"})[0].text
    try:
        print item.contents[1].find_all("h3", {"class": "post-box-title"})[0].text
    except:
        pass

Answer 1

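If you have not installed html5lib (for example with pip install html5lib), you cannot use that parser without getting an error. Either install it or switch to "html.parser" (Python's built-in parser), which the documentation of BeautifulSoup also mentions, just to avoid any errors: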

soup = BeautifulSoup(r.content, "html.parser")
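If you want the script to keep working on machines where html5lib may or may not be installed, one option is to try html5lib first and fall back to the built-in parser. This is a minimal sketch, not part of the original answer: make_soup and SAMPLE_HTML are hypothetical names, and the static HTML string stands in for r.content so the example runs offline. It relies on BeautifulSoup raising bs4.FeatureNotFound when the requested parser is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Hypothetical helper: prefer html5lib, fall back to Python's html.parser."""
    try:
        return BeautifulSoup(markup, "html5lib")
    except FeatureNotFound:
        # Raised when the html5lib package is not installed.
        return BeautifulSoup(markup, "html.parser")


# Static stand-in for r.content so the sketch needs no network access.
SAMPLE_HTML = """
<html><body>
  <a href="/2016/10/15/chelsea-3-0-leicester-city/">Chelsea 3-0 Leicester</a>
  <a href="/sport/">Sport</a>
</body></html>
"""

soup = make_soup(SAMPLE_HTML)

for link in soup.find_all("a"):
    print("<a href='%s'>%s</a>" % (link.get("href"), link.text))
```

Both parsers handle this simple, well-formed markup the same way; they can differ on broken HTML, which is the main reason to pick one deliberately.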

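Also, the first line inside your second for loop throws a TypeError, because you are trying to index something that is not subscriptable (it is not a list or anything similar; see e.g. here for more details). In fact, it does not even exist: BeautifulSoup treats an unknown attribute such as item.content as item.find("content"), and since there is no <content> tag, it is None (which, of course, is not subscriptable). You should call find_all directly on each element: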

item.find_all(...)
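Applied to the question's second loop, the fix looks roughly like this. It is a sketch under assumptions: the class names wrapper, cat-box-content and post-box-title come from the question, but the HTML structure in SAMPLE_HTML is hypothetical, since the real page markup is not shown. It also replaces the [0] indexing inside a bare try/except with list comprehensions, which simply produce empty lists when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical markup using the class names from the question.
SAMPLE_HTML = """
<div class="wrapper">
  <a class="cat-box-content" href="/football/">Football</a>
  <h3 class="post-box-title"><a href="/a/">Chelsea 3-0 Leicester</a></h3>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
g_data = soup.find_all("div", {"class": "wrapper"})

articles = []
titles = []
for item in g_data:
    # item.content is shorthand for item.find("content"); there is no
    # <content> child, so this is None and item.content[0] raises TypeError.
    print(item.content)

    # Search inside each wrapper directly; empty results give empty lists
    # instead of an IndexError that a bare except would silently swallow.
    articles += [a.get_text(strip=True)
                 for a in item.find_all("a", {"class": "cat-box-content"})]
    titles += [h3.get_text(strip=True)
               for h3 in item.find_all("h3", {"class": "post-box-title"})]

print(articles)  # ['Football']
print(titles)    # ['Chelsea 3-0 Leicester']
```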

BeautifulSoup error