Question

@ayivima has a great answer here, but I should add that because the site itself relies heavily on JavaScript, BeautifulSoup ultimately could not scrape it correctly.

I'm completely new to Python and I'm just trying to print the headings of a web page. I'm using this code, which I mostly found through Google:

from bs4 import BeautifulSoup, SoupStrainer
import requests

url = "https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=3210001601"
page = requests.get(url)
data = page.text
soup = BeautifulSoup
soup.find_all('h1')

print(text)

I keep getting this error message:

AttributeError: 'str' object has no attribute 'descendants'

Honestly, I don't really know what this means. The only other answer I could find was: AttributeError: 'str' object has no attribute 'descendants', and I don't think it applies to my case.

Am I doing something wrong in my code? (Probably a lot, but I mainly mean this error.)

Answer 1

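BeautifulSoup needs the HTML text and an HTML parser, both passed as arguments. Technically, you need to create an instance of BeautifulSoup. If you don't pass the HTML text, there is nothing to search.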

So the line soup = BeautifulSoup must become:

soup = BeautifulSoup(data, 'html.parser')

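Here the first argument (data in this case) is the raw HTML text, and the second argument is the parser, html.parser. I'm using Python's built-in HTML parser, but BeautifulSoup supports other parsers as well. Learn more here: https://www.crummy.com/software/BeautifulSoup/bs4/doc/.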
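To make the difference between the class and an instance concrete, here is a minimal sketch. It assumes a small hypothetical HTML string (sample_html) in place of the live page, so it runs without a network request. Calling find_all on the class itself makes Python treat 'h1' as self, which is why the error mentions a 'str' object.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML, used here instead of a live request.
sample_html = "<html><body><h1>First</h1><h1>Second</h1></body></html>"

# Calling find_all on the class: the string "h1" is bound to `self`,
# and a str has none of the attributes a parsed document has.
try:
    BeautifulSoup.find_all("h1")
    class_call_error = None
except AttributeError as exc:
    class_call_error = exc

# Creating an instance gives find_all a parsed document to search.
soup = BeautifulSoup(sample_html, "html.parser")
headings = [tag.get_text() for tag in soup.find_all("h1")]

print(type(class_call_error).__name__)
print(headings)
```

The names sample_html, class_call_error, and headings are only illustrative; the point is that find_all needs a BeautifulSoup instance built from the HTML text.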

Recommended code:

from bs4 import BeautifulSoup
import requests

url = "https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=3210001601"
page = requests.get(url)
data = page.text
# Parse the HTML text into a BeautifulSoup instance before searching it
soup = BeautifulSoup(data, 'html.parser')
text = soup.find_all('h1')

print(text)

Output:

[]

BeautifulSoup doesn't seem to find any h1 tags.

Let's try the meta tags instead:

meta_tags = soup.find_all('meta')
print(meta_tags)

Output:

[<meta content="no-cache" http-equiv="Pragma"/>, 
<meta content="-1" http-equiv="Expires"/>, 
<meta content="no-cache" http-equiv="CacheControl"/>]
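The empty h1 result is consistent with the note at the top about JavaScript: requests only downloads the HTML the server sends and does not execute scripts. The following is a rough sketch of how one might check for that situation. The markup in static_html is hypothetical and only imitates a page whose heading is created by a script; it is not the actual Statistics Canada page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the heading is created by a script at load time,
# so it is absent from the HTML text that requests would receive.
static_html = """
<html>
  <head><meta http-equiv="Pragma" content="no-cache"></head>
  <body>
    <div id="app"></div>
    <script>
      var h = document.createElement("h1");
      h.textContent = "Title";
      document.getElementById("app").appendChild(h);
    </script>
  </body>
</html>
"""

soup = BeautifulSoup(static_html, "html.parser")
h1_tags = soup.find_all("h1")          # tags present in the static HTML
script_tags = soup.find_all("script")  # scripts that may build the page later

print(len(h1_tags))
print(len(script_tags))
```

If the tags you want are missing while script tags are present, the content is probably rendered in the browser, and a plain requests plus BeautifulSoup approach will not see it. This is only a heuristic, not a definitive test.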
