Question

I am trying to scrape the meta description content from a website.

Example:

<meta name="description" content="This is the home page meta description.">

The output I am looking for is: "This is the home page meta description."

My code is:

raw_html = simple_get(companyUrl)
html = BeautifulSoup(raw_html, 'html.parser')
x = html.select('meta', {'name' : 'description'})  ## this line errors out

Can someone point me in the right direction?

(Also, is it my imagination, or are the BeautifulSoup tutorials and docs not up to the standard of those for other languages and applications?)

Answer 1

Using BeautifulSoup:

from bs4 import BeautifulSoup

html = """<meta name="description" content="This is the home page meta description.">"""

soup = BeautifulSoup(html, 'html.parser')
content = soup.find('meta', {'name':'description'}).get('content')
print(content)

STDOUT:

This is the home page meta description.
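
One caveat with the one-liner above: if the page has no such tag, find() returns None and the chained .get() raises AttributeError. A minimal sketch of a guarded version follows. The helper name get_meta_description is hypothetical and used only for illustration.

```python
from bs4 import BeautifulSoup


def get_meta_description(html_text):
    """Return the content of <meta name="description">, or None if absent.

    get_meta_description is a hypothetical helper name, not part of bs4.
    """
    soup = BeautifulSoup(html_text, "html.parser")
    tag = soup.find("meta", attrs={"name": "description"})
    if tag is None:
        # No matching tag on the page: report that instead of crashing.
        return None
    # .get() returns None if the tag exists but has no content attribute.
    return tag.get("content")


page = (
    '<html><head>'
    '<meta name="description" content="This is the home page meta description.">'
    '</head></html>'
)
print(get_meta_description(page))
print(get_meta_description("<html><head></head></html>"))
```

The caller can then decide what a missing description means (skip the page, log it, fall back to something else) rather than handling an exception.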

Answer 2

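select() takes a CSS selector string rather than a find()-style attribute dict, so the attribute filter has to go inside the selector, like this: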

x = html.select('meta[name="description"]')
print(x[0].attrs["content"])

Learn more about CSS selectors here: https://www.w3schools.com/css/css_attribute_selectors.asp
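
If only the first match is needed, select_one() may be a little more convenient than indexing into the list that select() returns, because it gives back None instead of raising IndexError when nothing matches. A small sketch, assuming an inline HTML string in place of the fetched page:

```python
from bs4 import BeautifulSoup

# Inline sample standing in for the downloaded page (an assumption for this sketch).
html_text = """<head>
<meta charset="utf-8">
<meta name="description" content="This is the home page meta description.">
</head>"""

soup = BeautifulSoup(html_text, "html.parser")

# select() returns a list of every matching tag.
matches = soup.select('meta[name="description"]')

# select_one() returns the first matching tag, or None if there is none.
first = soup.select_one('meta[name="description"]')

description = first.get("content") if first is not None else None
print(len(matches))
print(description)
```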

BeautifulSoup: how to scrape the meta tag description content
