Question

I am a beginner and would like to ask how to use Beautiful Soup to extract data from markup like the following:

<div class="about-book" id="aboutbook">
Blah blah blah
</div>

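How do I get "Blah blah blah" when the class "about-book" is also used with different IDs, and the id "aboutbook" is also used with different class names? What I want is to match on the combination of class name and id.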

Answer 1

from bs4 import BeautifulSoup

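# Name the parser explicitly; omitting it triggers a warning in current bs4.
soup = BeautifulSoup("""<div class="about-book" id="aboutbook">
Blah blah blah
</div>""", "html.parser")

# An element must satisfy every entry in attrs to match.
print([x.text for x in soup.find_all("div",attrs={"class":"about-book","id":"aboutbook"})])
# Output under Python 2 (Python 3 omits the u prefix):
# [u'\nBlah blah blah\n']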

If there is only one:

  print(soup.find("div",attrs={"class":"about-book","id":"aboutbook"}).text)
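
To illustrate the situation from the question, here is a minimal sketch with two decoy elements. The extra divs and the names "other-id", "other-class", and "no-such-id" are made up for this example and are not from the original post. It also shows that find() returns None when nothing matches, which is worth checking before reading .text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the first div carries both the class and the id.
html = """
<div class="about-book" id="aboutbook">Blah blah blah</div>
<div class="about-book" id="other-id">same class, different id</div>
<div class="other-class" id="aboutbook">same id, different class</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Every entry in attrs must hold for an element to match.
matches = soup.find_all("div", attrs={"class": "about-book", "id": "aboutbook"})
texts = [div.get_text(strip=True) for div in matches]
print(texts)

# find() returns None when no element matches, so guard before using .text.
missing = soup.find("div", attrs={"class": "about-book", "id": "no-such-id"})
print(missing is None)
```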

Answer 2

Try the CSS selector "div#aboutbook.about-book".

With BeautifulSoup, you can write it like this:

soup = BeautifulSoup(html, "html.parser")
soup.find_all("div", class_="about-book", id="aboutbook")
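
The selector string above can be passed to select() or select_one(), which accept CSS selectors. The following is a small sketch of that usage. The sample markup and the id "something-else" are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second div shares the class but not the id.
html = """
<div class="about-book" id="aboutbook">Blah blah blah</div>
<div class="about-book" id="something-else">not this one</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns a list of every element matching the CSS selector.
selected = soup.select("div#aboutbook.about-book")
selected_texts = [el.get_text(strip=True) for el in selected]
print(selected_texts)

# select_one() returns the first match, or None if there is none.
first = soup.select_one("div#aboutbook.about-book")
print(first.get_text(strip=True) if first is not None else "no match")
```

In the selector, "#aboutbook" matches the id and ".about-book" matches the class, so an element has to carry both to be selected.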

Answer 3

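To scrape data by class or id with BeautifulSoup, ProxyCrawl’s built-in libraries are a good option: they support multiple programming languages and come with predefined libraries and other features. You can customize the parameters you need and scrape exactly the data you want. For now, you can use the following code.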

Source code

from bs4 import BeautifulSoup
import requests
url = "https://github.com/"
req = requests.get(url)
soup = BeautifulSoup(req.text, "html.parser")
print(soup.title)
title = soup.find_all(class_="outer-text")
for i in title:
    print(i.text)

des = soup.find_all(id="first")
for j in des:
    print(j.text)
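
Note that the two searches above run independently, so each returns elements that match only the class or only the id. A minimal sketch of the difference, using made-up markup that reuses the "outer-text" and "first" names (the other names are hypothetical):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the first paragraph has both the class and the id.
html = """
<p class="outer-text" id="first">both</p>
<p class="outer-text" id="second">class only</p>
<p class="inner-text" id="first">id only</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Separate searches: each ignores the other attribute.
by_class = soup.find_all(class_="outer-text")
by_id = soup.find_all(id="first")

# Combined search: both filters must hold for the same element.
both = soup.find_all(class_="outer-text", id="first")

print(len(by_class), len(by_id), [el.text for el in both])
```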

Beautiful Soup: div with both class and id