Question

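I'm trying to build a web scraper with Beautiful Soup, but every time I try to scrape the site I get nothing back. In the code below, I use requests to fetch the site and then load it into a BeautifulSoup object. After that, I try to grab all the div tags.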

I've tried watching YouTube tutorials and reading the library's documentation, but I just don't understand how to use it.

from bs4 import BeautifulSoup
import bs4
import urllib

url = requests.get("https://www.rt.com/")

print(url.status_code)

soup = BeautifulSoup(url.content, 'html.parser')

soup.find_all('div')

Answer 1

You are missing the import for the requests package, and you are not doing anything with the result.

from bs4 import BeautifulSoup
import requests

url = requests.get("https://www.rt.com/")

print(url.status_code)

soup = BeautifulSoup(url.content, 'html.parser')

divs = soup.find_all('div') # save results to a variable

# Print the text inside each div (example of how to use the results)
for div in divs:
    print(div.text)
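
To see the difference without depending on the live site, here is a minimal sketch that parses a small inline HTML string instead of fetching https://www.rt.com/. The HTML snippet and the names html and div_texts are made up for illustration; only BeautifulSoup, find_all and get_text come from the library.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, used instead of a live request so the example runs offline.
html = """
<html><body>
  <div id="header">Top stories</div>
  <div id="main"><p>First paragraph</p></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# In a script, a bare soup.find_all("div") builds the result and discards it,
# so nothing is shown. Keeping it in a variable lets you inspect it.
divs = soup.find_all("div")

print(len(divs))  # number of <div> tags found

# get_text(strip=True) returns the text of each tag with surrounding whitespace removed.
div_texts = [div.get_text(strip=True) for div in divs]
for text in div_texts:
    print(text)
```

If the same code prints 0 for a real page, the request itself is worth checking first (status code, and whether the content is rendered by JavaScript), since the parsing step only sees the HTML it was given.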

Answer 2

First, your code does not run as written because you forgot to import the requests package. Once you import it, the code will work.

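Second, I suggest reading the BeautifulSoup docs thoroughly; they cover everything you need. For example, if you need all the anchors on the page, assign them to a variable like this: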

elems = soup.find_all('a')

You can then iterate over the result set. For example, to extract the links from the anchor elements:

for link in elems:
    print(link.get('href'))

# http://example.com/elsie
# http://example.com/lacie
# http://example.com/tillie
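
One detail worth knowing: link.get('href') returns None for an anchor that has no href attribute. A small sketch of one way to skip those, assuming a made-up HTML snippet (the markup and the name links are illustrative, not taken from the site in the question):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two real links and one anchor without an href.
html = """
<p>
  <a href="http://example.com/elsie" id="link1">Elsie</a>
  <a href="http://example.com/lacie" id="link2">Lacie</a>
  <a name="no-link">Anchor without href</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# href=True restricts the match to <a> tags that actually carry an href attribute,
# so indexing with a["href"] cannot fail on the third anchor.
links = [a["href"] for a in soup.find_all("a", href=True)]
print(links)
```

Filtering in find_all keeps the loop body simple; checking for None inside the loop would work just as well.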

How do I collect elements using Beautiful Soup?
