Question

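I'm new to Python, and to programming languages in general. I'm trying to use this code to scrape a title from a website, but it keeps printing "None", as if the title (or any other tag I substitute for it) doesn't exist.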

import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup


my_url = "https://www.roblox.com/catalog/?CatalogContext=1&Keyword=the%20item&SortAggregation=5&LegendExpanded=true&Category=2"
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")

ttt = page_soup.find("div", {"class":"CatalogItemName notranslate"})
item = ttt.a.text
print(item)

Answer 1

The content you are looking for is not in the HTTP response the server sends. It is generated by JavaScript after the page has loaded.

When scraping, you should always load the site in a browser with JavaScript disabled, to get a better idea of what the raw HTML looks like.
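The same check can be done in code. The following is a minimal sketch using only the standard library; `ClassCollector`, `has_class`, and both sample HTML strings are hypothetical stand-ins, not the real Roblox markup. It reports whether a class name occurs anywhere in a given piece of HTML, which is the thing to verify on the raw response before blaming the parser.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def has_class(html, class_name):
    """Return True if any tag in `html` carries `class_name`."""
    collector = ClassCollector()
    collector.feed(html)
    return class_name in collector.classes


# Hypothetical stand-in for what the server sends before any script runs.
raw_html = '<div id="results"></div><script src="app.js"></script>'
# Hypothetical stand-in for the DOM after the script has filled in the results.
rendered_html = (
    '<div id="results"><div class="CatalogItemName notranslate">'
    "<a>The Item</a></div></div>"
)

print(has_class(raw_html, "CatalogItemName"))       # False
print(has_class(rendered_html, "CatalogItemName"))  # True
```

In the question's script, passing the decoded `page_html` to `has_class` would show whether the class is present in the response at all.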

Finally, you can work around this by using a scraping tool that supports JavaScript, such as Selenium.
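A rough sketch of that approach, assuming Selenium 4 with Chrome and a matching driver available locally. The function name `fetch_item_names` is hypothetical, and the CSS selector is only inferred from the class names in the question, so it may not match the site's current markup.

```python
def fetch_item_names(url, css_selector="div.CatalogItemName a", timeout=10):
    """Render `url` in a real browser and return the text of matching elements."""
    # Imported here so this sketch can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are available
    try:
        driver.get(url)
        # Wait until the page's script has inserted at least one matching element.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        elements = driver.find_elements(By.CSS_SELECTOR, css_selector)
        return [element.text for element in elements]
    finally:
        driver.quit()  # always close the browser, even if the wait times out


# Usage (opens a browser window and needs network access):
# print(fetch_item_names(my_url))
```

The explicit wait matters here: the elements are added after the initial load, so reading the page source immediately after `driver.get` could still miss them.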

Answer 2

When you want to find an element using multiple classes, I think the following is the convention.

soup.find("div", {'class':['CatalogItemName', 'notranslate']})
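It may help to see how Beautiful Soup treats the different ways of writing a multi-class search. The sketch below uses made-up HTML (not the real page) and a variable named `doc` to avoid clashing with the question's `soup` alias. As far as I understand the library, a list of classes matches an element that has any one of them, whereas a CSS selector requires all of them.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = """
<div class="CatalogItemName notranslate"><a>both classes</a></div>
<div class="notranslate"><a>one class only</a></div>
"""
doc = BeautifulSoup(html, "html.parser")

# Exact string: matches only when the class attribute is exactly this string.
exact = doc.find_all("div", {"class": "CatalogItemName notranslate"})
# List: matches a div that has ANY of the listed classes.
any_of = doc.find_all("div", {"class": ["CatalogItemName", "notranslate"]})
# CSS selector: matches a div that has ALL of the listed classes.
all_of = doc.select("div.CatalogItemName.notranslate")

print(len(exact), len(any_of), len(all_of))  # 1 2 1
```

None of these forms will find an element that is absent from the HTML being parsed, so this does not by itself fix the problem described in the question.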

Why can't I find this tag in Beautiful Soup?
