Question

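I'm asking for help because I'm stuck on how to crawl every link (page or sub-page) on a web page and find the frequency of a given word. I used Beautiful Soup for the scraping, but I don't think I'm doing it right. For example: I need to go to the official ServiceNow page > Solutions > View all Solutions and find the frequency of "intelligent" across all the links/sub-pages under View all Solutions. Any help would be greatly appreciated. Thanks :)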

My code:

import requests
from bs4 import BeautifulSoup

url = "https://www.servicenow.com/solutions-by-category.html"
serviceNow_r = requests.get(url)
sNow_soup = BeautifulSoup(serviceNow_r.text, 'html.parser')

print(sNow_soup.find_all('href',{'class':'cta-list component'}))


for name in sNow_soup.find_all('href',{'class':'cta-list component'}):
    print(name.text)

Answer 1

This is what you need to get the href attribute of every link on the page.

import requests
from bs4 import BeautifulSoup

url = "https://www.servicenow.com/solutions-by-category.html"
serviceNow_r = requests.get(url)
sNow_soup = BeautifulSoup(serviceNow_r.text, 'html.parser')

for anchor in sNow_soup.find_all('a', href=True):
    print(anchor['href'])
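
To get from the list of links to the word count the question asks about, one possible continuation is sketched below. It is only a sketch under assumptions: the pages are taken to be static HTML (content rendered by JavaScript would not be seen by requests), only links one level below the start page are followed, and the helper names extract_links, count_word, fetch_html and count_word_in_subpages are made up for this example. The fetch parameter is there so the logic can be tried on in-memory HTML without touching the network.

```python
import re
from urllib.parse import urldefrag, urljoin

import requests
from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute http(s) URLs of all <a href=...> links, without duplicates."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative hrefs against the page URL and drop any #fragment.
        absolute = urldefrag(urljoin(base_url, anchor["href"])).url
        if absolute.startswith(("http://", "https://")) and absolute not in links:
            links.append(absolute)
    return links


def count_word(html, word):
    """Count case-insensitive whole-word matches of `word` in the page text."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove script and style elements so their contents are not counted.
    for tag in soup.find_all(["script", "style"]):
        tag.decompose()
    text = soup.get_text(" ")
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def fetch_html(url):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def count_word_in_subpages(start_url, word, fetch=fetch_html):
    """Map each page linked from start_url to the number of times `word` occurs."""
    counts = {}
    for link in extract_links(fetch(start_url), start_url):
        try:
            counts[link] = count_word(fetch(link), word)
        except requests.RequestException:
            counts[link] = None  # page could not be downloaded
    return counts
```

With the URL from the question, a call would look like count_word_in_subpages("https://www.servicenow.com/solutions-by-category.html", "intelligent"), and summing the non-None values gives the overall frequency. Whole-word matching is a design choice here; drop the \b anchors if substrings such as "intelligently" should also count.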

Answer 2

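You are searching for an href tag. That's wrong! href is an attribute, not a tag.

You should search for <a> tags and then read the href attribute of each one. That is the URL of the linked page.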
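
The difference can be seen on a small piece of HTML. The snippet below is a made-up stand-in for the real page (the actual ServiceNow markup is not shown in this thread and may differ); it only reuses the class names from the question to show why find_all('href', ...) comes back empty and how to restrict the search to links inside one container.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed for illustration only.
html = """
<div class="cta-list component">
  <a href="/products.html">Products</a>
</div>
<a>No href here</a>
"""
soup = BeautifulSoup(html, "html.parser")

# The first argument of find_all is a tag name, so this looks for
# <href> elements, and no such element exists in the document.
no_matches = soup.find_all("href", {"class": "cta-list component"})

# Search for <a> tags that carry an href attribute, then read it.
urls = [anchor["href"] for anchor in soup.find_all("a", href=True)]

# Optionally limit the search to anchors inside the container div.
scoped = [a["href"] for a in soup.select("div.cta-list.component a[href]")]

print(no_matches)  # []
print(urls)        # ['/products.html']
print(scoped)      # ['/products.html']
```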

Search for the frequency of a word in the sub-pages of a web page using Python
