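I'm writing a program that parses web pages to grab titles and headings, so I can do SEO consulting without manually clicking through all the code.
The code works, but it only returns a single instance of each tag I'm looking for. If the HTML has five h1 elements, I only get the first one. How do I get the rest? I was thinking of a loop, but I'm not sure how to go about it.
The code is below: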
# import libraries
from urllib.request import urlopen
from bs4 import BeautifulSoup
#specify URL
quote_page = input('What URL would you like to scrape?')
#query website and return HTML to the variable page
page = urlopen(quote_page)
#parse the HTML with BeautifulSoup and store in variable 'soup'
soup = BeautifulSoup(page, 'html.parser')
#now we have the HTML as soup, so we need to grab the title and headers
title = soup.find('title')
h1s = soup.find('h1')
h2s = soup.find('h2')
h3s = soup.find('h3')
metadescription = soup.find('meta', attrs={'name': 'description'})
#print out the data in readable format, including "none" for missing data types
print()
print('Title:')
print(title)
print()
print('H1s:')
print(h1s)
print()
print('H2s:')
print(h2s)
print()
print('H3s:')
print(h3s)
print()
print('Description:')
print(metadescription)
Answer 0 (score: 0)
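Use soup.find_all('h1') to get all of them. Unlike soup.find, which returns only the first match (or None), find_all returns a list of every matching tag, so you can loop over the result.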
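As a rough sketch of how that could look with a loop, the example below runs find_all for each heading tag and prints every match. It parses a small inline HTML string rather than a live URL so it can be run as-is; `SAMPLE_HTML`, `collect_headings`, and `meta_description` are names made up for this illustration and are not part of the original code. The meta description is looked up by tag name plus an `attrs` filter, since the first argument to `find` is a tag name rather than a raw tag string.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<html><head>
<title>Example page</title>
<meta name="description" content="A short summary.">
</head><body>
<h1>First heading</h1>
<h1>Second heading</h1>
<h2>A subheading</h2>
</body></html>
"""


def collect_headings(html, tags=("h1", "h2", "h3")):
    """Return a dict mapping each tag name to the text of every match."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        # find_all returns a list of all matching tags (empty if none)
        tag: [el.get_text(strip=True) for el in soup.find_all(tag)]
        for tag in tags
    }


def meta_description(html):
    """Return the content of <meta name="description">, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("meta", attrs={"name": "description"})
    return tag.get("content") if tag else None


headings = collect_headings(SAMPLE_HTML)
for tag, texts in headings.items():
    print(f"{tag.upper()}s:")
    for text in texts:
        print(f"  {text}")

print("Description:")
print(meta_description(SAMPLE_HTML))
```

To use this against a real page, the string passed in could be replaced with the response from `urlopen(quote_page)`, as in the question. Tags that do not appear on the page simply yield an empty list, so the loop prints the label with nothing under it.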