Question

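I'm writing a program that parses web pages to grab the title and headers, so I can do SEO consulting without manually clicking through all the code.

The code works, but it only returns a single instance of each tag I'm looking for. If there are 5 h1s in the HTML, I only get the first one. How do I get the rest? I was thinking of a loop, but I'm not sure how to go about it.

The code is below: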

# import libraries
from urllib.request import urlopen
from bs4 import BeautifulSoup

#specify URL
quote_page = input('What URL would you like to scrape?')

#query website and return HTML to the variable page
page = urlopen(quote_page)

#parse the HTML with BeautifulSoup and store in variable 'soup'
soup = BeautifulSoup(page, 'html.parser')

#now we have the HTML as soup, so we need to grab the title and headers

title = soup.find('title')
h1s = soup.find('h1')
h2s = soup.find('h2')
h3s = soup.find('h3')
metadescription = soup.find('meta', attrs={'name': 'description'})


#print out the data in readable format, including "none" for missing data 
#types
print()
print('Title:')
print(title)
print()
print('H1s:')
print(h1s)
print()
print('H2s:')
print(h2s)
print()
print('H3s:')
print(h3s)
print()
print('Description:')
print(metadescription)

Answer 1

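Use soup.find_all('h1') to get all of them. find() returns only the first match, whereas find_all() returns a list of every match.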
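As a rough sketch of how that could look for all of the tags in the question: the snippet below parses a small inline HTML string (a made-up sample, used so it runs without a network request) and collects every h1, h2 and h3 with find_all(). The helper name heading_texts and the sample markup are assumptions for illustration, not part of the original code. The meta description is looked up by its name attribute, and "none" is printed for missing tag types, as in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, standing in for the HTML fetched with urlopen()
sample_html = """
<html><head>
<title>Example page</title>
<meta name="description" content="A short example description.">
</head><body>
<h1>First heading</h1>
<h1>Second heading</h1>
<h2>Only subheading</h2>
</body></html>
"""

soup = BeautifulSoup(sample_html, 'html.parser')


def heading_texts(soup, tag_name):
    """Return the stripped text of every <tag_name> tag (empty list if none)."""
    return [tag.get_text(strip=True) for tag in soup.find_all(tag_name)]


h1s = heading_texts(soup, 'h1')
h2s = heading_texts(soup, 'h2')
h3s = heading_texts(soup, 'h3')

# Match the meta tag by its name attribute and read its content attribute
meta = soup.find('meta', attrs={'name': 'description'})
metadescription = meta.get('content') if meta else None

# Loop over each result list, printing "none" when a tag type is missing
for label, values in (('H1s', h1s), ('H2s', h2s), ('H3s', h3s)):
    print(label + ':')
    for text in values or ['none']:
        print(' ', text)
print('Description:', metadescription)
```

To use this against a live page, the inline string could be replaced with the page object from urlopen(quote_page), as in the question.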

Parsing HTML with urllib: how do I print multiple tags of each type?