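I want to scrape content from a website structured like
https://www.wellstar.org/locations/pages/default.aspx
Using that site as a model, I want to extract the name of each location and the heading that location is listed under. I would like to produce output like the following: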
WellStar Hospitals
WellStar Atlanta Medical Center
WellStar Hospitals
WellStar Atlanta Medical Center South
...
WellStar Health Parks
ACWORTH HEALTH PARK
...
So far, I have tried a nested for loop:
for type in soup.find_all("h3", class_="WebFont SpotBodyGreen"):
    for name in soup.find_all("div", class_="PurpleBackgroundHeading"):
        print(type.text, name.text)
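The for loop above returns duplicates, because the inner loop searches the whole document and so pairs every name with every type, regardless of how they are grouped on the site. Any help with this task, whether as code or as recommended resources, would be greatly appreciated.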
Answer 0 (score: 1)
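You need a way to group the locations by their type heading. To do that, handle each block separately and collect the heading and its locations into a dictionary: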
from pprint import pprint

import requests
from bs4 import BeautifulSoup

url = "https://www.wellstar.org/locations/pages/default.aspx"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

# Map each heading to the list of location names found in the same table row.
d = {}
for row in soup.select(".WS_Content > .WS_LeftContent > table > tr"):
    # The <h3> inside the row holds the location type.
    title = row.h3.get_text(strip=True)
    # Search only within this row, so names stay attached to their own heading.
    d[title] = [item.get_text(strip=True) for item in row.select(".PurpleBackgroundHeading a")]

pprint(d)
This prints (pretty-printed with pprint()):
{'WellStar Community Hospice': ['Tranquility at Cobb Hospital',
                                'Tranquility at Kennesaw Mountain'],
 'WellStar Health Parks': ['Acworth Health Park', 'East Cobb Health Park'],
 'WellStar Hospitals': ['WellStar Atlanta Medical Center',
                        'WellStar Atlanta Medical Center South',
                        'WellStar Cobb Hospital',
                        'WellStar Douglas Hospital',
                        'WellStar Kennestone Hospital',
                        'WellStar North Fulton Hospital',
                        'WellStar Paulding Hospital',
                        'WellStar Spalding Regional Hospital',
                        'WellStar Sylvan Grove Hospital',
                        'WellStar West Georgia Medical Center',
                        'WellStar Windy Hill Hospital'],
 'WellStar Urgent Care Centers': ['WellStar Urgent Care in Acworth',
                                  'WellStar Urgent Care in Kennesaw',
                                  'WellStar Urgent Care in Marietta - Delk '
                                  'Road',
                                  'WellStar Urgent Care in Marietta - East '
                                  'Cobb',
                                  'WellStar Urgent Care in Marietta - '
                                  'Kennestone',
                                  'WellStar Urgent Care in Marietta - Sandy '
                                  'Plains Road',
                                  'WellStar Urgent Care in Smyrna',
                                  'WellStar Urgent Care in Woodstock']}
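If you want the flat listing from the question (the heading repeated before each location) rather than a dictionary, you can walk the grouped result once. The following is only a minimal sketch: `locations_by_heading` is a shortened, hand-written stand-in for the dictionary `d` above, and `flatten` is a helper name made up for illustration.

```python
# Hypothetical stand-in for the dictionary `d` built above (shortened).
locations_by_heading = {
    "WellStar Hospitals": [
        "WellStar Atlanta Medical Center",
        "WellStar Atlanta Medical Center South",
    ],
    "WellStar Health Parks": ["Acworth Health Park"],
}


def flatten(grouped):
    """Return one (heading, name) pair per location, in dictionary order."""
    return [(heading, name) for heading, names in grouped.items() for name in names]


pairs = flatten(locations_by_heading)

# Print the heading, then the location, as in the question's desired output.
for heading, name in pairs:
    print(heading)
    print(name)
```

Because each name is only ever paired with the heading of its own row, this avoids the every-name-with-every-type duplication that the nested loop produced.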