I have this JSON response (returned via Python) with nested HTML in the label field:
{
"point": {
"lat": "27.938829",
"long": "-82.322109"
},
"label": "<div class=\"lifestyle-results_item lifestyle-results_item-b\"><div class=\"locationinfo_area\"><h4 style=\"width:230px;\">At Petco</h4><h3 class=\"dealer_cat\"></h3><address style=\"left:240px\">2434 West Brandon Boulevard<br />Brandon, FL 33511<br /><br />813-571-0120<div class=\"contentinfo_area_operated\"></div></address></div><div class=\"contentinfo_area\"><div class=\"contentinfo_area_zip\">4.2 mi from Zip Code 33584</div></div><a href=\"https://vetcoclinics.petco.com?store_number=PET2722&source=vetcoclinics\" target=\"_blank\"><div class=\"contentinfo_area\"><div class=\"contentinfo_area_reserve\">BOOK NOW</div></div></a><div class=\"timeinfo_area\"><b>Sun, May 9</b><br/> at 10:00 AM - 1:00 PM<br /><b>Sun, May 16</b><br/> at 10:00 AM - 1:00 PM<br /><b>Sun, May 23</b><br/> at 10:00 AM - 1:00 PM<br /><b>Sun, May 30</b><br/> at 10:00 AM - 1:00 PM<br /><b>Sun, June 6</b><br/> at 10:00 AM - 1:00 PM<br /></div></div>",
"title": "At Petco ",
"html": "<div class=\"googlemap_bubble\"><b>At Petco<br><span></span></b><br />2434 West Brandon Boulevard<br />Brandon, FL 33511<br />813-571-0120<br /></div>"
}
How can I use a regular expression to extract the following from label:
Answer 0 (score: 1)
I suggest avoiding regular expressions and sticking with Beautiful Soup for HTML parsing. Assuming the parsed JSON data is stored in a variable named data, you can do the following:
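from bs4 import BeautifulSoup
htmlData = data["label"]
soup = BeautifulSoup(htmlData, 'html.parser')
# soup.address.string would be None here because <address> has several children,
# so keep the tag itself and read its text nodes below
address = soup.address
link = soup.a.get('href')
You can then collect the text nodes of the address tag to get the individual address lines:
# stripped_strings skips the <br /> tags and whitespace-only text
addressParts = list(address.stripped_strings)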
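and use a URL parser to get the store_number parameter from the link variable:
from urllib import parse
# the query parameter is called store_number, not store_name
storeName = parse.parse_qs(parse.urlparse(link).query)['store_number'][0]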
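To illustrate why the `[0]` index is needed: `parse_qs` maps each parameter name to a list of values, because a name may appear more than once in a query string. The sketch below hard-codes the URL from the label above; the helper name `get_query_param` is made up for this example and is not part of `urllib`.

```python
from urllib import parse

def get_query_param(url, name, default=None):
    """Return the first value of query parameter `name`, or `default` if absent."""
    query = parse.urlparse(url).query
    values = parse.parse_qs(query).get(name)  # list of values, or None if missing
    return values[0] if values else default

link = "https://vetcoclinics.petco.com?store_number=PET2722&source=vetcoclinics"
store_number = get_query_param(link, "store_number")
source = get_query_param(link, "source")
missing = get_query_param(link, "store_name")  # no such parameter in this URL
```

Using `.get()` with a default avoids the `KeyError` that a direct lookup raises when the parameter name is misspelled or absent.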
You will end up with:
addressParts containing the list
["2434 West Brandon Boulevard", "Brandon, FL 33511", "813-571-0120"]
link containing
https://vetcoclinics.petco.com?store_number=PET2722&source=vetcoclinics
storeName containing
PET2722
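If installing Beautiful Soup is not an option, the same two values can probably be pulled out with the standard library's `html.parser` instead. This is only a rough sketch under some assumptions: the class name `AddressAndLinkParser` is invented here, the payload is a shortened stand-in for the response above, and the code assumes a single `<address>` element and takes the first `<a>` it sees.

```python
import json
from html.parser import HTMLParser

class AddressAndLinkParser(HTMLParser):
    """Collects the text inside <address> and the href of the first <a>."""

    def __init__(self):
        super().__init__()
        self.in_address = False
        self.address_parts = []
        self.link = None

    def handle_starttag(self, tag, attrs):
        if tag == "address":
            self.in_address = True
        elif tag == "a" and self.link is None:
            self.link = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "address":
            self.in_address = False

    def handle_data(self, data):
        text = data.strip()
        if self.in_address and text:
            self.address_parts.append(text)

# Shortened stand-in for the JSON response shown in the question
raw = json.dumps({
    "label": '<address style="left:240px">2434 West Brandon Boulevard<br />'
             'Brandon, FL 33511<br /><br />813-571-0120'
             '<div class="contentinfo_area_operated"></div></address>'
             '<a href="https://vetcoclinics.petco.com?store_number=PET2722'
             '&source=vetcoclinics" target="_blank">BOOK NOW</a>'
})

data = json.loads(raw)
parser = AddressAndLinkParser()
parser.feed(data["label"])
parser.close()

address_parts = parser.address_parts
link = parser.link
```

Beautiful Soup remains the more convenient choice once the markup gets more irregular; the standard-library version mainly saves a dependency.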