I'm trying to scrape specific data from a string enclosed in curly braces {} on a website. How can I extract that data? Here is the snippet from the site:
<div class="swatch-data">
{"thumbnailImageUrl":"https://www.jbl.com.ph/dw/image/v2/AAUJ_PRD/on/demandware.static/-/Sites-masterCatalog_Harman/default/dw367304ef/JBL_Endurance-SPRINT_Product-Image_Black_Front-1605x1605px.jpg?sw=270&sh=330&sm=fit&sfrm=png","productUrl":"https://www.jbl.com.ph/JBL+Endurance+SPRINT.html?cgid=in-ear-headphones&dwvar_JBL%20Endurance%20SPRINT_color=Black-GLOBAL-","productSupportUrl":"","productID":"JBLENDURSPRINTBLK","orderable":false,"availability":{"message":"","status":"NOT_AVAILABLE"},"price":{"unitLabel":"each","priceType":"standard","salesPrice":"N/A"},"realprice":{"salesPrice":"N/A"},"badges":["new"],"buttonText":"Sold Out","showProdLimit":{"status":""},"CTAEnable":true,"commerceSiteFlag":false,"showPromoTimerFlag":false,"isProProd":false}
</div>
Thanks.
Edit: P.S. I do use BeautifulSoup4, but I'm really a beginner and haven't worked with JSON yet.
Answer 0 (score: 3)
An example with bs4:
import bs4
import json
html = """
<div class="swatch-data">
{"thumbnailImageUrl":"https://www.jbl.com.ph/dw/image/v2/AAUJ_PRD/on/demandware.static/-/Sites-masterCatalog_Harman/default/dw367304ef/JBL_Endurance-SPRINT_Product-Image_Black_Front-1605x1605px.jpg?sw=270&sh=330&sm=fit&sfrm=png","productUrl":"https://www.jbl.com.ph/JBL+Endurance+SPRINT.html?cgid=in-ear-headphones&dwvar_JBL%20Endurance%20SPRINT_color=Black-GLOBAL-","productSupportUrl":"","productID":"JBLENDURSPRINTBLK","orderable":false,"availability":{"message":"","status":"NOT_AVAILABLE"},"price":{"unitLabel":"each","priceType":"standard","salesPrice":"N/A"},"realprice":{"salesPrice":"N/A"},"badges":["new"],"buttonText":"Sold Out","showProdLimit":{"status":""},"CTAEnable":true,"commerceSiteFlag":false,"showPromoTimerFlag":false,"isProProd":false}
</div>
"""
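# Parse the HTML; the 'lxml' parser requires the lxml package to be installed
soup = bs4.BeautifulSoup(html, 'lxml')
# The div's text is a JSON object, so json.loads turns it into a Python dict
js_data = json.loads(soup.find('div', class_='swatch-data').text)
# To get the productID, index the dict by its key
print(js_data['productID'])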
Output:
JBLENDURSPRINTBLK
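Since the question mentions not having used JSON before, here is a minimal sketch of reading the other fields once the text has been parsed. It assumes a shortened copy of the JSON from the snippet (same keys and values, fewer fields) and uses only the standard json module; the "discount" key is hypothetical and is there only to show what happens when a key is missing.

```python
import json

# Shortened copy of the JSON text found inside the swatch-data div.
raw = (
    '{"productID":"JBLENDURSPRINTBLK","orderable":false,'
    '"availability":{"message":"","status":"NOT_AVAILABLE"},'
    '"price":{"unitLabel":"each","priceType":"standard","salesPrice":"N/A"},'
    '"badges":["new"],"buttonText":"Sold Out"}'
)

# json.loads converts the JSON text into nested Python dicts and lists.
data = json.loads(raw)

# Nested objects are dicts, so chain the keys.
status = data['availability']['status']
sales_price = data['price']['salesPrice']

# JSON arrays become Python lists.
first_badge = data['badges'][0]

# JSON true/false become Python True/False.
orderable = data['orderable']

# .get returns a default instead of raising KeyError for a missing key.
# "discount" is a hypothetical key that is not in the page data.
discount = data.get('discount', 'not present')

print(status, sales_price, first_badge, orderable, discount)
```

With the bs4 example above, the same lookups should work on js_data directly, since it is the dict returned by json.loads.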
Answer 1 (score: 1)