How to scrape specific data inside double curly braces in Python

Date: 2018-06-30 12:39:15

Tags: python json

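I'm trying to scrape specific data from a string inside the curly braces {} on a website. How does one go about extracting this data? Here is the snippet from the site: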

<div class="swatch-data">
{"thumbnailImageUrl":"https://www.jbl.com.ph/dw/image/v2/AAUJ_PRD/on/demandware.static/-/Sites-masterCatalog_Harman/default/dw367304ef/JBL_Endurance-SPRINT_Product-Image_Black_Front-1605x1605px.jpg?sw=270&amp;sh=330&amp;sm=fit&amp;sfrm=png","productUrl":"https://www.jbl.com.ph/JBL+Endurance+SPRINT.html?cgid=in-ear-headphones&amp;dwvar_JBL%20Endurance%20SPRINT_color=Black-GLOBAL-","productSupportUrl":"","productID":"JBLENDURSPRINTBLK","orderable":false,"availability":{"message":"","status":"NOT_AVAILABLE"},"price":{"unitLabel":"each","priceType":"standard","salesPrice":"N/A"},"realprice":{"salesPrice":"N/A"},"badges":["new"],"buttonText":"Sold Out","showProdLimit":{"status":""},"CTAEnable":true,"commerceSiteFlag":false,"showPromoTimerFlag":false,"isProProd":false}
</div>

Thanks.

Edit: P.S. I do use BeautifulSoup4, but I'm really just a beginner and haven't worked with JSON yet.

2 Answers:

Answer 0 (score: 3)

Example using bs4:
import bs4
import json

html = """
<div class="swatch-data">
{"thumbnailImageUrl":"https://www.jbl.com.ph/dw/image/v2/AAUJ_PRD/on/demandware.static/-/Sites-masterCatalog_Harman/default/dw367304ef/JBL_Endurance-SPRINT_Product-Image_Black_Front-1605x1605px.jpg?sw=270&amp;sh=330&amp;sm=fit&amp;sfrm=png","productUrl":"https://www.jbl.com.ph/JBL+Endurance+SPRINT.html?cgid=in-ear-headphones&amp;dwvar_JBL%20Endurance%20SPRINT_color=Black-GLOBAL-","productSupportUrl":"","productID":"JBLENDURSPRINTBLK","orderable":false,"availability":{"message":"","status":"NOT_AVAILABLE"},"price":{"unitLabel":"each","priceType":"standard","salesPrice":"N/A"},"realprice":{"salesPrice":"N/A"},"badges":["new"],"buttonText":"Sold Out","showProdLimit":{"status":""},"CTAEnable":true,"commerceSiteFlag":false,"showPromoTimerFlag":false,"isProProd":false}
</div>
"""

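# 'html.parser' ships with Python; 'lxml' also works if it is installed
soup = bs4.BeautifulSoup(html, 'html.parser')
# Select the div by its class, take its text, and parse that text as JSON
js_data = json.loads(soup.find('div', class_='swatch-data').get_text())

# js_data is now a plain dict, so read any field by its key
print(js_data['productID'])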

Output:

JBLENDURSPRINTBLK
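Because `json.loads` returns an ordinary Python `dict`, the nested fields in the snippet (availability, price, badges) are reached by chaining keys and list indexes. A minimal sketch, assuming the same JSON as above, trimmed to a few keys for brevity; the `.get()` call guards against a key that might be missing on other product pages, which is an assumption rather than something the question states:

```python
import json

# Trimmed copy of the JSON inside the question's <div class="swatch-data">
raw = (
    '{"productID":"JBLENDURSPRINTBLK","orderable":false,'
    '"availability":{"message":"","status":"NOT_AVAILABLE"},'
    '"price":{"unitLabel":"each","priceType":"standard","salesPrice":"N/A"},'
    '"badges":["new"],"buttonText":"Sold Out"}'
)

js_data = json.loads(raw)

# Nested JSON objects become nested dicts: chain the keys
status = js_data["availability"]["status"]
sales_price = js_data["price"]["salesPrice"]
# JSON arrays become Python lists
first_badge = js_data["badges"][0]
# JSON false becomes Python False
orderable = js_data["orderable"]
# .get() returns a default instead of raising KeyError when a key is absent
support_url = js_data.get("productSupportUrl", "")

print(status, sales_price, first_badge, orderable, repr(support_url))
```

The same chaining works on the `js_data` produced by the bs4 example, since both come out of `json.loads`.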

Answer 1 (score: 1)

What you're seeing there is actually JSON.
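Since the JSON sits inside an HTML element, `json.loads` accepts only the text between the tags, not the markup around it. A small sketch using a shortened copy of the question's JSON; `is_json` is a hypothetical helper written just for this illustration:

```python
import json


def is_json(text):
    """Return True if text parses as JSON, otherwise False (hypothetical helper)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Shortened copy of the JSON from the question
inner = '{"productID":"JBLENDURSPRINTBLK","orderable":false}'
markup = '<div class="swatch-data">\n' + inner + '\n</div>'

print(is_json(markup))               # False: the leading "<" is not valid JSON
print(is_json("\n" + inner + "\n"))  # True: surrounding whitespace is allowed
```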

First, you need to strip off the surrounding div; BeautifulSoup is the recommended tool for that.

Then you can load the remaining string with json.loads(str).
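The same two steps (strip the div, then `json.loads`) can also be done without BeautifulSoup. The sketch below swaps in the standard library's `html.parser.HTMLParser` in place of BeautifulSoup; the class name `SwatchDataParser` is hypothetical, and it assumes each `swatch-data` div holds only text with no nested tags, as in the question's snippet. Because `HTMLParser` converts character references by default, the `&amp;` in the URLs comes out as a plain `&`:

```python
import json
from html.parser import HTMLParser


class SwatchDataParser(HTMLParser):
    """Collect the text of every <div class="swatch-data"> (hypothetical helper).

    Assumes the div contains only text, no nested tags.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default: "&amp;" -> "&"
        self._inside = False
        self._chunks = []
        self.blobs = []  # one JSON string per swatch-data div

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "swatch-data" in classes:
            self._inside = True
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._inside:
            self.blobs.append("".join(self._chunks))
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)


# Shortened copy of the question's markup
html_doc = """
<div class="swatch-data">
{"productID":"JBLENDURSPRINTBLK","productUrl":"https://www.jbl.com.ph/JBL+Endurance+SPRINT.html?cgid=in-ear-headphones&amp;dwvar_JBL%20Endurance%20SPRINT_color=Black-GLOBAL-"}
</div>
"""

parser = SwatchDataParser()
parser.feed(html_doc)
parser.close()

# Step 2: each extracted string is plain JSON
products = [json.loads(blob) for blob in parser.blobs]
print(products[0]["productID"])
print(products[0]["productUrl"])
```

For pages with more varied markup, BeautifulSoup as in Answer 0 is the more forgiving choice; a hand-written parser like this is mainly useful when installing a third-party package is not an option.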