Getting the data-product attribute with Beautiful Soup

Date: 2018-03-30 15:29:13

Tags: python beautifulsoup

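I am trying to use Beautiful Soup to get data from a website. I have the code below, and I want to extract the JSON stored in the data-product attribute. How can I do that?

This code: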

soup_catalog.find('a',class_="product-li")

returns:

<a class="product-li" data-product='{"product":"0431772", "basketId":"043177500", "type":"product", "category":"ga", "subCategory":"gpes", "webVideoUrl": "None", "brand":"konami", "title_url": "pes-2018-para-ps3-konami", "title": "PES 2018 para PS3", "reference": "Konami", "stockTypes": {"043177500": "F"}, "price": "89.9"}' href="https://www.magazineluiza.com.br/pes-2018-para-ps3-konami/p/0431772/ga/gpes/" itemprop="url" title="PES 2018 para PS3">\n<span class="js-wishlist-action wishlist__simple-text">\n<i class="wishlist__favorite-icon js-add-wishlist"></i>\n</span>\n<div class="alignment-image">\n<img alt="PES 2018 para PS3 - Konami" class="product-image" data-original="https://c.mlcdn.com.br//pes-2018-para-ps3-konami/v/210x210/043177500.jpg" height="210" src="https://d25zlb44gqlazw.cloudfront.net/static/img/default/white1x1-e0a7e4ed.gif" title="PES 2018 para PS3 - Konami" width="210"/>\n</div>\n<noscript>\n<img alt="PES 2018 para PS3 - Konami" height="210" itemprop="image" src="https://c.mlcdn.com.br//pes-2018-para-ps3-konami/v/210x210/043177500.jpg" title="PES 2018 para PS3 - Konami" width="210"/>\n</noscript>\n<span class="product-content-other-informations">\n<span class="rating-container">\n<span class="rateing sprite-stars star-medium" itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating">\n<em class="sprite-stars" style="width:90.0%"></em>\n<meta content="4.5" itemprop="ratingValue">\n<meta content="78" itemprop="reviewCount">\n</meta></meta></span>\n</span>\n</span>\n<h3 class="productTitle" itemprop="name">PES 2018 para PS3 - Konami</h3>\n<meta content="0431772" itemprop="productID">\n<meta content="None" itemprop="description">\n<p itemscope="" itemtype="http://schema.org/Brand"><meta content="konami" itemprop="name"/></p>\n<span class="productPrice" itemprop="offers" itemscope="" itemtype="http://schema.org/Offer">\n<span class="priceContent color-green none-product-showcase">desconto de R$ 79,10</span>\n<meta 
content="BRL" itemprop="priceCurrency">\n<meta content="89,90" itemprop="price">\n<span class="originalPrice">de R$ 169,00</span>\n<span class="price">\n                        por R$ 89,90\n                    </span>\n<meta content="InStock" itemprop="availability"/>\n</meta></meta></span>\n</meta></meta></a>

Then I tried:

soup_catalog.find('a',class_="product-li").find('data-product')

But the data-product value is not returned. How can I get it?

2 answers:

Answer 0 (score: 1)

This should help:

from bs4 import BeautifulSoup

s = """<a class="product-li" data-product='{"product":"0431772", "basketId":"043177500", "type":"product", "category":"ga", "subCategory":"gpes", "webVideoUrl": "None", "brand":"konami", "title_url": "pes-2018-para-ps3-konami", "title": "PES 2018 para PS3", "reference": "Konami", "stockTypes": {"043177500": "F"}, "price": "89.9"}' href="https://www.magazineluiza.com.br/pes-2018-para-ps3-konami/p/0431772/ga/gpes/" itemprop="url" title="PES 2018 para PS3">\n<span class="js-wishlist-action wishlist__simple-text">\n<i class="wishlist__favorite-icon js-add-wishlist"></i>\n</span>\n<div class="alignment-image">\n<img alt="PES 2018 para PS3 - Konami" class="product-image" data-original="https://c.mlcdn.com.br//pes-2018-para-ps3-konami/v/210x210/043177500.jpg" height="210" src="https://d25zlb44gqlazw.cloudfront.net/static/img/default/white1x1-e0a7e4ed.gif" title="PES 2018 para PS3 - Konami" width="210"/>\n</div>\n<noscript>\n<img alt="PES 2018 para PS3 - Konami" height="210" itemprop="image" src="https://c.mlcdn.com.br//pes-2018-para-ps3-konami/v/210x210/043177500.jpg" title="PES 2018 para PS3 - Konami" width="210"/>\n</noscript>\n<span class="product-content-other-informations">\n<span class="rating-container">\n<span class="rateing sprite-stars star-medium" itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating">\n<em class="sprite-stars" style="width:90.0%"></em>\n<meta content="4.5" itemprop="ratingValue">\n<meta content="78" itemprop="reviewCount">\n</meta></meta></span>\n</span>\n</span>\n<h3 class="productTitle" itemprop="name">PES 2018 para PS3 - Konami</h3>\n<meta content="0431772" itemprop="productID">\n<meta content="None" itemprop="description">\n<p itemscope="" itemtype="http://schema.org/Brand"><meta content="konami" itemprop="name"/></p>\n<span class="productPrice" itemprop="offers" itemscope="" itemtype="http://schema.org/Offer">\n<span class="priceContent color-green none-product-showcase">desconto de R$ 79,10</span>\n<meta 
content="BRL" itemprop="priceCurrency">\n<meta content="89,90" itemprop="price">\n<span class="originalPrice">de R$ 169,00</span>\n<span class="price">\n                        por R$ 89,90\n                    </span>\n<meta content="InStock" itemprop="availability"/>\n</meta></meta></span>\n</meta></meta></a>"""
soup = BeautifulSoup(s, "html.parser")
# Locate the <a> tag, then read the attribute with dictionary-style access.
tag = soup.find("a", class_="product-li")
print(tag["data-product"])

Output:

{"product":"0431772", "basketId":"043177500", "type":"product", "category":"ga", "subCategory":"gpes", "webVideoUrl": "None", "brand":"konami", "title_url": "pes-2018-para-ps3-konami", "title": "PES 2018 para PS3", "reference": "Konami", "stockTypes": {"043177500": "F"}, "price": "89.9"}
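The attribute value is only a string, so if you want the individual fields it still has to be decoded. A minimal sketch, assuming the value is valid JSON as in the output above; the shortened `html` string and the `product` and `price` names are just for illustration:

```python
import json

from bs4 import BeautifulSoup

# Shortened, hypothetical version of the markup from the question.
html = """<a class="product-li" data-product='{"product":"0431772", "title": "PES 2018 para PS3", "stockTypes": {"043177500": "F"}, "price": "89.9"}' href="#">PES 2018 para PS3</a>"""

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("a", class_="product-li")

# The attribute is a plain string; json.loads turns it into a dict.
product = json.loads(tag["data-product"])

# "price" is stored as a string inside the JSON, so convert it explicitly.
price = float(product["price"])

print(product["title"], price)
```

After decoding, nested values such as product["stockTypes"] are ordinary dictionaries.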

Answer 1 (score: 1)

You can read the data from the tag's attribute like this:

soup_catalog.find('a',class_='product-li').get('data-product')
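
As for why the attempt in the question returned nothing: find() searches for descendant tags by name, and data-product is an attribute of the <a> tag, not a tag inside it. The sketch below, using made-up two-link markup, shows that and also how .get() behaves when some links lack the attribute:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one link carries data-product, the other does not.
html = """
<a class="product-li" data-product='{"product": "1"}' href="#">first</a>
<a class="product-li" href="#">no data attribute</a>
"""

soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a", class_="product-li")

# find() looks for a child tag named "data-product", so the result is None.
missing_tag = links[0].find("data-product")

# .get() returns None for a missing attribute; link["data-product"]
# would raise KeyError on the second link instead.
values = [link.get("data-product") for link in links]

# Passing True as the attribute value keeps only tags that have the attribute.
with_attr = soup.find_all("a", attrs={"data-product": True})

print(missing_tag, values, len(with_attr))
```

Which form to use is mostly a matter of whether a missing attribute should be an error (square brackets) or a None you handle yourself (.get()).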