How to get values from get_attribute_list

Time: 2018-05-21 11:47:29

Tags: python

import requests

data = requests.get('https://...')

from bs4 import BeautifulSoup
soup = BeautifulSoup(data.text, 'html.parser')

data1 = soup.find('div',{'id':'comparisonTabs1'})

comparisonTabs1 = data1.get_attribute_list('data-js-gtminfo')

Result:

['{"Event":"productDetail","EventCategory":"Ecommerce","EventAction":"Product detail","EventLabel":"","Ecommerce":{"Detail":{"ActionField":{"List":"consument/vergelijker"},"Products":[{"Id":3211,"Name":"1 jaar Vast","Price":1600.45,"Brand":"Google","Position":1,"Category":"consument","Variant":"","List":"consument/vergelijker","Dimension10":222.0,"Dimension11":12,"Dimension12":"nee","Dimension13":6.9}]}}}']

I want to get the values of Position (= 1), Brand (= Google) and Price (= 1600.45) using Selenium and BeautifulSoup. How can I get these values?

Any suggestions would be appreciated.

1 answer:

Answer 0 (score: 0)

What you have is JSON data, so parse each element of the list:

import json

comparison_tab_data = json.loads(comparisonTabs1[0])

The three values are now just key-value pairs in a nested structure:

>>> from pprint import pprint
>>> pprint(comparison_tab_data)
{'Ecommerce': {'Detail': {'ActionField': {'List': 'consument/vergelijker'},
                          'Products': [{'Brand': 'Google',
                                        'Category': 'consument',
                                        'Dimension10': 222.0,
                                        'Dimension11': 12,
                                        'Dimension12': 'nee',
                                        'Dimension13': 6.9,
                                        'Id': 3211,
                                        'List': 'consument/vergelijker',
                                        'Name': '1 jaar Vast',
                                        'Position': 1,
                                        'Price': 1600.45,
                                        'Variant': ''}]}},
 'Event': 'productDetail',
 'EventAction': 'Product detail',
 'EventCategory': 'Ecommerce',
 'EventLabel': ''}

Loop over the nested Products list to extract the data for each entry:

for entry in comparison_tab_data['Ecommerce']['Detail']['Products']:
    print(entry['Position'], entry['Brand'], entry['Price'])
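If the page might not always include every level of that structure, a more defensive variant could look like the sketch below. It only uses the standard json module. The string raw is a shortened copy of the attribute value from the question, and extract_products is a hypothetical helper name introduced here for illustration, not part of any library.

```python
import json

# Shortened copy of the attribute string from the question (illustration only).
raw = (
    '{"Event":"productDetail","Ecommerce":{"Detail":{"Products":['
    '{"Id":3211,"Name":"1 jaar Vast","Price":1600.45,'
    '"Brand":"Google","Position":1}]}}}'
)


def extract_products(raw_json):
    """Hypothetical helper: return a list of (Position, Brand, Price) tuples."""
    data = json.loads(raw_json)
    # Chained .get() calls fall back to empty containers when a level is
    # missing, instead of raising KeyError.
    products = data.get('Ecommerce', {}).get('Detail', {}).get('Products', [])
    return [(p.get('Position'), p.get('Brand'), p.get('Price')) for p in products]


rows = extract_products(raw)
for position, brand, price in rows:
    print(position, brand, price)
```

Whether silently returning an empty list is the right behaviour depends on your use case; if a missing key means the page layout changed, letting the KeyError surface may be preferable.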

I'm not sure why you chose to use data1.get_attribute_list(); I would expect data-js-gtminfo to only ever be a single string, so just use the standard attribute access API:

comparison_tab_data = json.loads(data1['data-js-gtminfo'])
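To make the difference between the two access styles concrete, here is a small self-contained sketch. The HTML snippet is a made-up stand-in for the real page (the actual markup is not shown in the question), and data-missing is a hypothetical attribute name used only to show what happens when an attribute is absent.

```python
import json

from bs4 import BeautifulSoup

# Made-up stand-in for the real page; only the attribute name and the
# shape of the JSON are taken from the question.
html = """<div id="comparisonTabs1" data-js-gtminfo='{"Ecommerce":{"Detail":{"Products":[{"Brand":"Google","Position":1,"Price":1600.45}]}}}'></div>"""

soup = BeautifulSoup(html, 'html.parser')
tag = soup.find('div', {'id': 'comparisonTabs1'})

# get_attribute_list() always wraps the value in a list.
as_list = tag.get_attribute_list('data-js-gtminfo')

# Subscript access returns the string itself (raises KeyError if absent).
as_string = tag['data-js-gtminfo']

# .get() returns None for a missing attribute instead of raising.
missing = tag.get('data-missing')

product = json.loads(as_string)['Ecommerce']['Detail']['Products'][0]
print(product['Position'], product['Brand'], product['Price'])
```

Here as_list is a one-element list containing the same string as as_string, which is why the [0] index was needed in the first version.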