import requests
from bs4 import BeautifulSoup

# Download the page and parse it
data = requests.get('https://...')
soup = BeautifulSoup(data.text, 'html.parser')
# Find the div that carries the tracking data and read its attribute
data1 = soup.find('div',{'id':'comparisonTabs1'})
comparisonTabs1 = data1.get_attribute_list('data-js-gtminfo')
Result:
['{"Event":"productDetail","EventCategory":"Ecommerce","EventAction":"Product detail","EventLabel":"","Ecommerce":{"Detail":{"ActionField":{"List":"consument/vergelijker"},"Products":[{"Id":3211,"Name":"1 jaar Vast","Price":1600.45,"Brand":"Google","Position":1,"Category":"consument","Variant":"","List":"consument/vergelijker","Dimension10":222.0,"Dimension11":12,"Dimension12":"nee","Dimension13":6.9}]}}}']
I want to get the values of Position (= 1), Brand (= Google) and Price (= 1600.45) with Selenium and BeautifulSoup. How can I get these values?
Any suggestions would be appreciated.
Answer 0 (score: 0)
You have extracted JSON data, so parse each element of the list as JSON:
import json
comparison_tab_data = json.loads(comparisonTabs1[0])
The three values are now just key-value pairs in a nested structure:
>>> from pprint import pprint
>>> pprint(comparison_tab_data)
{'Ecommerce': {'Detail': {'ActionField': {'List': 'consument/vergelijker'},
                          'Products': [{'Brand': 'Google',
                                        'Category': 'consument',
                                        'Dimension10': 222.0,
                                        'Dimension11': 12,
                                        'Dimension12': 'nee',
                                        'Dimension13': 6.9,
                                        'Id': 3211,
                                        'List': 'consument/vergelijker',
                                        'Name': '1 jaar Vast',
                                        'Position': 1,
                                        'Price': 1600.45,
                                        'Variant': ''}]}},
 'Event': 'productDetail',
 'EventAction': 'Product detail',
 'EventCategory': 'Ecommerce',
 'EventLabel': ''}
Loop over the nested Products list to extract the data for each entry:
for entry in comparison_tab_data['Ecommerce']['Detail']['Products']:
    print(entry['Position'], entry['Brand'], entry['Price'])
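If the page might not always contain every level of that structure, a slightly more defensive variant can help. This is only a sketch: the helper name extract_products is hypothetical, the JSON string is a shortened copy of the value from the question, and returning an empty list for missing keys is an assumption about what you want in that case.

```python
import json

# Shortened copy of the attribute value shown in the question.
raw = ('{"Ecommerce":{"Detail":{"Products":[{"Id":3211,"Brand":"Google",'
       '"Position":1,"Price":1600.45}]}}}')


def extract_products(raw_json):
    """Return (Position, Brand, Price) tuples (hypothetical helper)."""
    payload = json.loads(raw_json)
    # Walk down with .get() so a missing level gives an empty list
    # instead of raising KeyError.
    products = payload.get("Ecommerce", {}).get("Detail", {}).get("Products", [])
    return [(p.get("Position"), p.get("Brand"), p.get("Price")) for p in products]


rows = extract_products(raw)
print(rows)  # [(1, 'Google', 1600.45)]
```

Plain indexing, as in the loop above, is the better choice if you would rather get a loud KeyError when the page layout changes.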
I'm not sure why you chose to use data1.get_attribute_list(); I'd expect data-js-gtminfo to only ever hold a single string, so just use the standard attribute access API:
comparison_tab_data = json.loads(data1['data-js-gtminfo'])
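To see the difference between the two ways of reading the attribute, here is a small self-contained sketch. The HTML snippet is a made-up stand-in for the real page (its actual markup is an assumption), with a shortened copy of the JSON from the question.

```python
import json
from bs4 import BeautifulSoup

# Made-up stand-in for the downloaded page; single quotes around the
# attribute value let the JSON keep its double quotes.
html = (
    '<div id="comparisonTabs1" data-js-gtminfo=\''
    '{"Ecommerce":{"Detail":{"Products":'
    '[{"Brand":"Google","Position":1,"Price":1600.45}]}}}'
    '\'></div>'
)

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("div", {"id": "comparisonTabs1"})

as_list = tag.get_attribute_list("data-js-gtminfo")  # list holding one string
as_str = tag["data-js-gtminfo"]                      # the string itself

data = json.loads(as_str)
product = data["Ecommerce"]["Detail"]["Products"][0]
print(product["Position"], product["Brand"], product["Price"])
```

Both calls read the same attribute; get_attribute_list() just wraps the string in a list, which is why the earlier code needed comparisonTabs1[0].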