I am searching a text string that contains dictionaries like the ones shown below:
soup_string = """{"loadType":"","shiftId":"ROVR-DUMMY-SHIFTID","carbonFriendly":"no","cost":"£2.00","initialSlotPrice":"","timeSlotISO":"2019-06-13T12:00+01:00/13:00+01:00","isSameDayPremium":"false","stopId":"10446315588190612134701380","availability":"full","slotDiscountedByDP":"false","slotId":"1hr-12-13-20190613","time":"12:00pm - 1:00pm","rawSlotPrice":"","slotDiscounted":"false"}, {"loadType":"","shiftId":"ROVR-DUMMY-SHIFTID","carbonFriendly":"no","cost":"£2.00","initialSlotPrice":"","timeSlotISO":"2019-06-13T12:30+01:00/13:30+01:00","isSameDayPremium":"false","stopId":"10446315588190612134701380","availability":"available","slotDiscountedByDP":"false","slotId":"1hr-12:30-13:30-20190613","time":"12:30pm - 1:30pm","rawSlotPrice":"","slotDiscounted":"false"}"""
I would like to return the string that follows each key in the "dictionary".
I decided that a suitable approach would be a regular expression. I can return every cost using
Costs = re.findall(r"\£[0-9]\.[0-9][0-9]", soup_string)
and every time using
times = re.findall(r'\"(time)\"\:\"(.{14,16})\"\,', soup_string)
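Basically, I want to be able to look up each key in the dictionaries, search for a specific string, and return its value.
The end goal is to create a dictionary using 'Cost', 'Availability' and 'time'.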
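The per-key lookup described above can be sketched with one regular expression, assuming every value is a double-quoted string with no escaped quotes inside it. The helper name `find_values` and the shortened `sample` string are made up for illustration and are not part of the original code.

```python
import re

# Shortened sample in the same shape as soup_string (illustrative only).
sample = '{"cost":"£2.00","availability":"full","time":"12:00pm - 1:00pm"}, {"cost":"£2.00","availability":"available","time":"12:30pm - 1:30pm"}'


def find_values(key, text):
    """Return every string value stored under `key` in `text`.

    Assumes values are double-quoted and contain no escaped quotes.
    """
    pattern = r'"' + re.escape(key) + r'"\s*:\s*"([^"]*)"'
    return re.findall(pattern, text)


# Pair the three lists up into one dictionary per slot.
slots = [
    {"Cost": c, "Availability": a, "time": t}
    for c, a, t in zip(
        find_values("cost", sample),
        find_values("availability", sample),
        find_values("time", sample),
    )
]
```

Because the pattern includes the closing quote of the key, `"time"` does not also match a longer key such as `"timeSlotISO"`. The approach still breaks on nested or escaped values, which is why a JSON parser is usually the safer choice for this kind of text.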
Full code:
import requests
from bs4 import BeautifulSoup
import json
postcode = "L4 0TH"
ASDA_url = "https://groceries.asda.com/api/user/checkpostcode?postcode="+ postcode + "&requestorigin=gi"
ASDA_url2 = "https://groceries.asda.com/api/slot/view?startdate=12%2F06%2F2019&deliveryoption=homedelivery&requestorigin=gi&_="
client = requests.Session()
r = client.get(ASDA_url)
r2 = client.get(ASDA_url2)
soup = BeautifulSoup(r2.text, 'html.parser')
soup_string = str(soup)
soup_dicts = json.loads('[' + soup_string + ']')
keep_keys = ('cost', 'availability', 'time')
filtered = [{k:soup_dict[k] for k in keep_keys} for soup_dict in soup_dicts]
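Answer 0: (score: 0)
First, you need to put the data in a list and wrap it in a dictionary under a key such as "data" (see the example below). Then use json to convert the string into a dictionary holding a list of dictionaries. Finally, loop over that list to extract the cost, availability, and time from each dictionary.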
import json
soup_string = """{"data": [{"loadType":"","shiftId":"ROVR-DUMMY-SHIFTID","carbonFriendly":"no","cost":"£2.00","initialSlotPrice":"","timeSlotISO":"2019-06-13T12:00+01:00/13:00+01:00","isSameDayPremium":"false","stopId":"10446315588190612134701380","availability":"full","slotDiscountedByDP":"false","slotId":"1hr-12-13-20190613","time":"12:00pm - 1:00pm","rawSlotPrice":"","slotDiscounted":"false"}, {"loadType":"","shiftId":"ROVR-DUMMY-SHIFTID","carbonFriendly":"no","cost":"£2.00","initialSlotPrice":"","timeSlotISO":"2019-06-13T12:30+01:00/13:30+01:00","isSameDayPremium":"false","stopId":"10446315588190612134701380","availability":"available","slotDiscountedByDP":"false","slotId":"1hr-12:30-13:30-20190613","time":"12:30pm - 1:30pm","rawSlotPrice":"","slotDiscounted":"false"}]}"""
d = json.loads(soup_string)
result = []
cost, avail, time = [], [], []
for data in d['data']:
    # Keep only the three fields of interest from each slot
    tmp = {}
    tmp['Cost'] = data['cost']
    tmp['Availability'] = data['availability']
    tmp['Time'] = data['time']
    result.append(tmp)
result
Output:
[{'Cost': '£2.00', 'Availability': 'full', 'Time': '12:00pm - 1:00pm'},
{'Cost': '£2.00', 'Availability': 'available', 'Time': '12:30pm - 1:30pm'}]
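The string in the question has no surrounding "data" key, so the wrapper used above can also be added in code rather than by hand. This is a minimal sketch, assuming the raw text is a comma-separated run of JSON objects; `raw` is a shortened stand-in for soup_string.

```python
import json

# Shortened stand-in for the question's soup_string (illustrative only).
raw = '{"cost":"£2.00","availability":"full","time":"12:00pm - 1:00pm"}, {"cost":"£2.00","availability":"available","time":"12:30pm - 1:30pm"}'

# Wrap the comma-separated objects so the text becomes one valid JSON document.
wrapped = '{"data": [' + raw + ']}'
d = json.loads(wrapped)

# Build one small dictionary per slot with the renamed keys.
result = [
    {"Cost": slot["cost"], "Availability": slot["availability"], "Time": slot["time"]}
    for slot in d["data"]
]
```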
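Answer 1: (score: 0)
Given that you have multiple dictionaries, I am not sure exactly what you want to get, but as far as I understand it, this should help: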
import json
soup_string = ''' ... ''' # As it is in the question
soup_dicts = json.loads('[' + soup_string + ']')
keep_keys = ('cost', 'availability', 'time')
filtered = [{k:soup_dict[k] for k in keep_keys} for soup_dict in soup_dicts]
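It treats the string of dictionaries as a JSON list of dictionaries and parses it with the json module. It then filters out everything except the desired key/value pairs. The result is a list of filtered dictionaries.
Output (i.e. the value of filtered):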
[
{'cost': '£2.00', 'availability': 'full', 'time': '12:00pm - 1:00pm'},
{'cost': '£2.00', 'availability': 'available', 'time': '12:30pm - 1:30pm'}
]
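The comprehension above raises KeyError if any dictionary lacks one of the keys. The question does not show whether that can happen with this data; if it can, a variant using dict.get is one option. In the sketch below, `raw` is a made-up sample whose second object is deliberately missing "time".

```python
import json

# Made-up sample; the second object has no "time" key on purpose.
raw = '{"cost":"£2.00","availability":"full","time":"12:00pm - 1:00pm"}, {"cost":"£2.00","availability":"available"}'

soup_dicts = json.loads('[' + raw + ']')
keep_keys = ('cost', 'availability', 'time')

# dict.get returns None instead of raising KeyError when a key is absent.
filtered_safe = [{k: soup_dict.get(k) for k in keep_keys} for soup_dict in soup_dicts]
```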
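Edit:
In response to the code you provided, I see that you are calling str on the BeautifulSoup result. Instead of doing that, you can work with the client.get() result directly: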
import json
import requests
postcode = "L4 0TH"
ASDA_url = "https://groceries.asda.com/api/user/checkpostcode?postcode="+ postcode + "&requestorigin=gi"
ASDA_url2 = "https://groceries.asda.com/api/slot/view?startdate=12%2F06%2F2019&deliveryoption=homedelivery&requestorigin=gi&_="
client = requests.Session()
r = client.get(ASDA_url)
r2 = client.get(ASDA_url2)
dicts = r2.json()['slotHeader'][0]['slots']
keep_keys = ('cost', 'availability', 'time')
filtered = [{k:d[k] for k in keep_keys} for d in dicts]
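The code above reads only the first entry of 'slotHeader'. If the response carries more than one entry (an assumption; the full payload is not shown here), the slots can be collected from all of them. The `payload` below is a made-up stand-in for r2.json(), so the sketch runs without a network request.

```python
# Hypothetical stand-in for r2.json(); only the keys used above are modelled.
payload = {
    "slotHeader": [
        {"slots": [{"cost": "£2.00", "availability": "full", "time": "12:00pm - 1:00pm", "slotId": "a"}]},
        {"slots": [{"cost": "£2.00", "availability": "available", "time": "12:30pm - 1:30pm", "slotId": "b"}]},
    ]
}

keep_keys = ('cost', 'availability', 'time')

# Walk every header, then every slot inside it, keeping only the wanted keys.
filtered_all = [
    {k: slot[k] for k in keep_keys}
    for header in payload["slotHeader"]
    for slot in header["slots"]
]
```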