How do I get data from a <script> tag that has no attributes?

Time: 2018-10-11 21:24:46

Tags: python beautifulsoup

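I'm trying to extract the coordinates of all restaurant locations using Beautiful Soup. How can I extract all the coordinates from the script tag inside the page body?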

from bs4 import BeautifulSoup as bs
import requests
import json

base_url = 'https://locations.wafflehouse.com/'
r = requests.get(base_url)
soup = bs(r.text, 'html.parser')
all_scripts = soup.find_all('script')
print(all_scripts[19])
 

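1 Answer:

Answer 0: (score: 0)

Updated answer: You need to parse the JSON with json.loads() and then navigate through the resulting dictionary. Try this code; it works fine.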

import json

import requests
from bs4 import BeautifulSoup

req = requests.get('https://locations.wafflehouse.com/')
soup = BeautifulSoup(req.content, 'html.parser')
# Index 19 is where the state script sat on the page when this was written.
script_text = soup.find_all('script')[19].text
# Drop the JavaScript assignment and the trailing semicolon, leaving bare JSON.
jdata = script_text.replace('window.__SLS_REDUX_STATE__ =', '').strip().rstrip(';')
data = json.loads(jdata)
for feature in data['dataLocations']['collection']['features']:
    coordinates = feature['geometry']['coordinates']
    print(coordinates)

Output:

[-90.073113, 30.37019]
[-84.131085, 33.952944]
[-78.719497, 36.14261]
[-95.629084, 29.947421]
[-83.9019, 33.56531]
[-80.091552, 37.288422]
[-77.949231, 34.237534]
[-96.60637, 32.968131]
[-80.969088, 29.151235]
[-86.843386, 33.354666]
[-84.206, 33.462175]
[-76.342464, 36.830187]
[-79.985822, 32.898412]
[-84.2784568595722, 33.
[-88.780694, 35.674914]
[-87.898899, 30.598605]
[-83.71487, 32.614092]
[-79.523611, 36.07101]
[-91.127792, 30.580582]
[-86.352681, 35.875097]
[-90.271372, 30.023002]
[-80.205641, 25.955672]
[-81.632, 30.157]
[-86.961821, 31.454352]
[-80.666906, 35.366769]
[-97.56596, 35.406447]
[-84.364334, 35.511474]
[-81.01622, 29.23453]
[-86.57177, 34.855504]
[-84.625908, 33.399829]
[-76.344303, 36.740862]
[-84.192634, 33.517948]
[-77.83421, 39.296024]
[-77.518985, 38.359332]
[-84.45238, 38.042061]
[-83.08319, 39.840191]
[-81.993971, 33.475816]
[-95.481102, 29.913294]
[-82.699, 28.334]
[-84.352035, 33.989889]
[-86.819468, 35.945115]
[-91.009638, 30.407864]
[-81.8428, 27.9096]
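
Because the script tag has no attributes, the answer selects it by position (index 19), which may stop working if the page adds or removes a script. One possible alternative, sketched below, is to pick the script by its content: look for the `window.__SLS_REDUX_STATE__ =` prefix used in the answer and let `json.JSONDecoder.raw_decode` parse the value that follows it. The helper names `extract_state` and `iter_lat_lon` and the `sample_scripts` list are hypothetical, made up for illustration; the sample stands in for something like `[s.string for s in soup.find_all('script')]`. The sketch also assumes the pairs are in [longitude, latitude] order, which is what the printed values suggest.

```python
import json

MARKER = "window.__SLS_REDUX_STATE__ ="  # prefix taken from the answer's code


def extract_state(script_texts, marker=MARKER):
    """Return the JSON value assigned after `marker` in the first matching script."""
    decoder = json.JSONDecoder()
    for text in script_texts:
        if not text:  # scripts with no inline text (e.g. external src) are skipped
            continue
        start = text.find(marker)
        if start == -1:
            continue
        # raw_decode parses one JSON value and ignores whatever follows it,
        # such as the trailing ';', so no string replacement is needed.
        payload = text[start + len(marker):].lstrip()
        state, _end = decoder.raw_decode(payload)
        return state
    raise ValueError("no script contains the marker")


def iter_lat_lon(state):
    """Yield (lat, lon) pairs, assuming each feature stores [lon, lat]."""
    for feature in state["dataLocations"]["collection"]["features"]:
        lon, lat = feature["geometry"]["coordinates"]
        yield lat, lon


# Hypothetical stand-in for the text of each <script> tag on the page.
sample_scripts = [
    None,
    "var unrelated = 1;",
    'window.__SLS_REDUX_STATE__ = {"dataLocations": {"collection": {"features": ['
    '{"geometry": {"coordinates": [-90.073113, 30.37019]}}, '
    '{"geometry": {"coordinates": [-84.131085, 33.952944]}}]}}};',
]

state = extract_state(sample_scripts)
pairs = list(iter_lat_lon(state))
print(pairs)
```

Matching on content keeps the lookup independent of the script's position, and parsing with `raw_decode` avoids removing semicolons that might legitimately appear inside string values.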