Can't find elements with a CSS selector in BS4

Posted: 2019-08-17 21:19:11

Tags: css web-scraping beautifulsoup css-selectors

I'm trying to scrape some data from https://www.bose.com/en_us/locations/?page=1&storesPerPage=10 but I can't get it with BS4's CSS selectors.

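Because the tag I want to capture has several classes, I used soup.select(). I can easily get the data with other methods, but I'm curious why this one specifically doesn't work.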
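For reference, select() itself handles multi-class selectors fine on static HTML. A minimal sketch, assuming hypothetical markup (the `featured` class and the store text are made up for illustration), comparing a compound CSS selector with the roughly equivalent find_all() call:

```python
from bs4 import BeautifulSoup

# Hypothetical static markup; class names other than bw__StoreLocation are invented.
html = """
<div class="bw__StoreLocation store featured">Store A</div>
<div class="bw__StoreLocation store">Store B</div>
<span class="bw__StoreLocation">Not a div</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Compound selector: <div> elements that carry BOTH classes, in any order.
both = soup.select("div.bw__StoreLocation.featured")

# Any element carrying the class, regardless of tag name.
any_tag = soup.select(".bw__StoreLocation")

# find_all() equivalent for the single-class case: class_ matches when
# the given value is one of the element's classes.
divs = soup.find_all("div", class_="bw__StoreLocation")

print([t.get_text() for t in both])     # ['Store A']
print([t.get_text() for t in any_tag])  # ['Store A', 'Store B', 'Not a div']
print([t.get_text() for t in divs])     # ['Store A', 'Store B']
```

So if select() returns [] on a real page, the likelier cause is that the element is not in the downloaded HTML at all, not the selector syntax.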


from bs4 import BeautifulSoup
import requests

url = 'https://www.bose.com/en_us/locations/?page=1&storesPerPage=10'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

soup.select('div.bw__StoreLocation')
# returns []

soup.select('.bw__StoreLocation')
# returns []

However, when I print(soup), I can see .bw__StoreLocation in it.

1 answer:

Answer 0 (score: 0)

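The data is added dynamically (by JavaScript after the page loads), so the HTML that requests downloads contains no element with that class. The bw__StoreLocation text you see in print(soup) is presumably inside a script or stylesheet rather than on an element, which is why select() returns []. As noted in the comments, you can find the actual request URL in the Network tab of the browser's developer tools and query it directly.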
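A minimal offline sketch of that situation, assuming the class name appears in the downloaded page only inside a <style> block and a <script> template (the markup below is hypothetical, not Bose's actual page): the string is present in the document, yet no parsed element carries the class.

```python
from bs4 import BeautifulSoup

# Hypothetical initial HTML of a JavaScript-rendered page: the class name
# occurs only in a stylesheet and in a script string, never on a real tag.
html = """
<html><head>
<style>.bw__StoreLocation { margin: 0; }</style>
</head><body>
<div id="root"></div>
<script>const tpl = '<div class="bw__StoreLocation"></div>';</script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# The text is present in the serialized document...
found_text = "bw__StoreLocation" in str(soup)
# ...but <style>/<script> contents are plain text, not parsed tags,
# so the CSS selector has no element to match.
matches = soup.select(".bw__StoreLocation")

print(found_text)  # True
print(matches)     # []
```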

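import requests

# Query parameters copied from the request shown in the browser's Network tab.
# 1566084972196 appears to be a millisecond Unix timestamp from when the request was captured.
params = (
    ('page', '0'),
    ('getRankingInfo', 'true'),
    ('facets[]', '*'),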
    ('aroundRadius', 'all'),
    ('filters', 'domain:bose.brickworksoftware.com AND publishedAt<=1566084972196'),
    ('esSearch', '''{
                    "page":0
                    ,"storesPerPage":15
                    ,"domain":"bose.brickworksoftware.com"
                    ,"locale":"en_US"
                    ,"must":[{"type":"range","field":"published_at","value":{"lte":1566084972196}}]
                    ,"filters":[]
                    ,"aroundLatLngViaIP":"True"
                    }'''
    ),
    ('aroundLatLngViaIP', 'true'),
)

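# The endpoint returns JSON directly, so no HTML parsing is needed.
r = requests.get('https://bose.brickworksoftware.com/locations_search', params=params).json()
# Each store is one entry in r['hits']; its details are under 'attributes'.
data = r['hits'][0]['attributes']
address = ', '.join([data['address1'], data['city'], data['countryCode'], data['postalCode']])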
print(address)
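The hand-written esSearch string and the fixed publishedAt cutoff are easy to break or let go stale. A minimal sketch that rebuilds the same parameters with json.dumps and the current time, assuming (not confirmed by the answer) that publishedAt/published_at accept a millisecond timestamp and that the endpoint still responds as it did in 2019. `build_params` and `format_address` are hypothetical helper names.

```python
import json
import time


def build_params(page=0, stores_per_page=15, now_ms=None):
    """Hypothetical helper: rebuild the answer's query parameters.

    Assumes the publishedAt cutoff is a Unix timestamp in milliseconds.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    domain = "bose.brickworksoftware.com"
    es_search = {
        "page": page,
        "storesPerPage": stores_per_page,
        "domain": domain,
        "locale": "en_US",
        "must": [{"type": "range", "field": "published_at",
                  "value": {"lte": now_ms}}],
        "filters": [],
        "aroundLatLngViaIP": "True",
    }
    return [
        ("page", str(page)),
        ("getRankingInfo", "true"),
        ("facets[]", "*"),
        ("aroundRadius", "all"),
        ("filters", f"domain:{domain} AND publishedAt<={now_ms}"),
        # json.dumps always produces valid JSON, unlike a hand-typed string.
        ("esSearch", json.dumps(es_search)),
        ("aroundLatLngViaIP", "true"),
    ]


def format_address(hit):
    """Hypothetical helper: join the answer's address fields, skipping missing ones."""
    attrs = hit["attributes"]
    keys = ("address1", "city", "countryCode", "postalCode")
    return ", ".join(str(attrs[k]) for k in keys if attrs.get(k))


# Usage (needs network access and the endpoint to still exist):
# import requests
# r = requests.get('https://bose.brickworksoftware.com/locations_search',
#                  params=build_params()).json()
# for hit in r['hits']:
#     print(format_address(hit))
```

Looping over every entry in r['hits'] rather than only the first one returns all stores on the requested page.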