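I am trying to scrape some data from https://www.bose.com/en_us/locations/?page=1&storesPerPage=10 but I cannot get BS4's CSS selectors to work.
Because I want to match elements by several classes, I used the soup.select() function. I can easily do this with other functions, but I am curious why this one in particular does not work.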
from bs4 import BeautifulSoup
import requests
url = 'https://www.bose.com/en_us/locations/?page=1&storesPerPage=10'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
soup.select('div.bw__StoreLocation')
# returns []
soup.select('.bw__StoreLocation')
# returns []
However, when I print(soup), I can see .bw__StoreLocation in the output.
Answer 0 (score: 0)
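The data is added dynamically by JavaScript, so the store elements are not part of the HTML that requests receives. As noted in the comments, the request URL that supplies the data can be found in the browser's Network tab.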
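One way to confirm this kind of situation is to compare a plain text search with a count of real parsed elements. The sketch below uses only the standard library's html.parser (which BeautifulSoup can also use as its backend). The sample page and the helper name count_elements_with_class are made up for illustration; it assumes the class name shows up only inside a script block, which may or may not be exactly how the Bose page embeds it.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count real elements whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_elements_with_class(html, class_name):
    """Hypothetical helper: number of parsed elements carrying class_name."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical page: the class name appears only inside a <script> block,
# where the parser treats everything as text rather than as elements.
sample_html = """
<html><body>
<div id="app"></div>
<script>
  var template = '<div class="bw__StoreLocation">...</div>';
</script>
</body></html>
"""

in_text = "bw__StoreLocation" in sample_html
in_dom = count_elements_with_class(sample_html, "bw__StoreLocation")
print("found in raw text:", in_text)
print("matching elements:", in_dom)
```

If the name is found in the raw text but no element matches, the markup is most likely a template or data that the browser turns into elements later, which would explain why print(soup) shows it while a CSS selector finds nothing.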
import requests
params = (
('page', '0'),
('getRankingInfo', 'true'),
('facets/[/]', '*'),
('aroundRadius', 'all'),
('filters', 'domain:bose.brickworksoftware.com AND publishedAt<=1566084972196'),
('esSearch', '''{
"page":0
,"storesPerPage":15
,"domain":"bose.brickworksoftware.com"
,"locale":"en_US"
,"must":[{"type":"range","field":"published_at","value":{"lte":1566084972196}}]
,"filters":[]
,"aroundLatLngViaIP":"True"
}'''
),
('aroundLatLngViaIP', 'true'),
)
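# Call the JSON endpoint seen in the Network tab and decode the response
r = requests.get('https://bose.brickworksoftware.com/locations_search', params=params).json()
# Each entry in r['hits'] is one store; take the first one here
data = r['hits'][0]['attributes']
address = ', '.join([data['address1'], data['city'], data['countryCode'], data['postalCode']])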
print(address)
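The snippet above prints only the first store. A minimal sketch for formatting every hit follows. It assumes the response keeps the 'hits' / 'attributes' shape used above; format_address is a hypothetical helper, and sample_response is made-up data standing in for the real JSON so the example runs without a network call.

```python
def format_address(attributes):
    """Join the address fields that are present, skipping missing or empty ones."""
    fields = ("address1", "city", "countryCode", "postalCode")
    parts = [str(attributes[name]) for name in fields if attributes.get(name)]
    return ", ".join(parts)


# Made-up data in the same shape as the response used above
sample_response = {
    "hits": [
        {"attributes": {"address1": "1 Example St", "city": "Springfield",
                        "countryCode": "US", "postalCode": "00000"}},
        {"attributes": {"address1": "2 Sample Ave", "city": "Shelbyville",
                        "countryCode": "US", "postalCode": None}},
    ]
}

addresses = [format_address(hit["attributes"]) for hit in sample_response["hits"]]
for line in addresses:
    print(line)
```

With the real response, replacing sample_response with r should give one line per store on the requested page; skipping empty fields avoids a TypeError from join when a value is missing.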