Using Python and BeautifulSoup, I need help extracting information from both a parent div and a child div.
Here is the first sample:
<div id="slide-609becd056bb40a7ad42607a4d1c67f5"
class="slide has-link slick-slide"
data-label="April 2 2018 Acura TLX Offer 2000x700.jpg"
data-link="/new-inventory/index.htm?model=TLX&year=2018" data-target="_self"
style="background-image: url("https://pictures.dealer.com/a/adw/0877/5eabcb338dc604c09b28a4df5a49ad78x.jpg?impolicy=resize&h=514");
width: 1897px; position: relative; left: 0px; top: 0px; z-index: 998; opacity: 0; height: 514px; transition: opacity 750ms ease;" data-slick-index="0" aria-hidden="true" tabindex="-1" role="option" aria-describedby="slick-slide00">
Here is the second sample:
<div id="slide-7ae8b29ddc9e45d1a219beffe5793b2b"
class="html-slide slide slick-slide"
data-label="March-Madness.jpg" data-link="" data-target=""
data-promo-id="" data-slick-index="2" aria-hidden="true" tabindex="-1" role="option"
aria-describedby="slick-slide02"
style="width: 1897px; position: relative; left: -3794px; top: 0px; z-index: 998; opacity: 0; height: 514px; transition: opacity 750ms ease;">
<div class="slide-background"
style="background-image: linear-gradient(rgba(0, 0, 0, 0), rgba(0, 0, 0, 0)), url("https://pictures.dealer.com/g/goodsonacuraofdallasadw/1747/13ed067a023df8ad412feea2c6eddec9x.jpg?impolicy=resize&h=514"); height: 514px;">
<img src="https://pictures.dealer.com/g/goodsonacuraofdallasadw/1747/13ed067a023df8ad412feea2c6eddec9x.jpg?impolicy=resize&h=514" class="placeholder-image pull-left"> </div>
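I need the style attribute from both samples so that I can extract the background image URL. The problem is that in the first sample the style is on the parent div, while in the second sample it is on a child div. How can I get both style attributes with Python and BeautifulSoup?
Here is the code I tried: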
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.goodsonacura.com/'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
banner_info = page_soup.findAll('div',{'class':['slide has-link', 'html-slide slide has-link']})
picture = [banner.get('style') for banner in banner_info]
This code returns the correct style attribute for the first sample, but the wrong style attribute for the second sample.
Answer 0 (score: 0)
Add the "slide-background" class to the find_all query, so that the child div carrying the style is matched as well. See the following example:
banner_info = page_soup.find_all('div',{'class':['slide has-link', 'html-slide slide has-link', 'slide-background']})
This worked for me. I hope it helps.
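To go from the style attributes to the actual image URLs, something along these lines might work. It is only a sketch under a few assumptions: SAMPLE_HTML is a shortened stand-in for the two slides in the question (the example.com URLs are placeholders), and background_urls and URL_RE are hypothetical names, not part of BeautifulSoup. It checks each slide's own style first and falls back to a nested slide-background div, then pulls the url(...) value out with a regular expression.

```python
import re
from bs4 import BeautifulSoup

# Shortened stand-ins for the two slides in the question (placeholder URLs).
# Attributes are single-quoted so the inner url("...") quotes survive parsing.
SAMPLE_HTML = """
<div id="slide-1" class="slide has-link slick-slide"
     style='background-image: url("https://example.com/a.jpg?h=514"); width: 1897px;'></div>
<div id="slide-2" class="html-slide slide slick-slide" style="width: 1897px;">
  <div class="slide-background"
       style='background-image: linear-gradient(rgba(0, 0, 0, 0), rgba(0, 0, 0, 0)), url("https://example.com/b.jpg?h=514"); height: 514px;'>
    <img src="https://example.com/b.jpg?h=514" class="placeholder-image pull-left">
  </div>
</div>
"""

# Captures whatever sits inside url(...), with or without surrounding quotes.
URL_RE = re.compile(r"""url\(\s*['"]?(.*?)['"]?\s*\)""")


def background_urls(html):
    """Return one background image URL per slide, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for slide in soup.select("div.slide"):
        # Look at the slide itself first, then any nested slide-background div.
        candidates = [slide] + slide.select("div.slide-background")
        for tag in candidates:
            match = URL_RE.search(tag.get("style", ""))
            if match:
                urls.append(match.group(1))
                break  # one URL per slide is enough
    return urls


print(background_urls(SAMPLE_HTML))
```

On the live page you would pass page_html instead of SAMPLE_HTML. Note that the slick-* classes and some inline styles may be added by JavaScript in the browser, so the HTML that urlopen returns could differ from what the browser inspector shows.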