The HTML is as follows:
<div class="carousel">
<div class="carousel_Wrapper">
<div class="carousel_Container swiper-container">
<ul class="swiper-wrapper">
<li class="swiper-slide">
<figure><img alt="" src="https://s3.amazonaws.com/0001.jpg"/></figure>
</li>
<li class="swiper-slide">
<figure><img alt="" src="https://s3.amazonaws.com/0002.jpg"/></figure>
</li>
<li class="swiper-slide">
<figure><img alt="" src="https://s3.amazonaws.com/0003.jpg"/></figure>
</li>
</ul>
</div>
<div class="carousel_NextBtn"></div>
<div class="carousel_PrevBtn"></div>
</div>
</div>
<div class="carousel">
<div class="carousel_Wrapper">
<div class="carousel_Container swiper-container">
<ul class="swiper-wrapper">
<li class="swiper-slide">
<figure><img alt="" src="https://s3.amazonaws.com/0004.jpg"/></figure>
</li>
<li class="swiper-slide">
<figure><img alt="" src="https://s3.amazonaws.com/0005.jpg"/></figure>
</li>
<li class="swiper-slide">
<figure><img alt="" src="https://s3.amazonaws.com/0006.jpg"/></figure>
</li>
</ul>
</div>
<div class="carousel_NextBtn"></div>
<div class="carousel_PrevBtn"></div>
</div>
</div>
I want to use BeautifulSoup to transform it into the following HTML:
<figure><img alt="" src="https://s3.amazonaws.com/0001.jpg"/></figure>
<p><a href="https://xxxx.jp">other photos</a></p>
<figure><img alt="" src="https://s3.amazonaws.com/0004.jpg"/></figure>
<p><a href="https://xxxx.jp">other photos</a></p>
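I am thinking of removing the unnecessary content in the following way.
Since there may be other <div>s in the page, I specify the class and then call decompose() and unwrap().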
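from bs4 import BeautifulSoup

html = "..."  # placeholder: the HTML shown at the top of this question
content = BeautifulSoup(html, 'html.parser')
# find() returns only the first match, so loop over find_all() to reach every carousel
for btn in content.find_all('div', class_='carousel_NextBtn'):
    btn.decompose()  # remove the tag together with its contents
for cls in ('carousel', 'carousel_Wrapper', 'carousel_Container swiper-container'):
    for div in content.find_all('div', class_=cls):
        div.unwrap()  # remove the tag but keep its children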
After applying the processing above, I expect HTML like the following to be generated:
<ul class="swiper-wrapper">
<li class="swiper-slide">
<figure><img alt="" src="https://s3.amazonaws.com/0001.jpg"/></figure>
</li>
<li class="swiper-slide">
<figure><img alt="" src="https://s3.amazonaws.com/0002.jpg"/></figure>
</li>
<li class="swiper-slide">
<figure><img alt="" src="https://s3.amazonaws.com/0003.jpg"/></figure>
</li>
</ul>
<div class="carousel_PrevBtn"></div>
<ul class="swiper-wrapper">
<li class="swiper-slide">
<figure><img alt="" src="https://s3.amazonaws.com/0004.jpg"/></figure>
</li>
<li class="swiper-slide">
<figure><img alt="" src="https://s3.amazonaws.com/0005.jpg"/></figure>
</li>
<li class="swiper-slide">
<figure><img alt="" src="https://s3.amazonaws.com/0006.jpg"/></figure>
</li>
</ul>
<div class="carousel_PrevBtn"></div>
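For reference, the two calls used above behave differently: decompose() deletes a tag together with everything inside it, while unwrap() deletes only the tag and leaves its children in place. A minimal sketch on a small made-up snippet (the `outer`, `inner` and `btn` class names are hypothetical and chosen only for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical miniature document, not the carousel markup from the question.
snippet = (
    '<div class="outer">'
    '<div class="inner"><p>keep me</p></div>'
    '<div class="btn">x</div>'
    '</div>'
)
soup = BeautifulSoup(snippet, 'html.parser')

soup.find('div', class_='btn').decompose()  # removes the tag and its contents
soup.find('div', class_='inner').unwrap()   # removes the tag, keeps <p>keep me</p>
soup.find('div', class_='outer').unwrap()   # same for the outer wrapper

result = str(soup)
print(result)  # <p>keep me</p>
```

Note that find() only ever returns the first matching tag, so a document with several matching tags needs find_all() and a loop.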
I think the following processing is still needed: retrieve the contents of the first <li> element of each <ul>, and add <p><a href="https://xxxx.jp">other photos</a></p> after it.
Answer 0 (score: 0)
html = """<div class="carousel">
<div class="carousel_Wrapper">
<div class="carousel_Container swiper-container">
<ul class="swiper-wrapper">
<li class="swiper-slide">
<figure><img alt="" src="https://s3.amazonaws.com/0001.jpg"/></figure>
</li>
<li class="swiper-slide">
<figure><img alt="" src="https://s3.amazonaws.com/0002.jpg"/></figure>
</li>
<li class="swiper-slide">
<figure><img alt="" src="https://s3.amazonaws.com/0003.jpg"/></figure>
</li>
</ul>
</div>
<div class="carousel_NextBtn"></div>
<div class="carousel_PrevBtn"></div>
</div>
</div>
<div class="carousel">
<div class="carousel_Wrapper">
<div class="carousel_Container swiper-container">
<ul class="swiper-wrapper">
<li class="swiper-slide">
<figure><img alt="" src="https://s3.amazonaws.com/0004.jpg"/></figure>
</li>
<li class="swiper-slide">
<figure><img alt="" src="https://s3.amazonaws.com/0005.jpg"/></figure>
</li>
<li class="swiper-slide">
<figure><img alt="" src="https://s3.amazonaws.com/0006.jpg"/></figure>
</li>
</ul>
</div>
<div class="carousel_NextBtn"></div>
<div class="carousel_PrevBtn"></div>
</div>
</div>"""
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
all_ul = soup.find_all('ul', {'class': 'swiper-wrapper'})  # find all <ul> tags with the specified class
for i, tag in enumerate(all_ul):
    print('-------------------- iteration : ' + str(i) + ' --------------------')
    print(tag.find('li', {'class': 'swiper-slide'}))  # works only if the <li> has this class
    print(tag.contents[1])  # also works if the <li> has no class; contents[0] is the newline text node
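You can implement "retrieve the contents of the first <li> element of each <ul>" as shown in the code above. I did not understand your second requirement, so I have not posted a solution for it. Let me know if you need any help.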
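Building on the code above, the remaining step could be sketched as follows: for every div with class carousel, take the first <figure> and replace the whole carousel with that figure followed by the link paragraph. This is a sketch under assumptions, not a definitive implementation. The helper name `simplify_carousels` and its `link_url` / `link_text` parameters are made up here, and it assumes every carousel should receive the same https://xxxx.jp link shown in the desired output.

```python
from bs4 import BeautifulSoup


def simplify_carousels(html, link_url="https://xxxx.jp", link_text="other photos"):
    """Replace each div.carousel with its first <figure> plus a link paragraph.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all() is used because find() would only return the first carousel.
    for carousel in soup.find_all("div", class_="carousel"):
        figure = carousel.find("figure")  # first <figure> in document order
        if figure is None:
            continue  # leave carousels without a figure untouched
        # Build <p><a href="...">other photos</a></p>
        paragraph = soup.new_tag("p")
        anchor = soup.new_tag("a", href=link_url)
        anchor.string = link_text
        paragraph.append(anchor)
        # Detach the figure, swap it in for the carousel, then add the link after it.
        carousel.replace_with(figure.extract())
        figure.insert_after(paragraph)
    return str(soup)
```

Calling `print(simplify_carousels(html))` with the `html` string defined above should print the first figure of each carousel, each followed by the link paragraph. Replacing the whole carousel avoids having to unwrap each wrapper div and remove each button separately.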