我需要使用BeautifulSoup find()方法从下面的HTML文本中提取电影标题和年份。
以下返回电影的名称,但我无法仅返回年份
find('p')。find('a')。text
<div class="col-sm-6 col-lg-3">
<div class="poster-container">
<a class="poster-link" href="/title/80244680/">
<img alt="A Tale of Two Kitchens (2019)" class="poster" src="https://occ-0-37-33.1.nflxso.net/dnm/api/v6/0DW6CdE4gYtYx8iy3aj8gs9WtXE/AAAABfTGUtIG2HYlEhUbvzPHmiAyPSkDcBIhQx_Ey06KfkgaUEwELBtJsJYP71-Vsx06NTKFKWZQupZGNVE8DCo8dC0j-zpcaNCPGFiyNJKN7tonZ3gMSAM.jpg?r=397"/>
<div class="overlay d-none d-lg-block text-center">
<span class="d-block font-weight-bold small mt-3">Documentaries</span>
<span class="d-block font-weight-bold small">International Movies</span>
</div>
</a>
</div>
<p><strong><a href="/title/80244680/">A Tale of Two Kitchens</a></strong><br/>2019</p>
</div>
A Tale of Two Kitchens
<br/>
答案 0 :(得分:0)
my_element.contents[-1]
这将为您提供my_element
中包含的最后一个元素:在这种情况下,如果my_element
是<p>
,则文本“ 2019”将作为{{1} }。 ( first 子级是NavigableString
标记,其中包含<strong>
和所有其余标记。)
答案 1 :(得分:0)
使用以下代码。找到Item
标记,然后使用<a>
next_element
输出:
两个厨房的故事2019
from bs4 import BeautifulSoup
html='''<div class="col-sm-6 col-lg-3">
<div class="poster-container">
<a class="poster-link" href="/title/80244680/">
<img alt="A Tale of Two Kitchens (2019)" class="poster" src="https://occ-0-37-33.1.nflxso.net/dnm/api/v6/0DW6CdE4gYtYx8iy3aj8gs9WtXE/AAAABfTGUtIG2HYlEhUbvzPHmiAyPSkDcBIhQx_Ey06KfkgaUEwELBtJsJYP71-Vsx06NTKFKWZQupZGNVE8DCo8dC0j-zpcaNCPGFiyNJKN7tonZ3gMSAM.jpg?r=397"/>
<div class="overlay d-none d-lg-block text-center">
<span class="d-block font-weight-bold small mt-3">Documentaries</span>
<span class="d-block font-weight-bold small">International Movies</span>
</div>
</a>
</div>
<p><strong><a href="/title/80244680/">A Tale of Two Kitchens</a></strong><br/>2019</p>
</div>
A Tale of Two Kitchens
<br/>'''
soup=BeautifulSoup(html,'html.parser')
item=soup.select_one('.col-sm-6.col-lg-3').find_next('p')
print(item.text)
输出:
两个厨房的故事
item=soup.select_one('.col-sm-6.col-lg-3').find_next('p').find('a').text
print(item)
输出:
2019