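Python: find text at the end of an HTML element

Time: 2019-05-23 00:40:55

Tags: python beautifulsoup find

I need to use BeautifulSoup's find() method to extract the movie title and year from the HTML below.

The following returns the movie's name, but I can't work out how to return just the year:

find('p').find('a').text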

<div class="col-sm-6 col-lg-3">
<div class="poster-container">
<a class="poster-link" href="/title/80244680/">
<img alt="A Tale of Two Kitchens (2019)" class="poster" src="https://occ-0-37-33.1.nflxso.net/dnm/api/v6/0DW6CdE4gYtYx8iy3aj8gs9WtXE/AAAABfTGUtIG2HYlEhUbvzPHmiAyPSkDcBIhQx_Ey06KfkgaUEwELBtJsJYP71-Vsx06NTKFKWZQupZGNVE8DCo8dC0j-zpcaNCPGFiyNJKN7tonZ3gMSAM.jpg?r=397"/>
<div class="overlay d-none d-lg-block text-center">
<span class="d-block font-weight-bold small mt-3">Documentaries</span>
<span class="d-block font-weight-bold small">International Movies</span>
</div>
</a>
</div>
<p><strong><a href="/title/80244680/">A Tale of Two Kitchens</a></strong><br/>2019</p>
</div>
A Tale of Two Kitchens
<br/>

2 answers:

Answer 0 (score: 0)

my_element.contents[-1]

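This gives you the last element contained in my_element. In this case, if my_element is the <p>, that is the text "2019" as a NavigableString. (The first child is the <strong> tag, which wraps the <a> and the title text.)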
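As a minimal sketch of that approach, assuming the <p> from the question is parsed on its own with html.parser (the variable names p, title and year are illustrative, not from the answer):

```python
from bs4 import BeautifulSoup

# Only the <p> fragment from the question is needed for this sketch.
html = '<p><strong><a href="/title/80244680/">A Tale of Two Kitchens</a></strong><br/>2019</p>'
soup = BeautifulSoup(html, 'html.parser')

p = soup.find('p')

# The children of <p> are: <strong>, <br/>, and the trailing text node.
title = p.find('a').text
year = str(p.contents[-1]).strip()  # last child is the NavigableString after <br/>

print(title)
print(year)
```

Converting the NavigableString with str() and stripping it is a precaution in case the real page has whitespace around the year.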

Answer 1 (score: 0)

Use the following code. Find the item's <a> tag, then use next_element.

from bs4 import BeautifulSoup

html = '''<div class="col-sm-6 col-lg-3">
<div class="poster-container">
<a class="poster-link" href="/title/80244680/">
<img alt="A Tale of Two Kitchens (2019)" class="poster" src="https://occ-0-37-33.1.nflxso.net/dnm/api/v6/0DW6CdE4gYtYx8iy3aj8gs9WtXE/AAAABfTGUtIG2HYlEhUbvzPHmiAyPSkDcBIhQx_Ey06KfkgaUEwELBtJsJYP71-Vsx06NTKFKWZQupZGNVE8DCo8dC0j-zpcaNCPGFiyNJKN7tonZ3gMSAM.jpg?r=397"/>
<div class="overlay d-none d-lg-block text-center">
<span class="d-block font-weight-bold small mt-3">Documentaries</span>
<span class="d-block font-weight-bold small">International Movies</span>
</div>
</a>
</div>
<p><strong><a href="/title/80244680/">A Tale of Two Kitchens</a></strong><br/>2019</p>
</div>
A Tale of Two Kitchens
<br/>'''
soup = BeautifulSoup(html, 'html.parser')

# Full text of the <p>: title and year run together
item = soup.select_one('.col-sm-6.col-lg-3').find_next('p')
print(item.text)

Output:

A Tale of Two Kitchens2019

# Title only: text of the <a> inside the <p>
item = soup.select_one('.col-sm-6.col-lg-3').find_next('p').find('a').text
print(item)

Output:

A Tale of Two Kitchens

Output for the year, using next_element:

2019
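
The answer names next_element, though neither snippet above calls it. As a rough sketch of one way it could yield the year (starting from the <br/> tag is an assumption made here, not something taken from the answer, and the names br and year are illustrative):

```python
from bs4 import BeautifulSoup

# Only the <p> fragment from the question is needed for this sketch.
html = '<p><strong><a href="/title/80244680/">A Tale of Two Kitchens</a></strong><br/>2019</p>'
soup = BeautifulSoup(html, 'html.parser')

# Assumption: the year is the first node parsed right after the <br/> inside <p>.
br = soup.find('p').find('br')
year = str(br.next_element).strip()

print(year)
```

next_element follows parse order, so this relies on the year text coming immediately after the <br/>; if the markup differs, contents[-1] on the <p> may be the more robust choice.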