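I have markup like the following, and I'm trying to get only the text directly inside the h1, which is 'The Wire'. Instead, I get all of the text in the h1.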
<h1 id="aiv-content-title" class="js-hide-on-play">
The Wire
<span class="num-of-seasons">5 Seasons</span>
<span class="release-year">2002</span>
</h1>
The output I get for the heading is The Wire5 Seasons2002. Here is my code:
heading = elm.find('h1', id='aiv-content-title')
print(heading.text)
seasons = elm.find('span', {'class': 'num-of-seasons'})
# find() returns None (not the string 'None') when nothing matches
if seasons is None:
    print('1')
else:
    print(seasons.text)
release_year = elm.find('span', {'class': 'release-year'})
print(release_year.text)
print()
When I run this code, this is what I get:
The Wire5 Seasons2002
5 Seasons
2002
I was expecting something like this:
The Wire
5 Seasons
2002
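Here, .text returns every string inside the h1, including the text of both child spans, which is why the title, season count, and year run together. A minimal sketch of one way to read only the h1's own text, assuming Python 3 and BeautifulSoup 4.4 or later (for the string= argument); the HTML is the snippet from the question, and the '1 Season' fallback is an assumed label:

```python
from bs4 import BeautifulSoup

html = """
<h1 id="aiv-content-title" class="js-hide-on-play">
The Wire
<span class="num-of-seasons">5 Seasons</span>
<span class="release-year">2002</span>
</h1>
"""

soup = BeautifulSoup(html, "html.parser")
heading = soup.find("h1", id="aiv-content-title")

# get_text() (and .text) joins every descendant string, spans included
all_text = heading.get_text()

# recursive=False limits the search to the h1's direct children, so the first
# string found is the bare text node that precedes the first <span>
title = heading.find(string=True, recursive=False).strip()

# find() returns None when an element is missing, so check before using it
seasons_tag = heading.find("span", class_="num-of-seasons")
# "1 Season" is an assumed fallback label for shows without the span
seasons = seasons_tag.get_text(strip=True) if seasons_tag is not None else "1 Season"
year = heading.find("span", class_="release-year").get_text(strip=True)

print(title)
print(seasons)
print(year)
```

This prints The Wire, 5 Seasons and 2002 on separate lines. It relies on the title being the first text node in the h1; if the spans came first, the first direct string would be whitespace only.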
Answer 0 (score: 1)
You can do the following:
# The attribute name must be the string 'id', not the built-in id function
h1_element = elm.find('h1', {'id': 'aiv-content-title'})
num_seasons = h1_element.find('span', {'class': 'num-of-seasons'}).getText().strip()
release_year = h1_element.find('span', {'class': 'release-year'}).getText().strip()
# Remove the span elements from the h1 element, leaving only its own text
while h1_element.find('span'):
    h1_element.find('span').extract()
print(h1_element.getText().strip())
print(num_seasons)
print(release_year)
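Removing the spans with extract() modifies the parsed tree, so any later code that reads elm will no longer find them. A non-destructive sketch, assuming Python 3 and BeautifulSoup 4; own_text is a hypothetical helper name, not part of bs4:

```python
from bs4 import BeautifulSoup, Comment, NavigableString


def own_text(tag):
    """Hypothetical helper: join the text nodes that are direct children of tag.

    Child tags (such as the <span> elements) are skipped, and so are HTML
    comments, which bs4 represents as a NavigableString subclass.
    """
    parts = [
        child
        for child in tag.children
        if isinstance(child, NavigableString) and not isinstance(child, Comment)
    ]
    return "".join(parts).strip()


html = """
<h1 id="aiv-content-title" class="js-hide-on-play">
The Wire
<span class="num-of-seasons">5 Seasons</span>
<span class="release-year">2002</span>
</h1>
"""

h1_element = BeautifulSoup(html, "html.parser").find("h1", id="aiv-content-title")
print(own_text(h1_element))
# The spans are still in the tree and can be read afterwards
print(h1_element.find("span", class_="num-of-seasons").get_text(strip=True))
```

Because nothing is extracted, the title and both spans can be read in any order.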
Answer 1 (score: -1)
I've solved it, but it seems a bit tricky.
Here is the code; I hope it helps some newcomers to this area.
elm = soup.find('div', id="dv-dp-main-content")
heading = elm.find('h1', id='aiv-content-title')
heading = heading.text
seasons = elm.find('span', {'class': 'num-of-seasons'})
# find() returns None (not the string 'None') when the element is missing
if seasons is None:
    no_seasons = '1 Season'
else:
    no_seasons = seasons.text
release_year = elm.find('span', {'class': 'release-year'})
releaseyr = release_year.text
# Remove the year and the season count from the full heading text
rmstr = heading.replace(releaseyr, " ")
name = rmstr.replace(no_seasons, " ")
# strip() removes the leftover padding spaces around the title
print(name.strip())
print(no_seasons)
print(releaseyr)
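Because str.replace removes every occurrence, this approach breaks when the title itself contains the year or the season text. A minimal sketch in plain Python 3 contrasting the two; the title 'Class of 2002' is a made-up example, and naive_title and strip_suffixes are hypothetical helper names:

```python
def naive_title(full_text, seasons, year):
    """Mirror the replace-based approach: every occurrence is replaced."""
    return full_text.replace(year, " ").replace(seasons, " ").strip()


def strip_suffixes(full_text, *suffixes):
    """Remove the given suffixes from the end of full_text, last one first.

    Only text at the very end is removed, so matching text elsewhere
    (for example a year inside the title) is left alone.
    """
    text = full_text.strip()
    for suffix in reversed(suffixes):
        if suffix and text.endswith(suffix):
            text = text[: -len(suffix)].rstrip()
    return text


# Hypothetical heading text whose title contains the release year
full = "Class of 20021 Season2002"
print(naive_title(full, "1 Season", "2002"))     # year inside the title is lost
print(strip_suffixes(full, "1 Season", "2002"))  # title kept intact
print(strip_suffixes("The Wire5 Seasons2002", "5 Seasons", "2002"))
```

The three lines printed are Class of, Class of 2002 and The Wire. The suffix version still assumes the spans follow the title; reading the h1's own text nodes avoids string matching altogether.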