Select the first span tag from a div class

Time: 2018-07-26 23:00:33

Tags: html python-3.x beautifulsoup

<div class="ticket_last_24 report_table_right">
  <span>13,978</span>
  <span>(</span><span class="change_increase">+2.3% 
  </span><span>)</span>
</div>

<div class="ticket_last_week report_table_right">
  <span>99,585</span>
  <span>(</span><span class="change_increase">+0.6% 
  </span><span>)</span>
</div>

<div class="ticket_last_24 report_table_right">
  <span>12121</span>
  <span>(</span><span class="change_increase">+2.3% 
  </span><span>)</span>
</div>

<div class="ticket_last_week report_table_right">
  <span>99,222</span>
  <span>(</span><span class="change_increase">+0.6% 
  </span><span>)</span>
</div>

I tried the following code:

from bs4 import BeautifulSoup

text = []
# soup is the BeautifulSoup object built from the page (its creation is not shown here)
TicketNuber = soup.find_all("div")
for div in TicketNuber:
    text.append(div.find("span"))

It prints out:

[
 '13,978',
 '13,978',
 '99,585',
 '12,121',
 '12,121',
 '99,222'
]

I'm not sure why the first number of each pair is printed twice. I only want the numbers ['13,978','99492','12,121','99,222'], with no number repeated for the same tag.

1 Answer:

Answer 0: (score: 0)

This does the job:

from bs4 import BeautifulSoup

document = '''
<div class="ticket_last_24 report_table_right">
  <span>13,978</span>
  <span>(</span><span class="change_increase">+2.3% 
  </span><span>)</span>                       
</div>

<div class="ticket_last_week report_table_right">
  <span>99,585</span>
  <span>(</span><span class="change_increase">+0.6% 
  </span><span>)</span>                       
</div>

<div class="ticket_last_24 report_table_right">
  <span>12121</span>
  <span>(</span><span class="change_increase">+2.3% 
  </span><span>)</span>                       
</div>

<div class="ticket_last_week report_table_right">
  <span>99,222</span>
  <span>(</span><span class="change_increase">+0.6% 
  </span><span>)</span>
</div>
'''

soup = BeautifulSoup(document, "lxml")

for div in soup.find_all("div"):
    print(div.find("span").text)

Output:

13,978
99,585
12121
99,222

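Clearly there is some difference between your HTML document and mine, and it must come down to the snippet you posted not matching the actual document. You can print the real document with print(soup) and post that. You also posted only part of your code (not an MCVE), so I need to see the whole story to help further.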
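
One possible explanation, offered only as a guess since the real page is not shown: if each pair of divs sits inside an outer wrapper div, then find_all("div") also matches that wrapper, and its first descendant span is the same one as in the first inner div. The sketch below reproduces that under this assumption. The "report_row" wrapper is hypothetical and not taken from the question, and "html.parser" is used only so the example needs nothing beyond bs4. Filtering on the shared report_table_right class skips the wrapper.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the outer "report_row" div is an assumption,
# it does not appear in the snippet posted in the question.
html = """
<div class="report_row">
  <div class="ticket_last_24 report_table_right">
    <span>13,978</span>
    <span>(</span><span class="change_increase">+2.3%</span><span>)</span>
  </div>
  <div class="ticket_last_week report_table_right">
    <span>99,585</span>
    <span>(</span><span class="change_increase">+0.6%</span><span>)</span>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def first_span_text(div):
    """Return the stripped text of the first span inside div, or None."""
    span = div.find("span")
    return span.get_text(strip=True) if span else None


# Every div is matched, including the wrapper, whose first descendant
# span is the same "13,978" span found again in the first inner div.
all_divs = [first_span_text(d) for d in soup.find_all("div")]

# Only divs carrying the report_table_right class are matched,
# so the wrapper is skipped.
inner_divs = [
    first_span_text(d)
    for d in soup.find_all("div", class_="report_table_right")
]

print(all_divs)    # ['13,978', '13,978', '99,585']
print(inner_divs)  # ['13,978', '99,585']
```

If the real document turns out to have no such wrapper, this will not change anything, and the output of print(soup) is still the quickest way to see what is actually being parsed.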