Removing duplicate span tag content

Time: 2018-07-27 19:37:23

Tags: python-3.x select beautifulsoup

  <div class="ticket_last_24 report_table_right">
                            <span>13,978</span>
                            <span>(</span><span 
                            class="change_increase">+2.3% 
                            </span><span>)</span>                       
  </div>

 <div class="ticket_last_week report_table_right">

                            <span>99,585</span>
                            <span>(</span><span 
                            class="change_increase">+0.6% 
                            </span><span>)</span>                       
</div>

  <div class="ticket_last_24 report_table_right">

                            <span>12,121</span>
                            <span>(</span><span 
                            class="change_increase">+2.3% 
                            </span><span>)</span>                       
 </div>

    <div class="ticket_last_week report_table_right"> 
                            <span>99,222</span>
                            <span>(</span><span 
                           class="change_increase">+0.6% 
                        </span><span>)</span>                       

    </div>

I tried the following code:

    from bs4 import BeautifulSoup
    text=[]
    TicketNumber=soup.find_all("div")
    for div in TicketNumber:
        text.append(div.find("span"))

It prints out:

    [
     '13,978',
     '13,978',
     '99,585',
     '12,121',
     '12,121',
     '99,222'
    ]

I'm not sure why the first number is printed twice. I only want the numbers ['13,978','99,585','12,121','99,222']. There are no duplicate numbers within the same tag.
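One possible cause, which the question does not confirm, is that the real page wraps these blocks in an outer `div`. In that case `find_all("div")` also returns the wrapper, and because `find("span")` searches all descendants, the wrapper yields the same first span as its first inner block. A minimal sketch, where the `wrapper` element is a hypothetical addition to the markup above:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the outer <div class="wrapper"> is an assumption;
# it does not appear in the snippet from the question.
html = """
<div class="wrapper">
  <div class="ticket_last_24 report_table_right">
    <span>13,978</span>
    <span>(</span><span class="change_increase">+2.3%</span><span>)</span>
  </div>
  <div class="ticket_last_week report_table_right">
    <span>99,585</span>
    <span>(</span><span class="change_increase">+0.6%</span><span>)</span>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all("div") matches the wrapper as well as the two inner divs.
# find("span") looks through all descendants, so the wrapper returns
# the first inner span a second time.
first_spans = [div.find("span").get_text() for div in soup.find_all("div")]
print(first_spans)  # ['13,978', '13,978', '99,585']
```

If the real page looks like this, the duplicate comes from the extra matching `div` and not from the span contents.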

1 answer:

Answer 0 (score: 1)

When I do this:

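    from bs4 import BeautifulSoup

    # html is assumed to hold the markup shown in the question
    soup = BeautifulSoup(html, "html.parser")

    text = []
    TicketNumber = soup.find_all("div")
    for div in TicketNumber:
        # get_text() returns the span's text rather than the Tag object
        text.append(div.find("span").get_text())

    print(text)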

I get:

['13,978', '99,585', '12,121', '99,222']

Could you try it and confirm whether it works?
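If duplicates still appear on the full page, it may help to match only the `div` elements that carry the `report_table_right` class instead of every `div`. The sketch below assumes the real page has enclosing `div` elements (the `wrapper` class is hypothetical and not shown in the question) and uses the `class_` filter of `find_all`:

```python
from bs4 import BeautifulSoup

# Hypothetical page structure: the "wrapper" divs are an assumption
# used to reproduce the duplicates; the inner blocks follow the question.
html = """
<div class="wrapper">
  <div class="ticket_last_24 report_table_right">
    <span>13,978</span>
    <span>(</span><span class="change_increase">+2.3%</span><span>)</span>
  </div>
  <div class="ticket_last_week report_table_right">
    <span>99,585</span>
    <span>(</span><span class="change_increase">+0.6%</span><span>)</span>
  </div>
</div>
<div class="wrapper">
  <div class="ticket_last_24 report_table_right">
    <span>12,121</span>
    <span>(</span><span class="change_increase">+2.3%</span><span>)</span>
  </div>
  <div class="ticket_last_week report_table_right">
    <span>99,222</span>
    <span>(</span><span class="change_increase">+0.6%</span><span>)</span>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ matches any tag whose class list contains the given value,
# so the wrapper divs are skipped.
blocks = soup.find_all("div", class_="report_table_right")

# The first span in each block holds the ticket count.
numbers = [block.find("span").get_text(strip=True) for block in blocks]
print(numbers)  # ['13,978', '99,585', '12,121', '99,222']
```

Filtering by class keeps the result independent of how many container elements surround the blocks.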