<div class="ticket_last_24 report_table_right">
<span>13,978</span>
<span>(</span><span class="change_increase">+2.3%
</span><span>)</span>
</div>
<div class="ticket_last_week report_table_right">
<span>99,585</span>
<span>(</span><span class="change_increase">+0.6%
</span><span>)</span>
</div>
<div class="ticket_last_24 report_table_right">
<span>12121</span>
<span>(</span><span class="change_increase">+2.3%
</span><span>)</span>
</div>
<div class="ticket_last_week report_table_right">
<span>99,222</span>
<span>(</span><span class="change_increase">+0.6%
</span><span>)</span>
</div>
I tried the following code:
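from bs4 import BeautifulSoup

# soup is the already-parsed page; the code that creates it was not posted
text = []
TicketNuber = soup.find_all("div")
for div in TicketNuber:
    # take the first <span> inside each <div>
    text.append(div.find("span"))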
It prints out:
[
'13,978',
'13,978',
'99,585',
'12,121',
'12,121',
'99,222'
]
I'm not sure why the first number is printed twice. I only want the numbers ['13,978','99492','12,121','99,222']. No number is repeated within the same tag.
Answer 0 (score: 0)
This does the job:
from bs4 import BeautifulSoup
document = '''
<div class="ticket_last_24 report_table_right">
<span>13,978</span>
<span>(</span><span class="change_increase">+2.3%
</span><span>)</span>
</div>
<div class="ticket_last_week report_table_right">
<span>99,585</span>
<span>(</span><span class="change_increase">+0.6%
</span><span>)</span>
</div>
<div class="ticket_last_24 report_table_right">
<span>12121</span>
<span>(</span><span class="change_increase">+2.3%
</span><span>)</span>
</div>
<div class="ticket_last_week report_table_right">
<span>99,222</span>
<span>(</span><span class="change_increase">+0.6%
</span><span>)</span>
</div>
'''
soup = BeautifulSoup(document, "lxml")
for div in soup.find_all("div"):
    print(div.find("span").text)
Output:
13,978
99,585
12121
99,222
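Clearly there is some difference between my HTML document and yours, which must come down to the snippet you posted not matching the actual document. You can print the parsed document with print(soup) and post the result. You also posted only part of your code (not an MCVE), so I would need to see the whole thing to help any further.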
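One possible cause, offered only as a guess since the real page was not posted: if the rows sit inside an outer <div>, find_all("div") also returns that wrapper, and its first <span> is the first number again. The sketch below assumes a hypothetical wrapper with class "report_table" (not taken from the question) and uses the built-in "html.parser" so that lxml is not required. Filtering on the report_table_right class from the posted snippet skips the wrapper.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the rows from the question, wrapped in an assumed outer <div>.
document = """
<div class="report_table">
  <div class="ticket_last_24 report_table_right">
    <span>13,978</span>
    <span>(</span><span class="change_increase">+2.3%</span><span>)</span>
  </div>
  <div class="ticket_last_week report_table_right">
    <span>99,585</span>
    <span>(</span><span class="change_increase">+0.6%</span><span>)</span>
  </div>
</div>
"""

soup = BeautifulSoup(document, "html.parser")

# Every <div>, wrapper included: the wrapper's first <span> repeats the first number.
all_divs = [div.find("span").get_text(strip=True) for div in soup.find_all("div")]

# Only the <div> elements carrying the class shared by the data rows.
numbers = [
    div.find("span").get_text(strip=True)
    for div in soup.find_all("div", class_="report_table_right")
]

print(all_divs)
print(numbers)
```

If the real page has one such wrapper per pair of rows, that would be consistent with the first number of each pair appearing twice, but only the output of print(soup) can confirm it.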