Python BeautifulSoup select: how to get a specific value

Time: 2018-05-07 14:49:59

Tags: python beautifulsoup

I only want to get '281'. It is the value passed to goPage() by the last-page link.

Code:

<div class="page2"><a href="#" onclick="goPage('1'); return false;">
<img alt="First-Page" border="0" src="/images/ic_arrow_first.gif"/></a>
<a href="#" onclick="goPage('1'); return false;"><span> 1 </span></a>
<a href="#" onclick="goPage('41'); return false;"><span> 2 </span></a>
<a href="#" onclick="goPage('81'); return false;"><span> 3 </span></a>
<a href="#" onclick="goPage('121'); return false;"><span> 4 </span></a>
<span class="page_on">5</span>
<a href="#" onclick="goPage('201'); return false;"><span> 6 </span></a>
<a href="#" onclick="goPage('241'); return false;"><span> 7 </span></a>
<a href="#" onclick="goPage('281'); return false;"><span> 8 </span></a>
<a href="#" onclick="goPage('281'); return false;">
<img alt="Last Page" border="0" src="/images/ic_arrow_last.gif"/></a></div>

1 answer:

Answer 0 (score: 1)

import re
s = """<div class="page2"><a href="#" onclick="goPage('1'); return false;"><img alt="First-Page" border="0" src="/images/ic_arrow_first.gif"/></a><a href="#" onclick="goPage('1'); return false;"><span> 1 </span></a><a href="#" onclick="goPage('41'); return false;"><span> 2 </span></a><a href="#" onclick="goPage('81'); return false;"><span> 3 </span></a><a href="#" onclick="goPage('121'); return false;"><span> 4 </span></a><span class="page_on">5</span><a href="#" onclick="goPage('201'); return false;"><span> 6 </span></a><a href="#" onclick="goPage('241'); return false;"><span> 7 </span></a><a href="#" onclick="goPage('281'); return false;"><span> 8 </span></a><a href="#" onclick="goPage('281'); return false;"><img alt="Last Page" border="0" src="/images/ic_arrow_last.gif"/></a></div>"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(s, "html.parser")
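val = soup.find_all("a")[-1]["onclick"]    # Take the last <a> element via negative indexing and read its onclick attribute.
m = re.search(r"\((.*?)\)", val)   # Raw-string regex: match the parentheses and capture what is between them.
if m:
    print(m.group())    # Whole match, parentheses included. print(m.group(1)) drops the parentheses --> '281' (quotes included)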

Output:

('281')
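
The output above still carries the parentheses and quotes. If the bare number is wanted, one possible variation (a sketch, not the answerer's code) is to locate the link through its "Last Page" image instead of relying on it being the last <a>, and to capture only the digits. The variable names `html`, `last_img`, `last_link`, and `last_page` are made up for this illustration, the markup is a shortened copy of the question's snippet, and the pattern assumes the onclick attribute always has the form goPage('<digits>').

```python
import re
from bs4 import BeautifulSoup

# Shortened copy of the pagination markup from the question.
html = """<div class="page2">
<a href="#" onclick="goPage('1'); return false;"><span> 1 </span></a>
<a href="#" onclick="goPage('241'); return false;"><span> 7 </span></a>
<a href="#" onclick="goPage('281'); return false;"><span> 8 </span></a>
<a href="#" onclick="goPage('281'); return false;">
<img alt="Last Page" border="0" src="/images/ic_arrow_last.gif"/></a></div>"""

soup = BeautifulSoup(html, "html.parser")

# Find the "Last Page" image by its alt text, then step up to the enclosing <a>.
last_img = soup.select_one('div.page2 img[alt="Last Page"]')
last_link = last_img.find_parent("a") if last_img else None

last_page = None
if last_link is not None:
    # Capture only the digits between the quotes (assumed format: goPage('<digits>')).
    m = re.search(r"goPage\('(\d+)'\)", last_link.get("onclick", ""))
    if m:
        last_page = int(m.group(1))

print(last_page)  # 281
```

Selecting by the image's alt text keeps working if further links are later added after the last-page arrow, whereas find_all("a")[-1] depends on the arrow staying the final link in the document.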