I only want to get '281', which is the value for the last page.
Code:
<div class="page2"><a href="#" onclick="goPage('1'); return false;">
<img alt="First-Page" border="0" src="/images/ic_arrow_first.gif"/></a>
<a href="#" onclick="goPage('1'); return false;"><span> 1 </span></a>
<a href="#" onclick="goPage('41'); return false;"><span> 2 </span></a>
<a href="#" onclick="goPage('81'); return false;"><span> 3 </span></a>
<a href="#" onclick="goPage('121'); return false;"><span> 4 </span></a>
<span class="page_on">5</span>
<a href="#" onclick="goPage('201'); return false;"><span> 6 </span></a>
<a href="#" onclick="goPage('241'); return false;"><span> 7 </span></a>
<a href="#" onclick="goPage('281'); return false;"><span> 8 </span></a>
<a href="#" onclick="goPage('281'); return false;">
<img alt="Last Page" border="0" src="/images/ic_arrow_last.gif"/></a></div>
Answer 0 (score: 1)
import re
s = """<div class="page2"><a href="#" onclick="goPage('1'); return false;"><img alt="First-Page" border="0" src="/images/ic_arrow_first.gif"/></a><a href="#" onclick="goPage('1'); return false;"><span> 1 </span></a><a href="#" onclick="goPage('41'); return false;"><span> 2 </span></a><a href="#" onclick="goPage('81'); return false;"><span> 3 </span></a><a href="#" onclick="goPage('121'); return false;"><span> 4 </span></a><span class="page_on">5</span><a href="#" onclick="goPage('201'); return false;"><span> 6 </span></a><a href="#" onclick="goPage('241'); return false;"><span> 7 </span></a><a href="#" onclick="goPage('281'); return false;"><span> 8 </span></a><a href="#" onclick="goPage('281'); return false;"><img alt="Last Page" border="0" src="/images/ic_arrow_last.gif"/></a></div>"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(s, "html.parser")
val = soup.find_all("a")[-1]["onclick"] #Get the last element using negative indexing.
m = re.search(r"\((.*?)\)", val)  # Regex to capture the content inside "()"
if m:
    print(m.group())  # Or print(m.group(1)) --> '281'
Output:
('281')
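
If the bare number is wanted without the parentheses and quotes, one possible variation is to match the `goPage('N')` call directly and capture only the digits. The sketch below swaps BeautifulSoup for the standard-library `re` module alone, and it assumes the last `goPage(...)` call in the markup always belongs to the "Last Page" link. The function name `last_page_offset` is made up for illustration.

```python
import re

# Shortened copy of the pager markup from the question.
html = """<div class="page2">
<a href="#" onclick="goPage('1'); return false;"><span> 1 </span></a>
<a href="#" onclick="goPage('241'); return false;"><span> 7 </span></a>
<a href="#" onclick="goPage('281'); return false;"><span> 8 </span></a>
<a href="#" onclick="goPage('281'); return false;">
<img alt="Last Page" border="0" src="/images/ic_arrow_last.gif"/></a></div>"""


def last_page_offset(markup):
    """Return the argument of the last goPage('N') call as an int, or None.

    Hypothetical helper: it relies on the "Last Page" link being the final
    goPage(...) call in the markup, as it is in the question's snippet.
    """
    # Capture only the digits between the quotes of every goPage('...') call.
    offsets = re.findall(r"goPage\('(\d+)'\)", markup)
    return int(offsets[-1]) if offsets else None


print(last_page_offset(html))  # 281
```

Capturing `\d+` inside the quotes avoids the extra stripping step, and returning `None` when nothing matches keeps the caller from hitting an `IndexError` on pages without a pager.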