I have the following elements in a web page:
<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c850745e0500d612172" class="pagelink" >Page 1</a>
<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c850745e05676787895" class="pagelink" >Page 2</a>
<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c85786787666456fgg3" class="pagelink" >Page 3</a>
<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c850734234324756767" class="pagelink" >Page 4</a>
...
I need to retrieve the text inside the window.open call of every A tag with the class "pagelink":
/link.php?webpage=45980a6f91ac0c850745e0500d612172
/link.php?webpage=45980a6f91ac0c850745e05676787895
/link.php?webpage=45980a6f91ac0c85786787666456fgg3
/link.php?webpage=45980a6f91ac0c850734234324756767
How can I do this with Python?
Answer 0: (score: 1)
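from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser lowercases the names
        attr = dict(attrs)
        # .get() avoids a KeyError on tags that have no class attribute
        if tag == "a" and attr.get("class") == "pagelink":
            add_to_result(attr.get("onclick"))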
Replace add_to_result with your aggregation object (for example, a list) and real code, then strip the leading window.open from each result.
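A minimal, self-contained sketch of that approach, in case it helps. The names PageLinkParser, OPEN_RE, and extract_links are made up for illustration, and the regular expression assumes the URL is the first quoted argument of window.open. It also tolerates the missing closing `')` seen in the markup from the question.

```python
import re
from html.parser import HTMLParser


class PageLinkParser(HTMLParser):
    """Collect the URL passed to window.open in each <a class="pagelink">."""

    # Hypothetical pattern: capture everything after window.open(' up to the
    # next quote or closing parenthesis (or the end of the attribute value).
    OPEN_RE = re.compile(r"window\.open\(\s*['\"]([^'\")]+)")

    def __init__(self):
        super().__init__()
        self.links = []  # aggregated results, in document order

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases attribute names, so onClick arrives as "onclick".
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "a" and "pagelink" in classes:
            match = self.OPEN_RE.search(attr.get("onclick") or "")
            if match:
                self.links.append(match.group(1))


def extract_links(html):
    """Return the window.open targets of all pagelink anchors in html."""
    parser = PageLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling extract_links(page_source) on the markup from the question should return the four /link.php?webpage=... strings as a list. Splitting the class attribute on whitespace means an anchor with several classes (for example "pagelink big") is still picked up, which the plain equality check would miss.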
Answer 1: (score: 0)