在python上用lxml解析html

时间:2011-10-15 13:35:41

标签: python html-parsing lxml

我有以下HTML代码:

<div class="test">
    "Test"
    <br>
    <script type="text/javascript"></script>
    <a href="mailto:asdf@adsf.com">asdf@adsf.com</a> 
    " "
</div>

如何使用lxml从此代码中获取电子邮件地址?

1 个答案:

答案 0 :(得分:4)

import lxml.html as LH
text='''\
<div class="test">
    "Test"
    <br>
    <script type="text/javascript"></script>
    <a href="mailto:asdf@adsf.com">asdf@adsf.com</a> 
    " "
</div>
'''

doc=LH.fromstring(text)
print(doc.xpath('//a[starts-with(@href,"mailto:")]/text()')[0])
# asdf@adsf.com