假设我有一个包含以下代码段的html字符串。
... <img class="employee thumb" src="http://localhost/services/employee1.jpg" /> ...
我想搜索此标记是否可用,如果是,请获取src网址。 <img class="employee thumb"
可用于唯一标识标记。
如何在python中执行此操作?
答案 0 :(得分:1)
使用正则表达式:
>>> import re
>>> str = '<img class="employee thumb" src="http://localhost/services/employee1.jpg" />'
>>> if re.search('img class="employee thumb"', str):
... print re.findall ( 'src="(.*?)"', s, re.DOTALL)
...
['http://localhost/services/employee1.jpg']
使用lxml:
>>> from lxml import etree
>>> root = etree.fromstring("""
... <html>
... <img class="employee thumb" src="http://localhost/services/employee1.jpg" />
... </html>
... """)
>>> print root.xpath("//img[@class='employee thumb']/@*")[1]
http://localhost/services/employee1.jpg