Regular expression to search for an image tag and get its src

Time: 2014-03-28 10:33:37

Tags: python html regex

Suppose I have an HTML string that contains the following snippet.

... <img class="employee thumb" src="http://localhost/services/employee1.jpg" /> ... 

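I want to check whether this tag is present and, if it is, get the src URL. <img class="employee thumb" can be used to uniquely identify the tag.

How can I do this in Python?

1 answer:

Answer 0: (score: 1)

Using a regular expression: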

>>> import re
>>> s = '<img class="employee thumb" src="http://localhost/services/employee1.jpg" />'
>>> if re.search('img class="employee thumb"', s):
...     print(re.findall('src="(.*?)"', s, re.DOTALL))
... 
['http://localhost/services/employee1.jpg']
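
Note that the snippet above checks for the class anywhere in the string and then collects every src in the string, so on a page with several images it would also return the src of unrelated tags. A minimal sketch that ties the src to the matching tag could look like the following. The helper name `find_employee_thumb_src` and the two pattern names are made up for illustration, and the patterns assume double-quoted attributes and no `>` inside attribute values.

```python
import re

# Hypothetical helper names; assumes double-quoted attributes and
# no '>' character inside any attribute value.
IMG_TAG = re.compile(r'<img\b[^>]*\bclass="employee thumb"[^>]*>')
SRC_ATTR = re.compile(r'\ssrc="([^"]*)"')


def find_employee_thumb_src(html):
    """Return the src of the first matching <img> tag, or None if absent."""
    tag = IMG_TAG.search(html)
    if tag is None:
        return None
    # Search for src only inside the text of the tag that matched.
    src = SRC_ATTR.search(tag.group(0))
    return src.group(1) if src else None
```

Calling `find_employee_thumb_src(html)` returns `None` when the tag is missing, so the presence check and the extraction happen in one step.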

Using lxml:

>>> from lxml import etree
>>> root = etree.fromstring("""
... <html>
...     <img class="employee thumb" src="http://localhost/services/employee1.jpg" />
... </html>
... """)
>>> print(root.xpath("//img[@class='employee thumb']/@src")[0])
http://localhost/services/employee1.jpg
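
etree.fromstring expects well-formed XML, which real-world HTML often is not. As an alternative sketch that needs no third-party package, the standard library's html.parser can collect the src of matching tags. The class name `ImgSrcFinder` and the function `find_img_sources` are invented for this example, and the class attribute is compared as an exact string, as in the question.

```python
from html.parser import HTMLParser


class ImgSrcFinder(HTMLParser):
    """Collects src values of <img> tags whose class matches exactly."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <img ... />.
        if tag != "img":
            return
        attr_map = dict(attrs)
        if attr_map.get("class") == self.wanted_class and "src" in attr_map:
            self.sources.append(attr_map["src"])


def find_img_sources(html, wanted_class="employee thumb"):
    """Return a list of src values; an empty list means no matching tag."""
    finder = ImgSrcFinder(wanted_class)
    finder.feed(html)
    finder.close()
    return finder.sources
```

An empty list from `find_img_sources(html)` means the tag is not present; otherwise the first element is the src URL.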