This is the regular expression I'm using in Python:
^(?<!(<!--.))(http(s?):)?([\/|\.|\w|\s|-])*\.(?:jpg|gif|png)$
The expression currently matches this:
/images/lol/hallo.png
But I also need it to match this image URL:
/images/lol/hallo.png
and this image URL, without the surrounding tag:
<img src="/images/lol/hallo.png" />
But not these, which are commented out:
<!-- /images/lol/hallo.png -->
<!-- <img src="/images/lol/hallo.png" /> -->
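Answer 0 (score: 0)
This should work. The first alternative consumes whole HTML comments without capturing anything, so URLs inside them never reach the named group url: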
<!--[\s\S]*?-->|(?P<url>(http(s?):)?\/?\/?[^,;" \n\t>]+?\.(jpg|gif|png))
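To make the two alternatives easier to read, here is a minimal sketch of the same idea written with re.VERBOSE and re.finditer. It is an illustration, not the exact pattern above: the inner groups are made non-capturing, and the names IMAGE_URL, sample and urls are assumptions chosen for this example.

```python
import re

# Annotated variant of the pattern above (inner groups made non-capturing).
# In re.VERBOSE mode, whitespace inside a character class is still literal.
IMAGE_URL = re.compile(r'''
    <!--[\s\S]*?-->            # alternative 1: a whole HTML comment, nothing captured
  |                            # or
    (?P<url>                   # alternative 2: capture the URL in group "url"
        (?:https?:)?           #   optional scheme
        /?/?                   #   optional leading slashes
        [^,;" \n\t>]+?         #   path characters, as few as possible
        \.(?:jpg|gif|png)      #   image extension
    )
''', re.VERBOSE)

sample = (
    '<img src="/images/lol/hallo.png" />\n'
    '<!-- /images/lol/commented.png -->\n'
    'images/ui/paper-icon-1.png\n'
)

# When a match is a comment, group "url" did not participate and is None.
urls = [m.group('url') for m in IMAGE_URL.finditer(sample) if m.group('url')]
print(urls)
```

Because the regex engine tries the comment alternative first at each position, a comment is swallowed as one match before the URL alternative gets a chance to look inside it.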
Test string:
<img src="/images/lol/hallo.png" />
/images/lol/hallo.png
/images/lol/hallo.png
//example.com/images/lol/hallo.png
http://example.com/images/lol/hallo.png
https://example.com/images/lol/hallo.png
<!-- /images/lol/commented.png -->
<!-- <img src="/images/lol/commented2.png" /> -->
images/ui/paper-icon-1.png
/images/lol/hallo.png and more here /images/lol/hallo.png
Python code:
import re

# Sample input: bare URLs, URLs inside tags, and commented-out URLs.
x = '''
<img src="/images/lol/hallo.png" />
/images/lol/hallo.png
/images/lol/hallo.png
//example.com/images/lol/hallo.png
http://example.com/images/lol/hallo.png
https://example.com/images/lol/hallo.png
<!-- /images/lol/commented.png -->
<!-- <img src="/images/lol/commented2.png" /> -->
images/ui/paper-icon-1.png
/images/lol/hallo.png and more here /images/lol/hallo.png
'''

# The first alternative consumes HTML comments; the second captures a URL in "url".
regexp = r'<!--[\s\S]*?-->|(?P<url>(http(s?):)?\/?\/?[^,;" \n\t>]+?\.(jpg|gif|png))'

# findall returns one tuple per match; index 0 is the "url" group,
# which is an empty string when the match was a comment.
result = [item[0] for item in re.findall(regexp, x) if item[0]]
for item in result:
    print(item)
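An alternative, if the single combined pattern feels hard to maintain, is to do it in two passes: strip the comments first, then search what is left. This is only a sketch of a different technique, not the approach used above; the function name find_image_urls and the sample text are assumptions made for illustration.

```python
import re

COMMENT = re.compile(r'<!--[\s\S]*?-->')
# No capturing groups, so findall returns the full matched URLs as strings.
IMAGE_URL = re.compile(r'(?:https?:)?/?/?[^,;" \n\t>]+?\.(?:jpg|gif|png)')

def find_image_urls(html):
    """Return image URLs found in html, ignoring anything inside HTML comments."""
    # Replace each comment with a space so text on either side is not joined.
    without_comments = COMMENT.sub(' ', html)
    return IMAGE_URL.findall(without_comments)

sample = '''
<img src="/images/lol/hallo.png" />
<!-- <img src="/images/lol/commented2.png" /> -->
https://example.com/images/lol/hallo.png
'''
print(find_image_urls(sample))
```

The trade-off is one extra pass over the text in exchange for two simpler patterns that can be tested separately.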