Regular expression to match all image URLs except those that are commented out

Time: 2019-03-04 13:45:14

Tags: python regex

This is the regular expression I am using in Python:

^(?<!(<!--.))(http(s?):)?([\/|\.|\w|\s|-])*\.(?:jpg|gif|png)$

The expression currently matches this:

/images/lol/hallo.png

But I need it to match this image URL:

/images/lol/hallo.png

and also the image URL in this line, without the surrounding tag:

<img src="/images/lol/hallo.png" />

but not these, which are commented out:

<!-- /images/lol/hallo.png -->
<!-- <img src="/images/lol/hallo.png" /> -->

1 answer:

Answer 0: (score: 0)
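
Before the fix, a short sketch of why the pattern from the question probably falls short. Without `re.MULTILINE`, `^` and `$` force the entire string to be a URL, and a negative lookbehind placed directly after `^` has nothing to look back at, so it always succeeds. The three sample strings are taken from the question; the variable names are only illustrative.

```python
import re

# The pattern from the question, unchanged.
original = r'^(?<!(<!--.))(http(s?):)?([\/|\.|\w|\s|-])*\.(?:jpg|gif|png)$'

samples = [
    '/images/lol/hallo.png',                # bare URL
    '<img src="/images/lol/hallo.png" />',  # URL inside a tag
    '<!-- /images/lol/hallo.png -->',       # commented-out URL
]

# Only a string that is a URL from start to end can match:
# '<' and '"' are not in the character class, and ^/$ allow nothing around it.
results = [bool(re.search(original, s)) for s in samples]
for s, ok in zip(samples, results):
    print(ok, repr(s))
```

The tagged and commented strings are rejected for the same reason (the anchors), so the lookbehind never gets a chance to tell them apart.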

This should work:

<!--[\s\S]*?-->|(?P<url>(http(s?):)?\/?\/?[^,;" \n\t>]+?\.(jpg|gif|png))
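
The pattern relies on the order of its two alternatives: the first consumes a whole comment, so the second (the named group `url`) never gets to see text inside it. A minimal sketch of that behaviour, using a made-up sample string that is not part of the original post:

```python
import re

# Same pattern as above: branch 1 swallows comments, branch 2 captures URLs.
pattern = re.compile(
    r'<!--[\s\S]*?-->|(?P<url>(http(s?):)?\/?\/?[^,;" \n\t>]+?\.(jpg|gif|png))'
)

# Hypothetical sample, chosen only to show both branches firing.
sample = 'a.png <!-- b.png --> <img src="c.gif" />'

for m in pattern.finditer(sample):
    # A comment match leaves the "url" group unset (None).
    kind = "url" if m.group("url") else "comment, skipped"
    print(repr(m.group(0)), "->", kind)

urls = [m.group("url") for m in pattern.finditer(sample) if m.group("url")]
print(urls)  # ['a.png', 'c.gif']
```

Because comment matches still show up in the results with an empty `url` group, the caller has to filter them out, which is what the `if item[0]` check in the Python code does.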

Test string:

<img src="/images/lol/hallo.png" />
    /images/lol/hallo.png
    /images/lol/hallo.png
    //example.com/images/lol/hallo.png
    http://example.com/images/lol/hallo.png
    https://example.com/images/lol/hallo.png
    <!-- /images/lol/commented.png -->
    <!-- <img src="/images/lol/commented2.png" /> -->
    images/ui/paper-icon-1.png


/images/lol/hallo.png and more here /images/lol/hallo.png

Python code:

import re

x = '''
    <img src="/images/lol/hallo.png" />
    /images/lol/hallo.png
    /images/lol/hallo.png
    //example.com/images/lol/hallo.png
    http://example.com/images/lol/hallo.png
    https://example.com/images/lol/hallo.png
    <!-- /images/lol/commented.png -->
    <!-- <img src="/images/lol/commented2.png" /> -->
    images/ui/paper-icon-1.png


/images/lol/hallo.png and more here /images/lol/hallo.png
'''
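# The first alternative consumes whole comments; only the second alternative
# fills the named group "url".
regexp = r'<!--[\s\S]*?-->|(?P<url>(http(s?):)?\/?\/?[^,;" \n\t>]+?\.(jpg|gif|png))'
# A comment match leaves "url" unset (None), so keep only matches that set it.
result = [m.group('url') for m in re.finditer(regexp, x) if m.group('url')]
for item in result:
    print(item)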

Demo: https://regex101.com/r/YmXo2Q/4