Regular expression to get images on a page, including srcset and data-src variants

Asked: 2017-02-22 10:48:59

Tags: php regex

I want to get the full path/URL of every image contained in an HTML page, including srcset and every kind of data-src variant people might use.

Anything on the page that looks like ../image.jpg or http://domain.ca/some/path/image.jpg is exactly what I am looking for.

I tried using this regular expression with preg_match_all:

/(https?:\/\/|\/|\/|^((?:\.\.\/)+))[^\/\s]+\/\S+\.(jpg|png|gif)/

https://regex101.com/r/69F1zL/3
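For reference, here is a minimal sketch of how the pattern above could be run through preg_match_all. The pattern is copied unchanged; the `$html` sample string is an assumption, assembled from two of the cases listed in this question.

```php
<?php
// The pattern from the question, unchanged.
$pattern = '/(https?:\/\/|\/|\/|^((?:\.\.\/)+))[^\/\s]+\/\S+\.(jpg|png|gif)/';

// Hypothetical sample input built from two cases in the question.
$html = 'src="https://www.thesite.nl/wp-content/uploads/2016/02/logo-home.png" '
      . 'data-huge="some/big.jpg"';

// $matches[0] receives the full matches; $count is how many were found.
$count = preg_match_all($pattern, $html, $matches);

print_r($matches[0]);
```

On this sample only the absolute URL is matched. The pattern requires a scheme, a leading slash, or `../` at the very start of the input, followed by at least one more directory separator, so a relative value such as some/big.jpg is skipped.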

Here are the kinds of image references I might run into:

../yep.jpg
yep.jpg
im/some.jpg
/some.jpg
src="../uploads/2016/02/logo-home.png" 
im/sfds/some.jpg
url(thedir/img.jpg)
../../yep.jpg

src="https://www.thesite.nl/wp-content/uploads/2016/02/logo-home.png" 
data-huge="some/big.jpg" 
src="https://www.thesite.nl/wp-content/uploads/2016/02/logo-home.png"
srcset="https://www.thesite.nl/wp-content/uploads/2016/02/logo-home.png 793w,
https://www.thesite.nl/wp-content/uploads/2016/02/logo-home-300x201.png 300w,
https://www.thesite.nl/wp-content/uploads/2016/02/logo-home-768x514.png 768w,
https://www.thesite.nl/wp-content/uploads/2016/02/logo-home-700x469.png 700w"
sizes="(max-width: 793px) 100vw, 793px" /></div>
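One possible way to cover the attribute-based cases above is to stop matching the path shape directly. Instead, collect every quoted attribute value and every CSS url(...) argument, then keep only the candidates that end in an image extension. This is a sketch under stated assumptions, not a definitive solution: the function name `extract_image_urls` and the sample `$html` are hypothetical, srcset values are assumed to contain no commas inside a URL, and bare paths in plain text (outside attributes or url(...)) are not handled.

```php
<?php
// Hypothetical helper: collects image paths from quoted attribute values
// (src, srcset, data-*, ...) and from CSS url(...) references.
function extract_image_urls(string $html): array
{
    $candidates = [];

    // 1. Any quoted attribute value, whatever the attribute is called.
    if (preg_match_all('/[\w-]+\s*=\s*(["\'])(.*?)\1/s', $html, $m)) {
        foreach ($m[2] as $value) {
            // srcset-style values are comma-separated "url descriptor" pairs.
            foreach (explode(',', $value) as $part) {
                // Keep the first whitespace-delimited token (drops "793w").
                $tokens = preg_split('/\s+/', trim($part));
                $candidates[] = $tokens[0];
            }
        }
    }

    // 2. CSS url(...) references, with or without quotes.
    if (preg_match_all('/url\(\s*["\']?([^"\')\s]+)/i', $html, $m)) {
        $candidates = array_merge($candidates, $m[1]);
    }

    // Keep candidates ending in an image extension, optionally with a query string.
    $images = preg_grep('/\.(?:jpe?g|png|gif)(?:\?\S*)?$/i', $candidates);

    return array_values(array_unique($images));
}

// Hypothetical sample, shortened from the cases in the question.
$html = '<img src="../uploads/2016/02/logo-home.png" data-huge="some/big.jpg" '
      . 'srcset="https://www.thesite.nl/a.png 793w, https://www.thesite.nl/a-300x201.png 300w" '
      . 'sizes="(max-width: 793px) 100vw, 793px" />'
      . '<div style="background: url(thedir/img.jpg)"></div>';

$images = extract_image_urls($html);
print_r($images);
```

Filtering by extension at the end is what discards non-image values such as the sizes attribute. For real-world markup, an HTML parser such as PHP's DOMDocument would likely be more robust than regular expressions for the attribute-collection step.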


0 answers:

No answers yet