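I want to get the full path/URL of every image referenced in an HTML page, including srcset and all the data-src variants people might use.
Anything on the page that looks like ../image.jpg or http://domain.ca/some/path/image.jpg is exactly what I'm after.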
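Getting a *full* URL out of a reference like ../image.jpg depends on the address of the page it appears on, so extraction and resolution are two separate steps. A minimal sketch of the resolution step in Python, assuming a hypothetical page address (PAGE_URL is illustrative, not from the question); urllib.parse.urljoin applies the standard relative-reference rules:

```python
from urllib.parse import urljoin

# Hypothetical page address: the real base is the URL the HTML was fetched
# from (or its <base href>, if the page declares one).
PAGE_URL = "http://domain.ca/some/path/page.html"


def resolve(ref: str) -> str:
    """Turn a possibly relative image reference into an absolute URL."""
    return urljoin(PAGE_URL, ref)


if __name__ == "__main__":
    for ref in ["../image.jpg", "image.jpg", "/image.jpg",
                "http://domain.ca/some/path/image.jpg"]:
        print(ref, "->", resolve(ref))
```

With that base, ../image.jpg resolves to http://domain.ca/some/image.jpg, image.jpg to http://domain.ca/some/path/image.jpg, /image.jpg to http://domain.ca/image.jpg, and an already absolute URL is returned unchanged.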
I tried using this regex with preg_match_all:
/(https?:\/\/|\/|\/|^((?:\.\.\/)+))[^\/\s]+\/\S+\.(jpg|png|gif)/
https://regex101.com/r/69F1zL/3
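To see why this pattern misses several of the cases below, it can be run against single references. The sketch uses Python's re with the pattern copied verbatim (only PHP's surrounding / delimiters dropped); it uses no features where Python and PCRE differ, so the results should be the same under preg_match_all. The pattern requires a prefix (http(s)://, a slash, or ../ at the very start of the subject, since ^ is not in multiline mode) followed by at least two path segments, so single-segment references fail. The repeated \/ alternative is redundant.

```python
import re
from typing import Optional

# The pattern from the question, minus PHP's surrounding "/" delimiters.
ORIGINAL = re.compile(r"(https?:\/\/|\/|\/|^((?:\.\.\/)+))[^\/\s]+\/\S+\.(jpg|png|gif)")


def first_match(text: str) -> Optional[str]:
    """Return the leftmost match of ORIGINAL in text, or None."""
    m = ORIGINAL.search(text)
    return m.group(0) if m else None


SAMPLES = ["../yep.jpg", "yep.jpg", "/some.jpg", "im/sfds/some.jpg", "../../yep.jpg"]

if __name__ == "__main__":
    for s in SAMPLES:
        print(f"{s!r:22} -> {first_match(s)!r}")
```

Here ../yep.jpg, yep.jpg and /some.jpg do not match at all, im/sfds/some.jpg only yields /sfds/some.jpg (the leading im is lost), and ../../yep.jpg matches only because the second ../ gets consumed as an ordinary path segment.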
Here are the kinds of image references I might run into:
../yep.jpg
yep.jpg
im/some.jpg
/some.jpg
src="../uploads/2016/02/logo-home.png"
im/sfds/some.jpg
url(thedir/img.jpg)
../../yep.jpg
src="https://www.thesite.nl/wp-content/uploads/2016/02/logo-home.png"
data-huge="some/big.jpg"
src="https://www.thesite.nl/wp-content/uploads/2016/02/logo-home.png"
srcset="https://www.thesite.nl/wp-content/uploads/2016/02/logo-home.png 793w,
https://www.thesite.nl/wp-content/uploads/2016/02/logo-home-300x201.png 300w,
https://www.thesite.nl/wp-content/uploads/2016/02/logo-home-768x514.png 768w,
https://www.thesite.nl/wp-content/uploads/2016/02/logo-home-700x469.png 700w"
sizes="(max-width: 793px) 100vw, 793px" /></div>
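One possible approach, sketched in Python: instead of describing what comes *before* the path, match any run of URL-like characters that ends in an image extension, then resolve each hit against the page address. The character class below is a hypothetical choice, not a standard: it excludes whitespace, quotes, parentheses, angle brackets, commas and "=" so that attribute names, url( ... ) wrappers and srcset width descriptors fall away on their own. BASE_URL is likewise a hypothetical page address.

```python
import re
from urllib.parse import urljoin

# Hypothetical, deliberately loose pattern: a run of characters that are not
# whitespace, quotes, parentheses, angle brackets, commas or "=", ending in an
# image extension. Commas split srcset candidates apart, parentheses strip
# url(...), and "=" keeps attribute names from gluing onto their values.
IMAGE_REF = re.compile(
    r"""[^\s"'()<>,=]+\.(?:jpe?g|png|gif|webp|svg)\b""",
    re.IGNORECASE,
)


def extract_image_urls(text: str, base_url: str) -> list[str]:
    """Return absolute image URLs found in text, first-seen order, no duplicates."""
    absolute = (urljoin(base_url, ref) for ref in IMAGE_REF.findall(text))
    # dict.fromkeys keeps insertion order and drops repeats
    return list(dict.fromkeys(absolute))


# Hypothetical page URL used to resolve relative references.
BASE_URL = "https://www.thesite.nl/blog/post/"

SAMPLE = """../yep.jpg
yep.jpg
/some.jpg
url(thedir/img.jpg)
data-huge="some/big.jpg"
<img src="../uploads/2016/02/logo-home.png"
src="https://www.thesite.nl/wp-content/uploads/2016/02/logo-home.png"
srcset="https://www.thesite.nl/wp-content/uploads/2016/02/logo-home.png 793w,
https://www.thesite.nl/wp-content/uploads/2016/02/logo-home-300x201.png 300w"
sizes="(max-width: 793px) 100vw, 793px" />"""

if __name__ == "__main__":
    for url in extract_image_urls(SAMPLE, BASE_URL):
        print(url)
```

On this sample it returns eight unique URLs; for example ../yep.jpg becomes https://www.thesite.nl/blog/yep.jpg and url(thedir/img.jpg) becomes https://www.thesite.nl/blog/post/thedir/img.jpg. Because the pattern is loose, it will also pick up image names mentioned in comments or plain text, and it misses images served without an extension; reading specific attributes with an HTML parser is the more robust route when that matters. The pattern uses only basic regex features, so it should carry over to preg_match_all wrapped in delimiters such as ~...~i, but the urljoin resolution step would need a separate implementation there.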