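Here is a sample HTML string:
PS: Note that the image tags in the string can carry arbitrary attributes; some are closed with " />" and others with ">". That does not matter. The regex should filter out all the noise and capture every image src in an array.
The answers already given on Stack Overflow do not account for whitespace inside the image tag or for other attributes placed around src.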
<div>
<div>
<div>
<img title= "SOME TITLE" src="SOME IMAGE" alt="SOME ALT" />
<img alt="SOME ALT" title="SOME TITLE" src= "SOME IMAGE" >
</div>
<img src="SOME IMAGE">
</div>
<div>
<img alt ="SOME ALT" src= "SOME IMAGE" title="SOME TITLE">
</div>
<img src="SOME IMAGE" alt="SOME ALT" title="SOME TITLE" />
< img src ="SOME IMAGE" alt="SOME ALT" title="SOME TITLE" />
</div>
I am looking for code like this:
var pictures = [],
m,
rx = /SOME REGEX/g;
while (m = rx.exec(str)) { //str being the html string of any sort
pictures.push(m[SOME INDEX]); //m[SOME INDEX] to match the value of src attribute
}
Answer 0 (score: 0)
I think I have a pattern for you. It covers http / https / ftp / ftps, or just a leading //.
(http|ftp|\/{2})?s?:?\/{2}([^\s]+.*)\.(jpe?g|png|gif?)\s
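One way to read the description above is as a scan for absolute or protocol-relative image URLs anywhere in the text. The sketch below is a simplified variant, not the answerer's exact pattern: the helper name `findImageUrls` is hypothetical, and the extension list (jpg, jpeg, png, gif) is an assumption. It does not match relative paths such as `images/a.png`, and it stops at the extension, so a trailing query string is cut off.

```javascript
// Hypothetical helper: find absolute or protocol-relative image URLs in any text.
// Assumption: only jpg/jpeg/png/gif extensions are of interest.
function findImageUrls(text) {
  // (?:(?:https?|ftps?):)?   optional scheme: http, https, ftp or ftps
  // \/\/                     the "//" that starts the authority part
  // [^\s"'<>]+?              URL characters, up to the first image extension
  // \.(?:jpe?g|png|gif)      the extension itself
  var rx = /(?:(?:https?|ftps?):)?\/\/[^\s"'<>]+?\.(?:jpe?g|png|gif)/gi;
  return text.match(rx) || [];
}

var sample = '<img src="http://a.com/x.jpg"> <img src=\'//cdn.b.com/y.png\'> ftps://c.org/z.gif';
console.log(findImageUrls(sample));
```

Because the scan ignores the surrounding markup, it also picks up image URLs that appear outside `img` tags, which may or may not be what you want.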
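Answer 1 (score: 0)
This may be what you need, although I don't see why you have to use regex. This is only an example; you will need to validate more cases to improve it. The basic idea is to mark the container div with a class (or an id, as in this example). You could also use the body tag, but I suggest being more specific: select the element that contains all the img tags, then take its inner HTML and apply the regex to that string. I would suggest using querySelectorAll instead; it is simpler.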
var pictures = [],
m;
var str = document.getElementById('container').innerHTML,
rex = /<img[^>]+src="?([^"\s]+)"?\s*/gi;
while (m = rex.exec( str )) {
pictures.push( m[1] );
}
var output = document.getElementById('output');
var index = 0;
pictures.forEach(function(picture){
var pTag = document.createElement('p');
pTag.innerHTML = '[' + index++ + '] ' + 'img tag found. URL extracted -> ' + picture;
output.appendChild(pTag);
})
<div id="container">
<div>
<div>
<img title="SOME TITLE" src="http://i.imgur.com/1B0mUM2.jpg" alt="SOME ALT" />
<img alt="SOME ALT" title="SOME TITLE" src="http://i.imgur.com/UWWQ0Wr.jpg">
</div>
<img src="http://i.imgur.com/UWWQ0Wr.jpg">
</div>
<div>
<img alt="SOME ALT" src="http://i.imgur.com/UWWQ0Wr.jpg" title="SOME TITLE">
</div>
<img src="http://i.imgur.com/1B0mUM2.jpg" alt="SOME ALT" title="SOME TITLE" />
<img src="http://i.imgur.com/UWWQ0Wr.jpg" alt="SOME ALT" title="SOME TITLE" />
</div>
<div id="output"></div>
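The querySelectorAll route mentioned in this answer could look roughly like the sketch below. The function name `collectImageSources` and its `root` parameter are hypothetical; `root` is assumed to be a document or element, or anything else exposing `querySelectorAll`.

```javascript
// Sketch: collect src values through the DOM instead of a regex.
// `root` is assumed to expose querySelectorAll (a document or an element).
function collectImageSources(root) {
  var sources = [];
  // 'img[src]' selects only images that actually carry a src attribute
  var images = root.querySelectorAll('img[src]');
  for (var i = 0; i < images.length; i++) {
    // getAttribute returns the value as written in the markup, not the resolved URL
    sources.push(images[i].getAttribute('src'));
  }
  return sources;
}

// In a browser, with the markup above:
// var pictures = collectImageSources(document.getElementById('container'));
```

Since the browser has already parsed the markup, attribute order, spacing around `=`, and the closing style of the tag no longer matter.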
Answer 2 (score: 0)
I do it like this:
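var
  uri = response.request.uri, // Coming from node
  pictures = [],
  // m[1] is the URL up to the extension, m[2] the extension (jpg, jpeg, png or gif)
  r = /src="?([^"\s]+)(jpe?g|png|gif)"/g,
  m;
while (m = r.exec(html)) {
  // Skip inline data: URIs
  if (!m[1].startsWith('data:')) {
    // Prefix relative paths with the protocol and host of the page
    if (!m[1].startsWith('http')) {
      m[1] = uri.protocol + '//' + uri.host + '/' + m[1];
    }
    pictures.push({ src: m[1] + m[2] });
  }
}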
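As an alternative to joining protocol and host by hand, the standard `URL` class (available in Node and in browsers) can resolve a src value against the page address. This is only a sketch: `resolveImageUrl` and its `pageUrl` parameter are hypothetical names, and `pageUrl` is assumed to be the absolute URL the HTML was fetched from. Unlike the concatenation above, relative paths are resolved against the page's directory rather than the site root.

```javascript
// Sketch: resolve a src value against the page URL with the standard URL class.
// `pageUrl` is an assumed input: the absolute URL the HTML was fetched from.
function resolveImageUrl(src, pageUrl) {
  if (src.startsWith('data:')) {
    return null; // inline data, nothing to resolve
  }
  try {
    // Handles absolute URLs, "//host/..." and relative paths alike
    return new URL(src, pageUrl).href;
  } catch (e) {
    return null; // src could not be parsed as a URL
  }
}

var base = 'http://example.com/page/index.html';
console.log(resolveImageUrl('img/a.png', base));
console.log(resolveImageUrl('//cdn.example.com/a.png', base));
```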
Answer 3 (score: 0)
Try the following:
/**
*
* 1. src :- match will start by src
* 2. (\s*) :- might be followed by 0 or more spaces
* 3. = :- then we definitely have =
* 4. (\s*) :- might be followed by 0 or more spaces
* 5. " :- then we will have "
* 6. ([^\s]*) :- might be followed by 0 or more characters except space
* 7. " :- finally we would have closing "
*/
var re = /src(\s*)=(\s*)"([^\s]*)"/g;
var str = "src=\"http://bsfsd1.png\" xyz a src= \"http://bsfsd2.xyz\" axy src= \"http://bsfsd3.png\" abc src = \"http://bsfsd4.png\" sandeep ";
var xArray;
var pictures = [];
while(xArray = re.exec(str)){
pictures.push(xArray[3]);
}
console.log(pictures);
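The pattern above matches any `src="..."` in the text and rejects values containing spaces. A possible extension, sketched below under my own assumptions, anchors the match inside an `img` tag, tolerates a space after `<`, and accepts single or double quotes. The function name `extractImageSources` is hypothetical, and the sketch will still fail on a tag whose attribute values contain a `>` character.

```javascript
// Sketch: extract src values from <img> tags, tolerating spaces after "<" and around "=".
function extractImageSources(html) {
  // <\s*img\b       "<", optional spaces, then the tag name
  // [^>]*?\ssrc     any earlier attributes, then whitespace and the src attribute name
  // \s*=\s*         "=" with optional spaces on both sides
  // (["'])(.*?)\1   quoted value; group 2 holds the src
  var rx = /<\s*img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/gi;
  var pictures = [];
  var m;
  while ((m = rx.exec(html)) !== null) {
    pictures.push(m[2]);
  }
  return pictures;
}

var html = '<img title= "SOME TITLE" src="a.jpg" alt="SOME ALT" />' +
  '<img alt ="SOME ALT" src= "b.png" title="SOME TITLE">' +
  '< img src =\'c.gif\' alt="SOME ALT" />';
console.log(extractImageSources(html));
```

Requiring whitespace before `src` keeps attributes such as `data-src` from being picked up by mistake.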