I need to extract all image src values from an HTML string using a regex

Date: 2017-05-04 02:18:45

Tags: javascript regex

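The answers already given on Stack Overflow don't account for whitespace inside the img tag or for other attributes sitting between img and src.

Note that the img tags in the string can carry arbitrary attributes in any order, and some are closed with " />" while others are closed with ">". That shouldn't matter: the regex should filter out all the noise and capture every image src into an array.

Here is an example HTML string: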

<div>
  <div>
    <div>
      <img   title=  "SOME TITLE" src="SOME IMAGE" alt="SOME ALT" />
      <img   alt="SOME ALT" title="SOME TITLE" src=   "SOME IMAGE"     >
    </div>
    <img src="SOME IMAGE">
  </div>

  <div>
    <img alt   ="SOME ALT" src=  "SOME IMAGE" title="SOME TITLE">
  </div>

  <img   src="SOME IMAGE" alt="SOME ALT" title="SOME TITLE" />
  < img src  ="SOME IMAGE" alt="SOME ALT" title="SOME TITLE"    />
</div>
 

I'm looking for code like this:

var pictures = [],
  m,
  rx = /SOME REGEX/g;

while (m = rx.exec(str)) { //str being the html string of any sort
  pictures.push(m[SOME INDEX]); //m[SOME INDEX] to match the value of src attribute
}
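For reference, here is one way to fill in the template above so that it copes with the sample: whitespace around =, attributes in any order, both closing styles, and even "< img". This is a minimal sketch, not code from any of the answers. The helper name extractImageSrcs is hypothetical, and it assumes src values are quoted (single or double quotes) and that no attribute before src contains a literal ">".

```javascript
// Hypothetical helper: returns the src value of every <img> tag in an HTML string.
// Assumes src values are quoted and no attribute before src contains a literal ">".
function extractImageSrcs(str) {
  // <\s*img\b      "<img", also tolerating "< img" as in the question's sample
  // [^>]*?\s       other attributes (lazily), without leaving the tag
  // src\s*=\s*     "src", optional whitespace, "=", optional whitespace
  // (["'])(.*?)\1  opening quote (group 1), the value (group 2), the same closing quote
  const rx = /<\s*img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/gi;
  const pictures = [];
  let m;
  while ((m = rx.exec(str)) !== null) {
    pictures.push(m[2]); // group 2 holds the src value
  }
  return pictures;
}

const html = `
<div>
  <img   title=  "SOME TITLE" src="SOME IMAGE" alt="SOME ALT" />
  <img   alt="SOME ALT" title="SOME TITLE" src=   "SOME IMAGE"     >
  < img src  ="SOME IMAGE" alt="SOME ALT" title="SOME TITLE"    />
  <img data-src="lazy.png" src='other.png'>
</div>`;

console.log(extractImageSrcs(html));
// → [ 'SOME IMAGE', 'SOME IMAGE', 'SOME IMAGE', 'other.png' ]
```

Requiring whitespace right before src keeps attributes such as data-src from matching. A regex still cannot handle every valid HTML document (comments, unquoted values, ">" inside attribute values), so parsing with the DOM, as answer 1 suggests, is generally more robust.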

4 answers:

Answer 0 (score: 0)

I think I have a pattern for you. It handles http, https, ftp, ftps, or just //.

(http|ftp|\/{2})?s?:?\/{2}(.*[^\s]+.)\.(jpe?g|png|gif)\s

Answer 1 (score: 0)

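This may be what you need, though I don't see why you have to use a regex, and it is only an example: it needs more validation before it is robust. The basic idea is to give the container div an identifier (the code uses id="container"). You could also use the body tag, but I'd suggest being more specific: select the element that contains all the img tags, take its innerHTML, and apply the regex to that string. I'd recommend using querySelectorAll instead, though; it's simpler.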

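var pictures = [],
  m;
// innerHTML re-serializes every attribute as name="value", so any whitespace
// around = in the original markup is already gone in this string
var str = document.getElementById('container').innerHTML,
    // group 1 captures the src value, up to the next quote or whitespace
    rex = /<img[^>]+src="?([^"\s]+)"?\s*/gi;

while (m = rex.exec(str)) {
    pictures.push(m[1]);
}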


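// Print each extracted URL as a numbered paragraph inside #output
var output = document.getElementById('output');
pictures.forEach(function (picture, index) {
  var pTag = document.createElement('p');
  // textContent (rather than innerHTML) keeps the URL from being parsed as markup
  pTag.textContent = '[' + index + '] img tag found. URL extracted -> ' + picture;
  output.appendChild(pTag);
});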
<div id="container">
  <div>
    <div>
      <img title="SOME TITLE" src="http://i.imgur.com/1B0mUM2.jpg" alt="SOME ALT" />
      <img alt="SOME ALT" title="SOME TITLE" src="http://i.imgur.com/UWWQ0Wr.jpg">
    </div>
    <img src="http://i.imgur.com/UWWQ0Wr.jpg">
  </div>

  <div>
    <img alt="SOME ALT" src="http://i.imgur.com/UWWQ0Wr.jpg" title="SOME TITLE">
  </div>

  <img src="http://i.imgur.com/1B0mUM2.jpg" alt="SOME ALT" title="SOME TITLE" />
  <img src="http://i.imgur.com/UWWQ0Wr.jpg" alt="SOME ALT" title="SOME TITLE" />
</div>
<div id="output"></div>
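The querySelectorAll route recommended in this answer could look roughly like the sketch below. It is not the answerer's code: the helper name collectImageSrcs and the htmlString variable in the comments are hypothetical. It relies only on standard DOM methods (querySelectorAll, getAttribute, and DOMParser.parseFromString for a raw string).

```javascript
// Hypothetical helper: lists the src attribute of every <img> under root.
// root can be a Document or any Element, e.g. document.getElementById('container').
function collectImageSrcs(root) {
  // getAttribute('src') returns the value as written in the markup;
  // the img.src property would return the resolved, absolute URL instead.
  return Array.from(root.querySelectorAll('img'), (img) => img.getAttribute('src'));
}

// In a browser, for markup that is already on the page:
//   collectImageSrcs(document.getElementById('container'));
// For a raw HTML string, parse it first without inserting it into the page:
//   collectImageSrcs(new DOMParser().parseFromString(htmlString, 'text/html'));
```

Because the browser's parser does the work, whitespace around =, attribute order, and "/>" versus ">" no longer matter. One difference from the regex answers: "< img ..." (with a space after "<") is not parsed as an element at all, so it is not returned. Reading the attribute directly also avoids the &amp; escaping that innerHTML applies to "&" characters in attribute values.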

Answer 2 (score: 0)

I do it like this:

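var
  uri = response.request.uri, // Coming from node: the URL the HTML was fetched from
  pictures = [],
  // group 1: the src value up to the extension, group 2: the extension (jpg, jpeg, png or gif)
  r = /src="?([^"\s]+)(jpe?g|png|gif)"/g,
  m;

// html is the page's HTML (the response body)
while (m = r.exec(html)) {
  // skip inline data: URIs
  if (!m[1].startsWith('data:')) {
    // make relative paths absolute by prefixing the site root
    if (!m[1].startsWith('http')) {
      m[1] = uri.protocol + '//' + uri.host + '/' + m[1];
    }

    pictures.push({ src: m[1] + m[2] });
  }
}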
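The string concatenation in this answer joins every relative path onto the site root, and it mangles protocol-relative URLs such as //cdn.example.com/x.png. One possible alternative is the WHATWG URL constructor (a global in browsers and in Node.js 10+), which resolves each src against the page URL using standard URL resolution (ignoring any base element in the page). This is a sketch under the assumption that pageUrl is the address the HTML was downloaded from; the function name and sample URLs are hypothetical.

```javascript
// Hypothetical helper: turns captured src values into absolute URLs.
// pageUrl is assumed to be the address the HTML was downloaded from.
function resolveImageSrcs(srcs, pageUrl) {
  return srcs
    .filter((src) => !src.startsWith('data:')) // skip inline data: URIs, as the answer does
    .map((src) => new URL(src, pageUrl).href); // resolve relative to the page, not the site root
}

const srcs = [
  '/img/a.png',
  'b.jpg',
  '//cdn.example.com/c.gif',
  'https://other.example.org/d.jpeg',
  'data:image/png;base64,AAAA',
];

console.log(resolveImageSrcs(srcs, 'https://example.com/blog/post.html'));
// → [ 'https://example.com/img/a.png',
//     'https://example.com/blog/b.jpg',
//     'https://cdn.example.com/c.gif',
//     'https://other.example.org/d.jpeg' ]
```

Note that new URL throws a TypeError when a value cannot be parsed, so real scraping code may want to wrap the call in try/catch.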

Answer 3 (score: 0)

Try the following:

/**
 * 1. src      : the match starts with "src"
 * 2. (\s*)    : optionally followed by whitespace
 * 3. =        : then an "=" sign
 * 4. (\s*)    : optionally followed by more whitespace
 * 5. "        : then an opening double quote
 * 6. ([^\s]*) : then zero or more non-whitespace characters (the URL, group 3)
 * 7. "        : and finally a closing double quote
 */
var re = /src(\s*)=(\s*)"([^\s]*)"/g;

var str = "src=\"http://bsfsd1.png\" xyz  a src= \"http://bsfsd2.xyz\" axy src=   \"http://bsfsd3.png\" abc src   =  \"http://bsfsd4.png\" sandeep ";

var xArray; 
var pictures = [];
while(xArray = re.exec(str)){
  pictures.push(xArray[3]);
}
console.log(pictures);
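The same idea can be written with String.prototype.matchAll (ES2020; Node.js 12+ and current browsers) instead of a manual exec() loop. This sketch makes two small changes that are not part of the answer: the (\s*) groups are dropped, so the URL becomes group 1, and the value class also excludes the double quote, so a value can never run past its closing quote. The names srcPattern, sample, and urls are illustrative.

```javascript
// Same pattern as above, minus the whitespace capture groups.
// matchAll throws a TypeError unless the regex has the g flag.
const srcPattern = /src\s*=\s*"([^\s"]*)"/g;

const sample = 'src="http://bsfsd1.png" xyz  a src= "http://bsfsd2.xyz" axy src=   "http://bsfsd3.png" abc src   =  "http://bsfsd4.png" sandeep ';

// Each match is an array like exec() returns; group 1 is the URL
const urls = Array.from(sample.matchAll(srcPattern), (match) => match[1]);
console.log(urls);
// → [ 'http://bsfsd1.png', 'http://bsfsd2.xyz', 'http://bsfsd3.png', 'http://bsfsd4.png' ]
```

With the original [^\s]*, an input such as src="a"b" would capture a"b, because the class also accepts quotes; excluding the quote stops the value at the first closing quote.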