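How can I use a regular expression to get the URLs from the following string

Time: 2017-08-17 02:58:32

Tags: javascript regex

Given the following string, what regular expression can I use to extract the URLs (I don't need the quotes)?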

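'<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281438586869.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439101401.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439283119.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439479213.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440090151.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440244369.jpg\" /> \r\n</p>'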

3 answers:

Answer 0: (score: 0)

What you're looking for is /(\/.*?\.\w{3})/g:

var string = '<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281438586869.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439101401.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439283119.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439479213.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440090151.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440244369.jpg\" /> \r\n</p>';

console.log(string.match(/(\/.*?\.\w{3})/g));

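Breaking this down:

  • \/ matches a forward slash, escaped with a backslash
  • .*? matches zero or more characters other than line terminators, lazily (as few as possible)
  • \. matches a literal dot, escaped with a backslash
  • \w{3} matches exactly three 'word' characters (letters, digits, or underscore)
  • the g flag makes the regular expression match all occurrences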
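
One consequence of \w{3} in the breakdown above is that a four-letter extension gets cut short. The sketch below illustrates this with made-up input (the file names are hypothetical) and shows one possible alternative that captures whatever sits between the quotes of a src attribute. It assumes the attribute values are always double-quoted.

```javascript
// Hypothetical sample input; the file names are made up for illustration.
const html = '<img src="/upload/a.jpg">\n<img src="/upload/b.jpeg">';

// The pattern from the answer stops after exactly three word characters,
// so ".jpeg" is truncated to ".jpe".
const original = html.match(/(\/.*?\.\w{3})/g);
console.log(original); // [ '/upload/a.jpg', '/upload/b.jpe' ]

// Alternative sketch: capture everything inside src="...".
// matchAll yields one match object per occurrence; index 1 is the capture group.
const viaAttribute = Array.from(html.matchAll(/src="([^"]+)"/g), m => m[1]);
console.log(viaAttribute); // [ '/upload/a.jpg', '/upload/b.jpeg' ]
```

The second pattern does not depend on the extension length, but it would also pick up src attributes on tags other than img.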

.match returns an array; you can extract the individual strings (without quotes) by indexing into it or by looping over it:

var string = '<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281438586869.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439101401.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439283119.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439479213.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440090151.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440244369.jpg\" /> \r\n</p>';

var matches = string.match(/(\/.*?\.\w{3})/g);
for (var i = 0; i < matches.length; i++) {
  console.log(matches[i]);
}
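
Note that the loop above would throw if the input contained no URL, because match returns null rather than an empty array when nothing matches. A minimal sketch of a guard follows; the helper name findUrls is hypothetical and not part of the original answer.

```javascript
// Hypothetical helper: always returns an array, even when nothing matches.
function findUrls(text) {
  // match() returns null (not an empty array) when there is no match,
  // so fall back to [] to keep callers' loops safe.
  return text.match(/\/.*?\.\w{3}/g) || [];
}

console.log(findUrls('<img src="/upload/a.jpg">')); // [ '/upload/a.jpg' ]
console.log(findUrls('<p>no images here</p>'));     // []
```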

Hope this helps! :)

Answer 1: (score: 0)

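It is safer to create a DocumentFragment from the HTML and then query that temporary DOM for the information, because regular expressions can be very brittle. For example, what happens if the URLs in the HTML may or may not include a protocol such as https or ftp?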
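
To make the brittleness concrete, here is a small sketch of what the pattern /(\/.*?\.\w{3})/g does with an absolute URL. The input and host name are made up for illustration.

```javascript
// Hypothetical input: one absolute URL (the host name is made up).
const withProtocol = '<img src="https://example.com/upload/a.jpg">';

// The pattern starts at the first slash it sees, which is inside "https://",
// and stops at the first dot followed by three word characters (".com").
const found = withProtocol.match(/(\/.*?\.\w{3})/g);
console.log(found); // [ '//example.com', '/upload/a.jpg' ]
```

The single URL is split into two fragments, neither of which is the value of the src attribute.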

I am using a small library to convert the HTML into a DocumentFragment, but you can do this in many ways.

let html = `<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281438586869.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439101401.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439283119.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439479213.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440090151.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440244369.jpg\" /> \r\n</p>`;

let fragment = HtmlFragment(html);
let urls = Array
  .from(fragment.querySelectorAll('img[src]'))
  .map(img => img.getAttribute('src'));

console.log(urls);
<script src="https://unpkg.com/html-fragment@1.1.0/lib/html-fragment.min.js"></script>
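
One of the other ways to do this without a library is the browser's built-in DOMParser. The following is a minimal sketch, assuming the code runs in a browser; the function name extractImageSources is hypothetical. It parses into a full Document rather than a DocumentFragment, which makes no difference for this query.

```javascript
// Sketch for browser environments, where DOMParser is a built-in global.
function extractImageSources(html) {
  // Parse into a detached document; scripts in the parsed markup are not executed.
  const doc = new DOMParser().parseFromString(html, 'text/html');
  // Collect the raw attribute value, so relative URLs stay relative.
  return Array.from(doc.querySelectorAll('img[src]'))
    .map(img => img.getAttribute('src'));
}

// Guarded so the snippet does not throw outside a browser.
if (typeof DOMParser !== 'undefined') {
  console.log(extractImageSources('<p><img alt="" src="/upload/a.jpg" /></p>'));
}
```

Using getAttribute('src') instead of the img.src property returns the attribute text as written, rather than a URL resolved against a base address.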

Answer 2: (score: 0)

var string = '<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281438586869.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439101401.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439283119.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439479213.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440090151.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440244369.jpg\" /> \r\n</p>';

console.log(string.match(/(\/.*?\.\w{3})/g));