Given the following string, what regular expression can I use to extract the URLs (I don't need the quotes)?
Answer 0: (score: 0)
What you are looking for is /(\/.*?\.\w{3})/g:
var string = '<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281438586869.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439101401.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439283119.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439479213.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440090151.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440244369.jpg\" /> \r\n</p>';
console.log(string.match(/(\/.*?\.\w{3})/g));
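Breaking this down:
\/ matches a forward slash, escaped with a backslash
.*? matches zero or more characters that are not newlines, as few as possible (the trailing ? makes the match lazy)
\. matches a literal dot, escaped with a backslash
\w{3} matches exactly three 'word' characters (letters, digits, or underscore)
The g flag tells the regular expression to match all occurrences
.match returns an array, from which you can extract the individual strings (without the quotes) by specifying an index or by looping over it: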
var string = '<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281438586869.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439101401.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439283119.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439479213.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440090151.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440244369.jpg\" /> \r\n</p>';
var matches = string.match(/(\/.*?\.\w{3})/g);
for (var i = 0; i < matches.length; i++) {
  console.log(matches[i]);
}
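One caveat with the pattern above: \w{3} assumes a three-character extension, so a path ending in .jpeg would be cut off at .jpe. As a rough sketch of an alternative (not part of the original answer), the pattern can be anchored on the src attribute and capture whatever sits between the quotes. The helper name extractSrcValues and the sample string are made up for illustration, and the pattern would also pick up attributes such as data-src.

```javascript
// Hypothetical helper (not from the original answer): collect every src="..." value.
function extractSrcValues(html) {
  // src\s*=\s*  the attribute name and equals sign, with optional whitespace
  // (["'])      the opening quote, captured so the same kind of quote closes the value
  // (.*?)       the attribute value, matched lazily
  // \1          the closing quote, identical to the opening one
  const pattern = /src\s*=\s*(["'])(.*?)\1/g;
  // matchAll yields one match object per occurrence; group 2 holds the value.
  return Array.from(html.matchAll(pattern), (m) => m[2]);
}

// Made-up sample input mixing double and single quotes and a four-letter extension.
const sample = '<img alt="" src="/upload/a.jpg" /><img src=\'/upload/b.jpeg\' />';
console.log(extractSrcValues(sample)); // logs the two paths: /upload/a.jpg and /upload/b.jpeg
```

Because the captured group excludes the quotes, nothing needs to be stripped afterwards.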
Hope this helps! :)
Answer 1: (score: 0)
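It is safer to create a DocumentFragment from the HTML and then query that temporary DOM for the information. It is safer because regular expressions can be very brittle. For example, what happens if the URLs in the HTML may or may not include a protocol such as https or ftp?
I am using a small library to convert the HTML into a DocumentFragment, but there are many ways to do this.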
let html = `<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281438586869.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439101401.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439283119.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439479213.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440090151.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440244369.jpg\" /> \r\n</p>`;
let fragment = HtmlFragment(html);
let urls = Array
.from(fragment.querySelectorAll('img[src]'))
.map(img => img.getAttribute('src'));
console.log(urls);
<script src="https://unpkg.com/html-fragment@1.1.0/lib/html-fragment.min.js"></script>
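As one of those other ways, here is a minimal sketch that relies on the browser's built-in DOMParser instead of the html-fragment library. The names extractImageSrcs and defaultParseHtml are hypothetical, and the optional parseHtml parameter is an assumption added so the function can be exercised outside a browser, where DOMParser does not exist.

```javascript
// Hypothetical helper: collect the src attribute of every <img> in an HTML string.
// parseHtml is an assumed injection point; by default it uses the browser's DOMParser.
function extractImageSrcs(html, parseHtml = defaultParseHtml) {
  const doc = parseHtml(html);
  // querySelectorAll returns a NodeList; Array.from maps each element to its src value.
  return Array.from(doc.querySelectorAll('img[src]'), (img) => img.getAttribute('src'));
}

// Browser-only default: DOMParser is a web API and is not available in plain Node.js.
function defaultParseHtml(html) {
  return new DOMParser().parseFromString(html, 'text/html');
}
```

In a browser, calling extractImageSrcs(html) with the string above should give the same list of paths as the library-based snippet, since getAttribute returns the attribute exactly as written rather than a resolved absolute URL.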
Answer 2: (score: 0)
var string = '<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281438586869.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439101401.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439283119.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439479213.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440090151.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440244369.jpg\" /> \r\n</p>';
console.log(string.match(/(\/.*?\.\w{3})/g));