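I am trying to match all hrefs in a string, but exclude any href that contains specific text (for example login). I believe this calls for a negative lookahead. For example: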
const str = `This is some a string <a href="http://www.google.com">google</a> and this is another that should not be found <a href="https://www.google.com/login">login</a>`
const match = str.match(/href="(.*?)"/g)
console.log(match)
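This matches all hrefs, but it does not exclude the one that contains login. I have tried a few different variations but have not gotten anywhere. Any help would be greatly appreciated!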
Answer 0: (score: 1)
You can use this regex, which relies on a negative lookbehind:
href="(.*?)(?<!login)"
Demo
https://regex101.com/r/15DwZE/1
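Edit 1: As The fourth bird pointed out, the regex above may not work correctly, because the lookbehind only checks the characters immediately before the closing quote. Rather than coming up with a complex regex that covers every way login can appear in a URL that should be rejected, here is a JavaScript solution.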
var myString = 'This is some a string <a href="http://www.google.com">google</a> and this is another that should not be found <a href="https://www.google.com/login">login</a>';
var myRegexp = /href="(.*?)"/g;
var match = myRegexp.exec(myString);
while (match !== null) {
  // Skip any href whose value contains "login"
  if (match[1].indexOf('login') === -1) {
    console.log(match[1]);
  }
  match = myRegexp.exec(myString);
}
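If a pure regex is still preferred, one option is a tempered pattern that applies a negative lookahead before every character of the attribute value. This is only a minimal sketch and not part of the original answer: the names hrefWithoutLogin and links are illustrative, and it assumes href values are double-quoted.

```javascript
const str = `This is some a string <a href="http://www.google.com">google</a> and this is another that should not be found <a href="https://www.google.com/login">login</a>`;

// Before consuming each character of the value, (?!login) asserts that
// "login" does not start at that position, so any value containing it fails.
const hrefWithoutLogin = /href="((?:(?!login)[^"])*)"/g;

// matchAll yields one match object per hit; index 1 is the captured URL.
const links = Array.from(str.matchAll(hrefWithoutLogin), m => m[1]);
console.log(links); // [ 'http://www.google.com' ]
```

Unlike the lookbehind version, this rejects login anywhere in the value, not just at the end.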
Answer 1: (score: 1)
Instead of a regex, you can use DOMParser and check whether the href contains your string with, for example, includes.
let parser = new DOMParser();
let html = `This is some a string <a href="http://www.google.com">google</a> and this is another that should not be found <a href="https://www.google.com/login">login</a>`;
let doc = parser.parseFromString(html, "text/html");
let anchors = doc.querySelectorAll("a");
anchors.forEach(a => {
  if (!a.href.includes("login")) {
    console.log(a.href);
  }
});
Answer 2: (score: 0)
You can create a temporary HTML node and get all the <a> tags from it, then filter them by href. Sample code:
const str = `This is some a string <a href="http://www.google.com">google</a> and this is another that should not be found <a href="https://www.google.com/login">login</a>`;
const d = document.createElement('div');
d.innerHTML = str;
Array.from(d.getElementsByTagName("a")).filter(a => !/login/.test(a.href))
Answer 3: (score: 0)
You can do this with the following regex:
/<[\w:]+(?=\s)(?=(?:[^>"']|"[^"]*"|'[^']*')*?\shref\s*=\s*(?:(['"])(?:(?!\1|login)[\S\s])*\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>/
https://regex101.com/r/LEQL7h/1
More info
< [\w:]+ # Any tag
(?= \s )
(?= # Assertion (a pseudo atomic group)
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s href \s* = \s* # href attribute
(?:
( ['"] ) # (1), Quote
(?:
(?! \1 | login ) # href cannot contain login
[\S\s]
)*
\1
)
)
# Have href that does not contain login, match the rest of tag
\s+
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
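As a usage sketch (not from the original answer), the single-line form of the pattern can be applied with the g flag to collect every opening tag whose href does not contain login. The names tagWithoutLogin and tags are illustrative.

```javascript
const str = `This is some a string <a href="http://www.google.com">google</a> and this is another that should not be found <a href="https://www.google.com/login">login</a>`;

// Same pattern as above, with the g flag added so that match() returns all hits.
const tagWithoutLogin = /<[\w:]+(?=\s)(?=(?:[^>"']|"[^"]*"|'[^']*')*?\shref\s*=\s*(?:(['"])(?:(?!\1|login)[\S\s])*\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>/g;

// With the g flag, match() returns the full text of each match, or null if none.
const tags = str.match(tagWithoutLogin) || [];
console.log(tags);
```

Note that this returns the whole opening tag rather than just the URL, so the href value still has to be extracted from each result.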