Question

I'm trying to match all hrefs in a string, but exclude any href that contains specific text such as login (I believe with a negative lookahead). For example:

const str = `This is some a string <a href="http://www.google.com">google</a> and this is another that should not be found <a href="https://www.google.com/login">login</a>`

const match = str.match(/href="(.*?)"/g)

console.log(match)

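This matches every href, but it doesn't exclude the ones that contain login. I've tried a few different variations without getting anywhere. Any help would be greatly appreciated!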

Answer 1

You can use this regex, which relies on a negative lookbehind:

href="(.*?)(?<!login)"

Demo

https://regex101.com/r/15DwZE/1
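To see what the lookbehind actually checks, here is a small sketch. The three sample strings and the `capture` helper are made up for illustration and are not from the question. The lookbehind only inspects the five characters right before a closing quote, so it rejects values that end in login, lets through values with login in the middle, and can run past the closing quote when a rejected href is followed by another one.

```javascript
// The lookbehind regex from this answer, with the g flag added.
const lookbehind = /href="(.*?)(?<!login)"/g;

// Hypothetical inputs, chosen to show three different behaviours.
const endsWithLogin = '<a href="/home">a</a> <a href="/login">b</a>';
const loginInMiddle = '<a href="/login?next=/home">c</a>';
const loginFirst = '<a href="/login">d</a> <a href="/home">e</a>';

// Collect capture group 1 from every match.
const capture = (s) => Array.from(s.matchAll(lookbehind), (m) => m[1]);

console.log(capture(endsWithLogin)); // [ '/home' ]
console.log(capture(loginInMiddle)); // [ '/login?next=/home' ]  (not excluded)
console.log(capture(loginFirst));    // [ '/login">d</a> <a href=' ]  (overruns the quote)
```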

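Edit 1: As The fourth bird pointed out, the regex above may not work correctly. Rather than coming up with a complex regex that covers every way login can appear in a URL to be rejected, here is a JavaScript solution.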

var myString = 'This is some a string <a href="http://www.google.com">google</a> and this is another that should not be found <a href="https://www.google.com/login">login</a>';
var myRegexp = /href="(.*?)"/g;
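var match = myRegexp.exec(myString);
while (match !== null) {
    // Print only the URLs that do not contain "login"
    if (match[1].indexOf('login') === -1) {
        console.log(match[1]);
    }
    // Advance to the next href (the g flag keeps track of the position)
    match = myRegexp.exec(myString);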
}
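If you would still like a single regex, one possible sketch is a negative lookahead that is re-checked before every character of the value. The names `noLogin` and `urls` are mine, and the pattern assumes double-quoted href values as in the question's string; it is an illustration, not a general HTML parser.

```javascript
const str = `This is some a string <a href="http://www.google.com">google</a> and this is another that should not be found <a href="https://www.google.com/login">login</a>`;

// (?!login) runs before each character, so the value can never
// contain "login" at any position; [^"] stops at the closing quote.
const noLogin = /href="((?:(?!login)[^"])*)"/g;

// Collect capture group 1 (the URL) from every match.
const urls = Array.from(str.matchAll(noLogin), (m) => m[1]);

console.log(urls); // [ 'http://www.google.com' ]
```

Filtering in plain JavaScript as above is easier to read and to extend. The regex version is mainly useful when only a pattern can be supplied.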

Answer 2

Instead of a regex, you can use DOMParser and check whether each href contains your string with, for example, includes.

let parser = new DOMParser();
let html = `This is some a string <a href="http://www.google.com">google</a> and this is another that should not be found <a href="https://www.google.com/login">login</a>`;
let doc = parser.parseFromString(html, "text/html");
let anchors = doc.querySelectorAll("a");
anchors.forEach(a => {
  if (!a.href.includes("login")) {
    console.log(a.href);
  }
});

Answer 3

You can create a temporary HTML node and get all the <a> tags from it, then filter them by href. Example code:

const str = `This is some a string <a href="http://www.google.com">google</a> and this is another that should not be found <a href="https://www.google.com/login">login</a>`;
const d = document.createElement('div');
d.innerHTML = str;
// Anchors whose href does not contain "login"
const links = Array.from(d.getElementsByTagName("a")).filter(a => !/login/.test(a.href));

Answer 4

You can do it with this regex:

/<[\w:]+(?=\s)(?=(?:[^>"']|"[^"]*"|'[^']*')*?\shref\s*=\s*(?:(['"])(?:(?!\1|login)[\S\s])*\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>/

https://regex101.com/r/LEQL7h/1

More info

 < [\w:]+               # Any tag
 (?= \s )
 (?=                    # Assertion (a pseudo atomic group)
      (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
      \s href \s* = \s*      # href attribute
      (?:
           ( ['"] )               # (1), Quote
           (?:
                (?! \1 | login )       # href cannot contain login
                [\S\s] 
           )*
           \1 
      )
 )
                        # Have href that does not contain login, match the rest of tag
 \s+ 
 (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+

 >
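Note that this pattern matches the whole opening tag rather than just the URL. A minimal usage sketch follows, assuming the string from the question and adding the g flag so every qualifying tag is returned; the names `tagRegex`, `html`, and `tags` are mine.

```javascript
// The regex from this answer, with the g flag added to find every tag.
const tagRegex = /<[\w:]+(?=\s)(?=(?:[^>"']|"[^"]*"|'[^']*')*?\shref\s*=\s*(?:(['"])(?:(?!\1|login)[\S\s])*\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>/g;

const html = `This is some a string <a href="http://www.google.com">google</a> and this is another that should not be found <a href="https://www.google.com/login">login</a>`;

// match() returns null when nothing matches, so fall back to an empty array.
const tags = html.match(tagRegex) || [];

console.log(tags); // [ '<a href="http://www.google.com">' ]
```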

Regex match all hrefs in a string unless it contains a word
