I designed the regex below, but I can't get it to exclude links on a page that already have target="_blank", or anchors of the form <a name="..."> or <a href="#...">.
How can I skip links that already have target="_blank", and avoid adding target="_blank" to anchor links?
Find: <a href=(".*)|([^#][^"]*)\\s>(\w.*)(</a>)
Replace: <a href=$1 target="_blank"$2$3
Answer 0 (score: 0)
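Regular expressions are notoriously the wrong tool for this job.
HTML is structured data that a regex cannot understand, which is exactly why you are running into this problem: for anything non-trivial, the many variations that HTML syntax allows make it very hard to parse with string-manipulation techniques.
DOM methods are designed for this kind of data, so use them instead. The following loops over every <a> tag in the document, skips those that have no href attribute, whose href starts with '#', or that have a name attribute, and sets the 'target' attribute on the rest.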
Array.from(document.getElementsByTagName('a')).forEach(function(a) {
  if (
    a.getAttribute("href") &&
    a.getAttribute("href").indexOf('#') !== 0 &&
    a.getAttribute("name") === null
  ) {
    // On links that already have target="_blank" this changes nothing.
    a.setAttribute('target', '_blank');
  }
});
// Just to confirm:
console.log(document.getElementById('container').innerHTML);
<div id="container">
  <a href="http://example.com">test</a>
  <a href="#foo">test2</a>
  <a href="http://example.com" target="_blank">test3</a>
  <a name="foo">test4</a>
</div>
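
If it helps to check the filtering rule on its own, the same three conditions can be pulled out into a small predicate. This is only a sketch: the function names `shouldOpenInNewTab` and `addBlankTargets` are made up for illustration and are not part of any library, and it assumes the arguments are the raw values that `getAttribute` returns (`null` when the attribute is absent).

```javascript
// Hypothetical helper: decides whether a link should get target="_blank".
// `href` and `name` are raw attribute values, null when the attribute is absent.
function shouldOpenInNewTab(href, name) {
  if (!href) return false;                 // no href (or an empty one)
  if (href.startsWith('#')) return false;  // in-page anchor link such as #foo
  if (name !== null && name !== undefined) return false; // named anchor
  return true;
}

// Hypothetical helper: applies the rule to any iterable of elements
// (anything exposing getAttribute/setAttribute) and returns how many
// elements were actually changed.
function addBlankTargets(links) {
  let changed = 0;
  for (const a of links) {
    if (!shouldOpenInNewTab(a.getAttribute('href'), a.getAttribute('name'))) {
      continue;
    }
    if (a.getAttribute('target') !== '_blank') {
      a.setAttribute('target', '_blank');
      changed++;
    }
  }
  return changed;
}

// In a browser, run it against every <a> in the document.
if (typeof document !== 'undefined') {
  addBlankTargets(document.querySelectorAll('a'));
}
```

Keeping the rule separate from the DOM walk means it can be exercised with plain values, without a page loaded, before it is pointed at real markup.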