How could I go about replacing a string:
Hello my name is <a href='/max'>max</a>!
<script>alert("DANGEROUS SCRIPT INJECTION");</script>
with
Hello my name is <a href='/max'>max</a>!
&lt;script&gt;alert("DANGEROUS SCRIPT INJECTION");&lt;/script&gt;
I can easily have all the < and > characters replaced with &lt; and &gt; with:
string = string.replace(/</g, "&lt;").replace(/>/g, "&gt;");
but I still want to be able to have <a> links.
I have also looked into preventing script injection with:
var html = $(string.bold());
html.find('script').remove();
But I want to be able to still read the script tags rather than them being removed.
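Answer 0 (score: 0)
One way to solve this is to use a regular expression with a strict lookahead pattern that only allows anchors that closely follow a specific format.
Let's assume you only want to allow links that exactly follow these examples:
<a href="http://host.domain/path?query#anchor">text</a>
and
<a href="https://host.domain/path?query#anchor">text</a>
Build a regex that matches only the "<" characters that are not followed by this valid pattern (negative lookahead):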
<(?!a href="https?:\/\/\w[\w.-\/\?#]+">\w+<\/a>)
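One problem with this regex is that if you match it against the whole string, the < of the closing a element (</a>) will still match, so if you replace every matched <, you will break the anchor.
You can allow all closing </a> tags by appending an alternative to the negative lookahead: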
<(?!a href="https?:\/\/\w[\w.-\/\?#]+">\w+<\/a>|\/a>)
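Maybe someone else has a better solution for this sub-problem.
Here is the final string.replace:
string = string.replace(/<(?!a href="https?:\/\/\w[\w.-\/\?#]+">\w+<\/a>|\/a>)/g, '&lt;');
Note: all of these input checks must always be done on the server side. On the client side the checks can simply be circumvented, and malicious data will be sent to your server despite them.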
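To see what the lookahead approach above does in practice, here is a minimal sketch that wraps the answer's final pattern in a function and runs it on a sample string. The function name `escapeExceptLinks` and the sample input are assumptions made for illustration; the regex itself is copied from the answer.

```javascript
// Hypothetical wrapper around the answer's final regex.
// It escapes every "<" unless it starts a link of the exact allowed form
// (double-quoted http/https href, word-only link text) or a closing </a>.
function escapeExceptLinks(input) {
  var pattern = /<(?!a href="https?:\/\/\w[\w.-\/\?#]+">\w+<\/a>|\/a>)/g;
  return input.replace(pattern, "&lt;");
}

// Sample input (an assumption, not taken from the question).
var sample = 'Hi <a href="http://host.domain/path">max</a>! <script>alert(1);</script>';
console.log(escapeExceptLinks(sample));
// prints: Hi <a href="http://host.domain/path">max</a>! &lt;script>alert(1);&lt;/script>
```

Only "<" is replaced here, so ">" stays as-is. Note that the link from the question, <a href='/max'>max</a>, uses a single-quoted relative href, so it does not fit the strict pattern and its opening "<" would be escaped as well.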
Answer 1 (score: 0)
This snippet should do the trick. You can add other tag names to the array allowedTagNames to let them pass through as HTML tags.
// input
var html = "Hello my name is <a href='/max'>max</a>! <script>alert('DANGEROUS SCRIPT INJECTION');</script>";
var allowedTagNames = ["a"];
// output
var processedHTML = "";
var processingStart = 0;
// this loop finds the next tag and processes it
while (true) {
  var tagStart = html.indexOf("<", processingStart);
  if (tagStart === -1) { break; }
  var tagEnd = html.indexOf(">", tagStart);
  if (tagEnd === -1) { break; }
  var tagNameStart = tagStart + 1;
  if (html[tagNameStart] === "/") {
    // for closing tags
    ++tagNameStart;
  }
  // we expect there to be either a whitespace or a > after the tagName
  var tagNameEnd = html.indexOf(" ", tagNameStart);
  if (tagNameEnd === -1 || tagNameEnd > tagEnd) {
    tagNameEnd = tagEnd;
  }
  var tagName = html.slice(tagNameStart, tagNameEnd);
  // copy in text which is between this tag and the end of last tag
  processedHTML += html.slice(processingStart, tagStart);
  if (allowedTagNames.indexOf(tagName) === -1) {
    // tag is not allowed: escape its angle brackets so it is shown as text
    processedHTML += "&lt;" + html.slice(tagStart + 1, tagEnd) + "&gt;";
  } else {
    // tag is allowed: copy it through unchanged
    processedHTML += html.slice(tagStart, tagEnd + 1);
  }
  processingStart = tagEnd + 1;
}
// copy the rest of input which wasn't processed
processedHTML += html.slice(processingStart);
Note: this won't work if there is a < or > inside a tag attribute.
For example: <a href=">">
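One possible way around the limitation in the note above is to look for the closing > while skipping anything inside a quoted attribute value. The sketch below is an assumption about how that could be done, not part of the original answer. The helper name `findTagEnd` is hypothetical, and it assumes attribute values are properly quoted.

```javascript
// Hypothetical helper: return the index of the ">" that closes the tag
// starting at tagStart, ignoring any ">" inside a single- or double-quoted
// attribute value. Returns -1 if the tag is never closed.
function findTagEnd(html, tagStart) {
  var quote = null; // the quote character we are currently inside, if any
  for (var i = tagStart + 1; i < html.length; i++) {
    var ch = html[i];
    if (quote !== null) {
      // inside a quoted value: only the matching quote ends it
      if (ch === quote) { quote = null; }
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === ">") {
      return i;
    }
  }
  return -1;
}

var sampleTag = '<a href=">">link</a>';
console.log(sampleTag.indexOf(">", 0));  // prints 9, the ">" inside the attribute
console.log(findTagEnd(sampleTag, 0));   // prints 11, the ">" that closes the tag
```

In the loop above, this helper could stand in for the html.indexOf(">", tagStart) call. A < inside an attribute value would still need separate handling.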
Answer 2 (score: 0)
You can use capture groups and lookarounds in a regex to achieve this:
string = string.replace(/<((?!a )[^>]*)>/g, "&lt;$1&gt;").replace(/&lt;\/a&gt;/g, "</a>");
The first part replaces all HTML tags (except the opening anchor tag) from <tag> to &lt;tag&gt;, and the second part changes &lt;/a&gt; back to </a>.
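As a quick check of the two-step replacement described above, here is a small sketch that applies it to a shortened version of the question's input. The function name `escapeExceptAnchors` and the shortened sample are assumptions for illustration.

```javascript
// Hypothetical wrapper around the two chained replacements.
function escapeExceptAnchors(input) {
  return input
    // step 1: escape every tag except opening anchors of the form "<a ...>"
    .replace(/<((?!a )[^>]*)>/g, "&lt;$1&gt;")
    // step 2: step 1 also escaped "</a>", so turn it back into a real closing tag
    .replace(/&lt;\/a&gt;/g, "</a>");
}

var sampleHtml = "<a href='/max'>max</a>! <script>alert(1);</script>";
console.log(escapeExceptAnchors(sampleHtml));
// prints: <a href='/max'>max</a>! &lt;script&gt;alert(1);&lt;/script&gt;
```

Because the lookahead is (?!a ) with a trailing space, a bare <a> without attributes is not treated as an anchor and gets escaped to &lt;a&gt;.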
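Answer 3 (score: 0)
If you only want to replace <script... tags, the code below will work (you can run it in your browser console), and all other tags are left unchanged. In my example I added an extra line just to demonstrate how it handles multiple <script... tags.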
let s = "Hello my name is <a href='/max'>max</a>!<script>alert(\"DANGEROUS SCRIPT INJECTION\");</script>";
s += "Hello my name is <a href='/bob'>bob</a>!<script>alert(\"DANGEROUS SCRIPT INJECTION\");</script>";
(s.match(/<script.*?<\/script>/g) || []).forEach(scr => s = s.replace(scr, scr.replace(/</g, "&lt;").replace(/>/g, "&gt;")));
console.log(s);
// OUTPUT: Hello my name is <a href='/max'>max</a>!&lt;script&gt;alert("DANGEROUS SCRIPT INJECTION");&lt;/script&gt;Hello my name is <a href='/bob'>bob</a>!&lt;script&gt;alert("DANGEROUS SCRIPT INJECTION");&lt;/script&gt;
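The same idea can also be written with a single replace call that takes a callback, which avoids the separate match step. This is a sketch under a few assumptions, not the answer's own code: the function name `escapeScriptTags` is hypothetical, and the pattern uses [\s\S] and the i flag (both my additions) so that multi-line and upper-case script tags are matched too.

```javascript
// Hypothetical variant: escape the angle brackets of every <script>...</script>
// block in one pass, leaving all other tags unchanged.
function escapeScriptTags(input) {
  return input.replace(/<script[\s\S]*?<\/script>/gi, function (block) {
    // block is one whole script element; escape only its "<" and ">"
    return block.replace(/</g, "&lt;").replace(/>/g, "&gt;");
  });
}

var sampleText = "<a href='/bob'>bob</a>!<script>alert(1);</script>";
console.log(escapeScriptTags(sampleText));
// prints: <a href='/bob'>bob</a>!&lt;script&gt;alert(1);&lt;/script&gt;
```

When the input contains no script tags, the string is returned unchanged.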