Question

I have the following string:

<!--
document.write("<a rel='nofollow' href='mailto:&#116;&#114;&#97;&#110;&#113;&#117;&#97;&#110;&#103;&#100;&#105;&#101;&#117;&#50;&#55;&#48;&#52;&#64;&#103;&#109;&#97;&#105;&#108;&#46;&#99;&#111;&#109;'>&#116;&#114;&#97;&#110;&#113;&#117;&#97;&#110;&#103;&#100;&#105;&#101;&#117;&#50;&#55;&#48;&#52;&#64;&#103;&#109;&#97;&#105;&#108;&#46;&#99;&#111;&#109;</a>");
//-->

How can I extract the HTML entities inside the tag, i.e. the following?

&#116;&#114;&#97;&#110;&#113;&#117;&#97;&#110;&#103;&#100;&#105;&#101;&#117;&#50;&#55;&#48;&#52;&#64;&#103;&#109;&#97;&#105;&#108;&#46;&#99;&#111;&#109;

Answer 1

If you want to get each HTML entity individually:

const rgx = /&#\d+;/g; // one decimal numeric reference, e.g. &#116; (& and # need no escaping)
const string = "<a rel='nofollow' href='mailto:&#116;&#114;&#97;&#110;&#113;&#117;&#97;&#110;&#103;&#100;&#105;&#101;&#117;&#50;&#55;&#48;&#52;&#64;&#103;&#109;&#97;&#105;&#108;&#46;&#99;&#111;&#109;'>&#116;&#114;&#97;&#110;&#113;&#117;&#97;&#110;&#103;&#100;&#105;&#101;&#117;&#50;&#55;&#48;&#52;&#64;&#103;&#109;&#97;&#105;&#108;&#46;&#99;&#111;&#109;</a>";
    
let match; // declare it explicitly: an undeclared variable throws in strict mode and ES modules
while ((match = rgx.exec(string)) !== null) console.log(match[0]);

If you want them all together as a single string:

const rgx = /(?:&#\d+;)+/; // a run of consecutive decimal numeric references
const string = "<a rel='nofollow' href='mailto:&#116;&#114;&#97;&#110;&#113;&#117;&#97;&#110;&#103;&#100;&#105;&#101;&#117;&#50;&#55;&#48;&#52;&#64;&#103;&#109;&#97;&#105;&#108;&#46;&#99;&#111;&#109;'>&#116;&#114;&#97;&#110;&#113;&#117;&#97;&#110;&#103;&#100;&#105;&#101;&#117;&#50;&#55;&#48;&#52;&#64;&#103;&#109;&#97;&#105;&#108;&#46;&#99;&#111;&#109;</a>";
    
console.log(rgx.exec(string)[0]); // first run only: here, the copy inside href, which equals the link text

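The advantage of this regex is that it works on any string containing decimal numeric HTML entities (`&#NNN;`), regardless of the surrounding markup.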
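
As a rough sketch of how the second pattern could be reused when a string contains several separate runs of entities, the helper below collects every run with `String.prototype.matchAll`. The name `extractEntityRuns` and the short sample string are illustrative assumptions, not part of the answer, and only decimal references of the form `&#NNN;` are considered.

```javascript
// Hypothetical helper: collect every consecutive run of decimal
// numeric character references (e.g. "&#104;&#105;") in a string.
function extractEntityRuns(text) {
  const runPattern = /(?:&#\d+;)+/g; // matchAll requires the g flag
  return Array.from(text.matchAll(runPattern), (m) => m[0]);
}

// Made-up sample with the same shape as the string in the question.
const sample = "<a href='mailto:&#104;&#105;'>&#104;&#105;</a>";
console.log(extractEntityRuns(sample)); // [ '&#104;&#105;', '&#104;&#105;' ]
```

The first element is the run inside `href` and the second is the link text, so the caller can pick whichever copy it needs.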

Answer 2

const html = "<a rel='nofollow' href='mailto:&#116;&#114;&#97;&#110;&#113;&#117;&#97;&#110;&#103;&#100;&#105;&#101;&#117;&#50;&#55;&#48;&#52;&#64;&#103;&#109;&#97;&#105;&#108;&#46;&#99;&#111;&#109;'>&#116;&#114;&#97;&#110;&#113;&#117;&#97;&#110;&#103;&#100;&#105;&#101;&#117;&#50;&#55;&#48;&#52;&#64;&#103;&#109;&#97;&#105;&#108;&#46;&#99;&#111;&#109;</a>";

const match = /<a[^>]+>([^<]+)<\/a>/.exec(html);
console.log('match: ', match[1]);
console.log('is-correct: ', match[1] === '&#116;&#114;&#97;&#110;&#113;&#117;&#97;&#110;&#103;&#100;&#105;&#101;&#117;&#50;&#55;&#48;&#52;&#64;&#103;&#109;&#97;&#105;&#108;&#46;&#99;&#111;&#109;');

This works, but I don't understand why there are so many entity encodings in your markup.
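
One plausible reason, offered here as an assumption rather than something the question confirms, is that the address was entity-encoded to make it harder for simple scrapers to harvest. Once the text has been extracted, decimal references can be turned back into characters. The sketch below uses a hypothetical `decodeDecimalEntities` helper and a short made-up input.

```javascript
// Hypothetical helper: replace each decimal reference "&#NNN;" with the
// character for that code point. Named (&amp;) and hexadecimal (&#x74;)
// forms are left untouched. Assumes every number is a valid code point;
// String.fromCodePoint throws a RangeError otherwise.
function decodeDecimalEntities(text) {
  return text.replace(/&#(\d+);/g, (_, digits) =>
    String.fromCodePoint(Number(digits))
  );
}

console.log(decodeDecimalEntities("&#72;&#105;&#33;")); // Hi!
```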

Answer 3

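Simply using split() also does the job. I think this is a better solution because it does not care what is inside the href: the value can be any string, and it will still be split out.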

const a = `document.write("<a rel='nofollow' href='mailto:&#116;&#114;&#97;&#110;&#113;&#117;&#97;&#110;&#103;&#100;&#105;&#101;&#117;&#50;&#55;&#48;&#52;&#64;&#103;&#109;&#97;&#105;&#108;&#46;&#99;&#111;&#109;'>&#116;&#114;&#97;&#110;&#113;&#117;&#97;&#110;&#103;&#100;&#105;&#101;&#117;&#50;&#55;&#48;&#52;&#64;&#103;&#109;&#97;&#105;&#108;&#46;&#99;&#111;&#109;</a>");`

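// Split on the end of the opening tag ('>) rather than on "mailto:", so the
// href value is dropped and only the link text remains.
const entities = a.split("'>")[1].split("</a>")[0];

console.log(entities);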
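
A small variation on the split approach is to locate the delimiters with `indexOf`, so that a missing delimiter can be detected instead of causing a `TypeError`. This is only a sketch: the function name `textBetween`, the delimiters, and the shortened sample string are illustrative assumptions.

```javascript
// Hypothetical helper: return the text between the first `open` marker and
// the next `close` marker, or null when either marker is missing.
function textBetween(source, open, close) {
  const start = source.indexOf(open);
  if (start === -1) return null;
  const from = start + open.length;
  const end = source.indexOf(close, from);
  return end === -1 ? null : source.slice(from, end);
}

// Made-up sample with the same shape as the string in the question.
const snippet = `document.write("<a href='mailto:&#104;&#105;'>&#104;&#105;</a>");`;
console.log(textBetween(snippet, "'>", "</a>")); // &#104;&#105;
console.log(textBetween(snippet, "<b>", "</b>")); // null
```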

Answer 4

Try this regular expression:

// str is the input string; the g flag returns every entity instead of only the first
const matches = str.match(/&#\d+;/g);
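
Two details are worth keeping in mind with this approach: `String.prototype.match` returns `null` when nothing matches, and numeric references can also be written in hexadecimal (`&#x74;`). The sketch below, using the hypothetical name `findNumericEntities`, handles both cases under the assumption that named entities such as `&amp;` are not needed.

```javascript
// Hypothetical helper: list every decimal (&#116;) or hexadecimal (&#x74;)
// numeric character reference in a string. Always returns an array.
function findNumericEntities(str) {
  const pattern = /&#(?:\d+|x[0-9a-f]+);/gi;
  return str.match(pattern) ?? []; // match() yields null when there are no matches
}

console.log(findNumericEntities("&#116; &#x74; &amp;")); // [ '&#116;', '&#x74;' ]
console.log(findNumericEntities("plain text")); // []
```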

How to find character entities in a string using Node.js
