Question

I have a string that contains & characters, like the one below.

"This R&M & Exapmle &nbsp;. It is very big & Complicated &146; example."

I want to replace & with &amp;, but when I use $str =~ s/&/&amp;/ig; I get the following output:

"This R&amp;M &amp; Company &amp;nbsp;. It is very big &amp; CMM Level3 &amp;146; Organization."

I was expecting this:

"This R&amp;M &amp; Company &nbsp;. It is very big &amp; CMM Level3 &146; Organization."

Please help me; I don't know how to solve this.

Answer 1

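You can use a negative look-ahead assertion, so that & is only replaced when it is not already followed by word characters and a semicolon (as in &nbsp; or &146;):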

$str =~ s/&(?!\w+;)/&amp;/g;
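The same look-ahead works unchanged in a JavaScript regular expression. A minimal sketch, run against the question's string (the function name escapeBareAmpersands is made up for illustration). Note that \w does not match #, so a well-formed numeric reference such as &#146; would still be escaped by this pattern.

```javascript
// Replace every "&" that is NOT followed by one or more word characters and ";".
// Entity-like tokens such as "&nbsp;" or "&146;" are therefore left alone.
function escapeBareAmpersands(s) {
  return s.replace(/&(?!\w+;)/g, '&amp;');
}

const sample = 'This R&M & Exapmle &nbsp;. It is very big & Complicated &146; example.';
console.log(escapeBareAmpersands(sample));
// This R&amp;M &amp; Exapmle &nbsp;. It is very big &amp; Complicated &146; example.
```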

Answer 2

use HTML::Entities;
my $str = "This R&M & Exapmle &nbsp;. It is very big & Complicated &146; example.";
# Decode the known entities first, then re-encode everything in one pass
print encode_entities(decode_entities($str)), "\n";
# prints: This R&amp;M &amp; Exapmle &nbsp;. It is very big &amp; Complicated &amp;146; example.

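&146; is written incorrectly: a numeric character reference needs a #, as in &#146;. If you have more errors of this kind, filter or replace them before doing the decode/encode round trip.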
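A minimal sketch of such a pre-filter, written in JavaScript rather than Perl. It assumes that every &digits; sequence was meant to be a decimal reference; the function name fixBareNumericRefs is hypothetical. In Perl the equivalent substitution would be s/&(\d+);/&#$1;/g.

```javascript
// Rewrite malformed numeric references such as "&146;" into the
// well-formed decimal form "&#146;". Already-valid "&#146;" is untouched,
// because "#" is not a digit.
function fixBareNumericRefs(s) {
  return s.replace(/&(\d+);/g, '&#$1;');
}

console.log(fixBareNumericRefs('very big & Complicated &146; example.'));
// very big & Complicated &#146; example.
```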

Answer 3

A while ago I found a much better answer, took the code that was posted and turned it into my own, but I can't seem to find that post anywhere now.

Either way, this is the solution I ended up with.

Right now the encoder only supports &nbsp;, &amp;, &quot;, &lt; and &gt;, but it is very easy to add support for more HTML entities.

First, here is the encoder:

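var Encoder = {
    // Despite its name, "encode" turns the entity references
    // &nbsp; &amp; &quot; &lt; &gt; into the characters they stand for.
    encode: (function() {
        var translate_re = /&(nbsp|amp|quot|lt|gt);/g,
            translate = {
                'nbsp': String.fromCharCode(160),
                'amp' : '&',
                'quot': '"',
                'lt'  : '<',
                'gt'  : '>'
            },
            translator = function($0, $1) {
                // $1 is the entity name captured by translate_re
                return translate[$1];
            };

        return function(s) {
            if (typeof s === 'string')
                return s.replace(translate_re, translator);
            else
                return s;
        };
    })(),
    // Despite its name, "decode" turns the characters &, ", <, > and
    // U+00A0 (non-breaking space) into entity references.
    decode: (function() {
        var reg_str = '(<|>|"|&|' + String.fromCharCode(160) + ')';
        var translate_re = new RegExp(reg_str, 'g');

        // Each entity reference must end with ";"
        var translate = {
            '&': '&amp;',
            '"': '&quot;',
            '<': '&lt;',
            '>': '&gt;'
        };

        translate[String.fromCharCode(160)] = '&nbsp;';

        var translator = function($0, $1) {
            // $1 is the single character captured by translate_re
            return translate[$1];
        };

        return function(s) {
            if (typeof s === 'string')
                return s.replace(translate_re, translator);
            else
                return s;
        };
    })()
};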


// A test string containing HTML entities
var str = 'Non-Breaking Space: "&nbsp;", Ampersand: "&amp;", Quote: "&quot;", Less-Than: "&lt;", Greater-Than: "&gt;"';

// Get the two output divs
var output_not_endcoded = document.getElementById("output_not_endcoded");
var output_endcoded = document.getElementById("output_endcoded");

// If the first div exists, insert the string as-is so the browser renders the entities
if(output_not_endcoded)
  output_not_endcoded.innerHTML = str;

// If the second div exists, pass the string through Encoder.decode first;
// it escapes "&", so the entity references show up as literal text
if(output_endcoded)
  output_endcoded.innerHTML = Encoder.decode(str);

* {
  font: 13.2px "Courier New", Arial, sans-serif; 
}

body {
  font-size: 100%;
}

.row {
  width:100%;
  height:auto;
  padding: 8px 6px;
}

With HTML Entities:
<div id="output_not_endcoded" class="row" ></div>
<br/>
With HTML Entities Decoded:
<div id="output_endcoded" class="row" ></div>

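Adding support for other HTML entities is very easy.

Look at the encoder and you will find the translation section. It has two parts: the regular expression and the translation table.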

Regular expression:

var translate_re = /&(nbsp|amp|quot|lt|gt);/g

Translation table:

translate = {
    'nbsp': String.fromCharCode(160), 
    'amp' : '&', 
    'quot': '"',
    'lt'  : '<', 
    'gt'  : '>'
}

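For example, to add support for the copyright sign © (&copy;), change both parts as follows.

Regular expression: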

var translate_re = /&(nbsp|amp|quot|lt|gt|copy);/g

Translation table:

translate = {
    'nbsp': String.fromCharCode(160), 
    'amp' : '&', 
    'quot': '"',
    'lt'  : '<', 
    'gt'  : '>',
    'copy': '©',
}

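If you need to convert in both directions, remember to add the new entity to both the encode and the decode functions.

That's it! I hope this helps!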
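One way to keep the two directions from drifting apart is to derive both regular expressions from a single table, so a new entity is added in one place only. This is a hypothetical refactor, not the code from the answer above; the names ENTITIES, entitiesToChars and charsToEntities are made up for illustration.

```javascript
// Single source of truth: entity name -> character.
const ENTITIES = {
  nbsp: '\u00A0',
  amp: '&',
  quot: '"',
  lt: '<',
  gt: '>',
  copy: '\u00A9' // the "copy" entry from the example above
};

// Reverse lookup: character -> entity name.
const NAMES = Object.fromEntries(
  Object.entries(ENTITIES).map(([name, ch]) => [ch, name])
);

// Matches &name; for every name in the table.
const ENTITY_RE = new RegExp('&(' + Object.keys(ENTITIES).join('|') + ');', 'g');

// Character class of every character in the table; ], \, ^ and - are
// escaped in case such characters are ever added.
const CHAR_RE = new RegExp(
  '[' + Object.values(ENTITIES).map((c) => c.replace(/[\\\]^-]/g, '\\$&')).join('') + ']',
  'g'
);

// Entity references -> characters (the direction Encoder.encode handles).
function entitiesToChars(s) {
  return s.replace(ENTITY_RE, (match, name) => ENTITIES[name]);
}

// Characters -> entity references (the direction Encoder.decode handles).
function charsToEntities(s) {
  return s.replace(CHAR_RE, (ch) => '&' + NAMES[ch] + ';');
}

console.log(charsToEntities('a < b & \u00A9 c')); // a &lt; b &amp; &copy; c
console.log(entitiesToChars('&copy; &lt;tag&gt;')); // © <tag>
```

Because each replace is a single pass, a string like &amp;lt; decodes to &lt; rather than <, so running one function after the other gives back the original text.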

Answer 4

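Update the regular expression to use a negative lookahead, so that ampersands are replaced but existing HTML entities are left unchanged:

&(?!(#[0-9]{2,4}|[A-Za-z]{2,6});)

Replace each match with &amp;. Use [A-Za-z] rather than [A-z]: the range A-z also includes the ASCII characters [ \ ] ^ _ and ` that sit between Z and a.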
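A minimal JavaScript sketch of this pattern applied to the question's string (escapeAmpersands is a made-up name). Unlike Answer 1, it keeps a proper decimal reference like &#146; but escapes the malformed &146;. It also does not cover hexadecimal references (&#x2019;) or entity names containing digits, such as &frac12;.

```javascript
// Match "&" unless it starts a 2-4 digit decimal reference (&#NN; .. &#NNNN;)
// or a 2-6 letter named entity followed by ";".
const ENTITY_SAFE_AMP = /&(?!(#[0-9]{2,4}|[A-Za-z]{2,6});)/g;

function escapeAmpersands(s) {
  return s.replace(ENTITY_SAFE_AMP, '&amp;');
}

const text = 'This R&M & Exapmle &nbsp;. It is very big & Complicated &146; example.';
console.log(escapeAmpersands(text));
// This R&amp;M &amp; Exapmle &nbsp;. It is very big &amp; Complicated &amp;146; example.
console.log(escapeAmpersands('&#146; &copy; & more')); // &#146; &copy; &amp; more
```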
