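I start with a string that contains an encoded Unicode character, "&#xfc;". I pass the string to an object that performs some logic and returns another string. In the returned string, the original encoded character has been converted to its Unicode equivalent, "ü".
I need to get the original encoded character back, but so far I have not been able to.
I tried the HttpUtility.HtmlEncode() method, but it returns "&#252;", which is not the same.
Can anyone help?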
Answer 0 (score: 4)
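They are effectively the same, at least for display purposes. HttpUtility.HtmlEncode uses decimal encoding, in the format &#DECIMAL;, whereas your original string uses hexadecimal encoding, in the format &#xHEX;. Since fc in hexadecimal is 252 in decimal, the two references identify the same character.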
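The equivalence can be checked directly. The following is a minimal sketch, assuming System.Net.WebUtility is available (.NET Framework 4.0 or later); the class and method names are hypothetical and not part of the original answer.

```csharp
using System;
using System.Net;

static class EquivalenceDemo
{
    // Hypothetical helper: true when two HTML fragments decode to the same text.
    public static bool DecodeToSameText(string a, string b)
    {
        return WebUtility.HtmlDecode(a) == WebUtility.HtmlDecode(b);
    }

    static void Main()
    {
        // The hexadecimal and decimal references both decode to U+00FC.
        Console.WriteLine(DecodeToSameText("&#xfc;", "&#252;")); // True
        // fc in base 16 is 252 in base 10.
        Console.WriteLine(Convert.ToInt32("fc", 16));            // 252
    }
}
```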
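If you really do need the hexadecimal version, consider parsing out the decimal number and converting it to hex before putting it back into the &#xHEX; format. Something like: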
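    string unicode = "ü";
    string decimalEncoded = HttpUtility.HtmlEncode(unicode); // "&#252;"
    // Strip the leading "&#" and the trailing ";" to get the decimal digits.
    // ("decimal" is a C# keyword, so the variable needs a different name.)
    int codePoint = int.Parse(decimalEncoded.Substring(2, decimalEncoded.Length - 3));
    // Lowercase "x" matches the original "&#xfc;"; use "X" for uppercase digits.
    string hexEncoded = string.Format("&#x{0:x};", codePoint);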
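The snippet above handles a string that consists of exactly one reference. For text that mixes ordinary characters with several decimal references, the same parse-and-reformat step can be applied to each match of a regular expression. This is only a sketch under that assumption; NumericReferences and DecimalToHex are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class NumericReferences
{
    // Hypothetical helper: rewrites every decimal reference (&#252;) in html
    // as a lowercase hexadecimal reference (&#xfc;). Named entities such as
    // &amp; and existing hex references are left untouched.
    public static string DecimalToHex(string html)
    {
        return Regex.Replace(html, "&#([0-9]+);", m =>
        {
            int codePoint;
            bool ok = int.TryParse(m.Groups[1].Value, NumberStyles.None,
                                   CultureInfo.InvariantCulture, out codePoint);
            // If the digits overflow an int, keep the original text unchanged.
            return ok
                ? "&#x" + codePoint.ToString("x", CultureInfo.InvariantCulture) + ";"
                : m.Value;
        });
    }

    static void Main()
    {
        Console.WriteLine(DecimalToHex("M&#252;nchen &amp; K&#246;ln"));
        // prints: M&#xfc;nchen &amp; K&#xf6;ln
    }
}
```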
Answer 1 (score: 1)
Or you can try this code:
    using System;
    using System.Globalization;
    using System.Text;
    using System.Web;

    namespace SimpleCGIEXE
    {
        class Program
        {
            // Converts every non-ASCII character in src to a hexadecimal HTML
            // numeric character reference (&#xHHHH;) and leaves ASCII as-is.
            // It works on UTF-16 code units, so a character outside the Basic
            // Multilingual Plane comes out as two separate surrogate references.
            static string Uni2Html(string src)
            {
                // UrlEncodeUnicode emits %uHHHH for non-ASCII characters, %HH for
                // reserved ASCII characters, and '+' for spaces. The method is
                // marked obsolete in later .NET Framework versions.
                string temp1 = HttpUtility.UrlEncodeUnicode(src);
                string temp2 = temp1.Replace('+', ' ');
                var res = new StringBuilder();
                int pos1 = 0;
                while (true)
                {
                    int pos2 = temp2.IndexOf("%", pos1);
                    if (pos2 < 0) break;
                    // Copy the literal text that precedes the escape sequence.
                    res.Append(temp2, pos1, pos2 - pos1);
                    if (temp2[pos2 + 1] == 'u')
                    {
                        // %uHHHH becomes &#xHHHH;
                        res.Append("&#x").Append(temp2, pos2 + 2, 4).Append(';');
                        pos1 = pos2 + 6;
                    }
                    else
                    {
                        // %HH is an escaped ASCII character: decode it back.
                        string stASCII = temp2.Substring(pos2 + 1, 2);
                        byte[] pdASCII = { byte.Parse(stASCII, NumberStyles.AllowHexSpecifier) };
                        res.Append(Encoding.ASCII.GetString(pdASCII));
                        pos1 = pos2 + 3;
                    }
                }
                // Copy whatever follows the last escape sequence.
                res.Append(temp2, pos1, temp2.Length - pos1);
                return res.ToString();
            }

            static void Main(string[] args)
            {
                // CGI response header followed by a blank line.
                Console.WriteLine("Content-type: text/html;charset=utf-8\r\n");
                string st = "Vietnamese string: Thử một xâu unicode @@ # ~ .^ % !";
                Console.WriteLine(Uni2Html(st) + "<br>");
                st = "A chinese string: 我爱你 (I love you)";
                Console.WriteLine(Uni2Html(st) + "<br>");
            }
        }
    }
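Answer 2 (score: 0)
I have had to solve this problem myself.
It is a little more complicated than looking at individual characters. You need to roll your own HtmlEncode() method. Strings in the .NET world are UTF-16 encoded, so each char is a 16-bit code unit. Unicode code points (which are what HTML numeric character references identify) range from 0 to 0x10FFFF, so they do not always fit in a single char. This mostly becomes a problem when you have to handle characters outside Unicode's "Basic Multilingual Plane".
The following code should do what you ask: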
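To make the UTF-16 point concrete, here is a small sketch that walks a string by code point rather than by char, using the framework's char.IsSurrogatePair and char.ConvertToUtf32. The CodePoints class and its Enumerate method are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;

static class CodePoints
{
    // Hypothetical helper: returns the Unicode code points of s, combining each
    // surrogate pair into one value. Unpaired surrogates are returned as-is.
    public static List<int> Enumerate(string s)
    {
        var result = new List<int>();
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsSurrogatePair(s, i))
            {
                result.Add(char.ConvertToUtf32(s, i));
                i++; // skip the low surrogate that was just consumed
            }
            else
            {
                result.Add(s[i]);
            }
        }
        return result;
    }

    static void Main()
    {
        // U+1D11E (musical G clef) lies outside the Basic Multilingual Plane.
        string s = "a\U0001D11Eb";
        Console.WriteLine(s.Length);             // 4 chars for 3 code points
        foreach (int cp in Enumerate(s))
            Console.WriteLine("U+{0:X4}", cp);   // U+0061, U+1D11E, U+0062
    }
}
```

Looking only at individual chars would see 0xD834 and 0xDD1E here, neither of which is a valid character reference on its own.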
    using System;
    using System.Text;

    namespace TestDrive
    {
        class Program
        {
            static void Main()
            {
                // Note: \u consumes exactly four hex digits, so this literal is
                // U+ABC1 followed by the characters "23".
                string src = "foo \uABC123 bar";
                string converted = HtmlEncode(src);
                Console.WriteLine(converted);
            }

            static string HtmlEncode(string s)
            {
                //
                // In the .NET world, strings are UTF-16 encoded. That means that Unicode code points greater
                // than 0xFFFF are stored in the string as two-char surrogate pairs. So to properly turn them
                // into HTML numeric character references (decimal or hex), we first need the UTF-32 encoding.
                //
                uint[] utf32Chars = StringToArrayOfUtf32Chars(s);
                StringBuilder sb = new StringBuilder(2000); // set a reasonable initial size for the buffer

                // iterate over the UTF-32 encoded characters
                foreach (uint codePoint in utf32Chars)
                {
                    if (codePoint > 0x0000007F)
                    {
                        // if the code point is greater than 0x7F, it gets turned into an HTML numeric character reference
                        sb.AppendFormat("&#x{0:X};", codePoint); // hex escape sequence
                        //sb.AppendFormat("&#{0};", codePoint);  // decimal escape sequence
                    }
                    else
                    {
                        // if less than or equal to 0x7F, it goes into the string as-is,
                        // except for the 5 SGML/XML/HTML reserved characters. You might
                        // want to also escape all the ASCII control characters (those chars
                        // in the range 0x00 - 0x1F).

                        // convert the uint to a UTF-16 character
                        char ch = Convert.ToChar(codePoint);

                        switch (ch)
                        {
                            case '"':  sb.Append("&quot;"); break;
                            case '\'': sb.Append("&apos;"); break; // use "&#39;" if you must target HTML 4
                            case '&':  sb.Append("&amp;");  break;
                            case '<':  sb.Append("&lt;");   break;
                            case '>':  sb.Append("&gt;");   break;
                            default:   sb.Append(ch);       break;
                        }
                    }
                }

                // return the escaped, UTF-16 string back to the caller.
                return sb.ToString();
            }

            /// <summary>
            /// Convert a UTF-16 encoded .NET string into an array of UTF-32 encoded Unicode code points
            /// </summary>
            /// <param name="s">The string to convert.</param>
            /// <returns>One uint per Unicode code point in the string.</returns>
            private static uint[] StringToArrayOfUtf32Chars(string s)
            {
                // Encoding.UTF32 is little-endian; BitConverter follows the host byte
                // order, so this assumes a little-endian machine.
                byte[] bytes = Encoding.UTF32.GetBytes(s);
                uint[] utf32Chars = new uint[bytes.Length / sizeof(uint)];
                for (int i = 0, j = 0; i < bytes.Length; i += 4, ++j)
                {
                    utf32Chars[j] = BitConverter.ToUInt32(bytes, i);
                }
                return utf32Chars;
            }
        }
    }
Hope this helps!