Is there a simple way to find out which encodings in .NET are ASCII-compatible?
(Based on a question raised in Nyerguds's comment.)
Answer 0 (score: 1)
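Let's assume the standard definition of ASCII, restricted to 128 characters (that is, byte values whose most significant bit is 0). Unicode is designed so that its first 128 code points correspond to their ASCII equivalents. Since the numeric value of a char in .NET is its UTF-16 code unit, which equals the Unicode code point for every character outside the surrogate range, we can define a utility method as follows: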
    using System;
    using System.Linq;
    using System.Text;

    // All 128 ASCII byte values, and the same values as a string of chars.
    private static readonly byte[] asciiValues =
        Enumerable.Range(0, 128).Select(b => (byte)b).ToArray();
    private static readonly string asciiChars =
        new string(asciiValues.Select(b => (char)b).ToArray());

    public static bool IsAsciiCompatible(Encoding encoding)
    {
        try
        {
            // Both directions must be the identity on the ASCII range.
            return encoding.GetString(asciiValues).Equals(asciiChars, StringComparison.Ordinal)
                && encoding.GetBytes(asciiChars).SequenceEqual(asciiValues);
        }
        catch (ArgumentException)
        {
            // Encoding.GetString may throw DecoderFallbackException if a fallback occurred
            // and DecoderFallback is set to DecoderExceptionFallback.
            // Encoding.GetBytes may throw EncoderFallbackException if a fallback occurred
            // and EncoderFallback is set to EncoderExceptionFallback.
            // Both of these derive from ArgumentException.
            return false;
        }
    }
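To see why some encodings fail this comparison, here is a minimal sketch (not part of the original answer) that prints the bytes a few built-in encodings produce for the same ASCII character. The names `AsciiPremiseDemo`, `Describe`, and `Run` are hypothetical and chosen only for illustration. UTF-8 yields the single ASCII byte, whereas UTF-16 and UTF-32 pad it with zero bytes, so they cannot match the ASCII byte sequence.

```csharp
using System;
using System.Text;

public static class AsciiPremiseDemo
{
    // Hypothetical helper: the bytes an encoding produces for a string, as hex.
    public static string Describe(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Run()
    {
        // The char 'A' has the numeric value 65 (0x41), the same as in ASCII.
        Console.WriteLine((int)'A');                         // 65
        Console.WriteLine(Describe(Encoding.ASCII, "A"));    // 41
        Console.WriteLine(Describe(Encoding.UTF8, "A"));     // 41
        Console.WriteLine(Describe(Encoding.Unicode, "A"));  // 41-00 (UTF-16 LE)
        Console.WriteLine(Describe(Encoding.UTF32, "A"));    // 41-00-00-00
    }
}
```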
We can then enumerate all .NET encodings:
    var encodings = Encoding.GetEncodings().Select(e => e.GetEncoding()).ToList();
    var asciiCompatible = encodings.Where(e => IsAsciiCompatible(e)).ToList();
    var nonAsciiCompatible = encodings.Except(asciiCompatible).ToList();

    Console.WriteLine("ASCII compatible: ");
    foreach (var encodingName in asciiCompatible.Select(e => e.EncodingName).OrderBy(n => n))
        Console.WriteLine("* " + encodingName);

    Console.WriteLine();
    Console.WriteLine("Non-ASCII compatible: ");
    foreach (var encodingName in nonAsciiCompatible.Select(e => e.EncodingName).OrderBy(n => n))
        Console.WriteLine("* " + encodingName);
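Note that this method is not completely safe. If a multi-byte encoding performs an exotic mapping of consecutive bytes or characters, for example decoding 0x61 as 'a' and 0x62 as 'b' (as in ASCII) but 0x6261 as a single unrelated character, then this test would give an incorrect result.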
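One way to narrow the gap described in the caveat above is to test every ordered pair of ASCII values instead of one fixed sequence. The following is only a sketch under that assumption, not part of the original answer; `StricterAsciiCheck` and `IsAsciiCompatiblePairwise` are hypothetical names. It still cannot detect special mappings that need three or more consecutive bytes.

```csharp
using System;
using System.Linq;
using System.Text;

public static class StricterAsciiCheck
{
    // Hypothetical stricter variant: every ordered pair of ASCII values must
    // round-trip unchanged, so a context-dependent mapping of two adjacent
    // bytes or chars is detected. Longer sequences are not covered.
    public static bool IsAsciiCompatiblePairwise(Encoding encoding)
    {
        var bytes = new byte[2];
        var chars = new char[2];
        try
        {
            for (int first = 0; first < 128; first++)
            {
                for (int second = 0; second < 128; second++)
                {
                    bytes[0] = (byte)first;
                    bytes[1] = (byte)second;
                    chars[0] = (char)first;
                    chars[1] = (char)second;

                    // Decoding two ASCII bytes must give exactly those two chars.
                    string decoded = encoding.GetString(bytes);
                    if (decoded.Length != 2 || decoded[0] != chars[0] || decoded[1] != chars[1])
                        return false;

                    // Encoding two ASCII chars must give exactly those two bytes.
                    if (!encoding.GetBytes(chars).SequenceEqual(bytes))
                        return false;
                }
            }
            return true;
        }
        catch (ArgumentException)
        {
            // Covers DecoderFallbackException and EncoderFallbackException.
            return false;
        }
    }
}
```

With this sketch, `Encoding.UTF8` and `Encoding.ASCII` pass, while `Encoding.Unicode` (UTF-16) fails because two bytes decode to a single char.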
Running this on .NET Fiddle (snippet) gives the following results:
ASCII compatible:
Non-ASCII compatible: