Question

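I'm implementing a TryParse() method for an ASCII string class. The method takes a string and converts it to a C-style string (that is, a null-terminated ASCII string).

Until now I have only had Parse(), which does the conversion to ASCII like this: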

public static bool Parse(string s, out byte[] result)
{
    result = null;
    if (s == null || s.Length < 1)
        return false;

    byte[] d = new byte[s.Length + 1]; // Add space for null-terminator
    System.Text.Encoding.ASCII.GetBytes(s).CopyTo(d, 0); 
    // GetBytes can throw exceptions 
    // (so can CopyTo() but I can replace that with a loop)
    result = d;
    return true;
}

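However, part of the idea of a TryParse is to avoid the overhead of exceptions, and GetBytes() throws exceptions, so I'm looking for a different approach that doesn't.

Maybe there is something like a TryGetBytes() method?

Or perhaps we can reason about the expected format of a standard .NET string and perform the conversion arithmetically (I'm not very familiar with UTF encodings)?

EDIT: I think TryParse() should return false for strings that contain non-ASCII characters.

EDIT: I expect that when I get around to implementing ToString() for this class, I will need to do the reverse.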

Answer 1

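According to the documentation, Encoding.GetBytes can throw two exceptions.

ArgumentNullException is easy to avoid. Null-check your input and you can be sure it will never be thrown.

EncoderFallbackException needs a bit more investigation. Reading the documentation:

A fallback strategy determines how an encoder handles invalid characters or how a decoder handles invalid bytes.

And if we look at the documentation for the ASCII encoding, we see this:

It uses replacement fallback to replace each string that it cannot encode and each byte that it cannot decode with a question mark ("?") character.

That means it does not use the exception fallback, so it will never throw EncoderFallbackException.

In summary, if you use the ASCII encoding and make sure you never pass in a null string, calling GetBytes will never throw an exception.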
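To make the replacement behaviour concrete, here is a minimal sketch. The class and method names (AsciiReplacementDemo, Encode) are made up for illustration; the only library call is Encoding.ASCII.GetBytes.

```csharp
using System.Text;

// Hypothetical helper, for illustration only.
public static class AsciiReplacementDemo
{
    // Encodes with the shared Encoding.ASCII instance. Characters outside
    // the ASCII range are replaced with '?' (0x3F) instead of throwing.
    public static byte[] Encode(string s)
    {
        return Encoding.ASCII.GetBytes(s);
    }
}
```

Calling AsciiReplacementDemo.Encode("caf\u00E9") yields the four bytes 0x63 0x61 0x66 0x3F. Note that this silently substitutes data, so it does not by itself give you the "return false on non-ASCII input" behaviour the question asks for.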

Answer 2

Two options:

You could ignore Encoding entirely and write the loop yourself:

public static bool TryParse(string s, out byte[] result)
{
    result = null;
    // TODO: It's not clear why you don't want to be able to convert an empty string
    if (s == null || s.Length < 1)
    {
        return false;
    }

    byte[] buffer = new byte[s.Length + 1]; // Add space for null-terminator
    for (int i = 0; i < s.Length; i++)
    {
        char c = s[i];
        if (c > 127)
        {
            return false;
        }
        buffer[i] = (byte) c;
    }
    result = buffer;
    return true;
}

This is simple, but it may be slightly slower than using Encoding.GetBytes.

The second option is to use a custom EncoderFallback:

public static bool TryParse(string s, out byte[] result)
{
    result = null;
    // TODO: It's not clear why you don't want to be able to convert an empty string
    if (s == null || s.Length < 1)
    {
        return false;
    }

    var fallback = new CustomFallback();
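    // Encoding.ASCII is read-only, so work on a writable clone to attach the fallback
    var encoding = (Encoding) Encoding.ASCII.Clone();
    encoding.EncoderFallback = fallback;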
    byte[] buffer = new byte[s.Length + 1]; // Add space for null-terminator
    // Use overload of Encoding.GetBytes that writes straight into the buffer
    encoding.GetBytes(s, 0, s.Length, buffer, 0);
    if (fallback.HadErrors)
    {
        return false;
    }
    result = buffer;
    return true;
}

This requires writing CustomFallback; it basically needs to keep track of whether it has ever been asked to handle invalid input.
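One way CustomFallback might look is sketched below. This is an assumption about its shape, not code from the answer: the HadErrors property matches the name used in the snippet above, and the nested buffer class is hypothetical. It overrides the abstract members of EncoderFallback and EncoderFallbackBuffer, records that it was invoked, and supplies no replacement characters.

```csharp
using System.Text;

// Hypothetical fallback: remembers that it was invoked and substitutes nothing.
public sealed class CustomFallback : EncoderFallback
{
    // True once the encoder has asked this fallback to handle any character.
    public bool HadErrors { get; private set; }

    // The fallback never produces replacement characters.
    public override int MaxCharCount
    {
        get { return 0; }
    }

    public override EncoderFallbackBuffer CreateFallbackBuffer()
    {
        return new TrackingBuffer(this);
    }

    private sealed class TrackingBuffer : EncoderFallbackBuffer
    {
        private readonly CustomFallback owner;

        public TrackingBuffer(CustomFallback owner)
        {
            this.owner = owner;
        }

        // No replacement characters are ever pending.
        public override int Remaining
        {
            get { return 0; }
        }

        public override bool Fallback(char charUnknown, int index)
        {
            owner.HadErrors = true;
            return false; // ignore the character instead of replacing it
        }

        public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index)
        {
            owner.HadErrors = true;
            return false; // ignore the surrogate pair as well
        }

        public override char GetNextChar()
        {
            return '\0'; // nothing to hand back
        }

        public override bool MovePrevious()
        {
            return false;
        }
    }
}
```

The flag is sticky, so create a fresh CustomFallback per call, and attach it to a writable encoding, for example a clone of Encoding.ASCII with its EncoderFallback property set to the new instance.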

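If you don't mind going over the data twice, you could call Encoding.GetByteCount on a UTF-8-based encoding with a replacement fallback (using a non-ASCII replacement character), and check whether it returns the same number of bytes as there are chars in the string. If it does, call Encoding.ASCII.GetBytes.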
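A rough sketch of that two-pass idea follows. The class name AsciiTwoPass is made up, and U+FFFD is just one possible choice of non-ASCII replacement character. It relies on the fact that UTF-8 uses exactly one byte per char only for ASCII, so any other input makes the byte count exceed the char count.

```csharp
using System.Text;

// Hypothetical helper, for illustration only.
public static class AsciiTwoPass
{
    // UTF-8 with a non-ASCII replacement character, so that even lone
    // surrogates encode to more than one byte.
    private static readonly Encoding CountingUtf8 = Encoding.GetEncoding(
        "utf-8",
        new EncoderReplacementFallback("\uFFFD"),
        new DecoderReplacementFallback("\uFFFD"));

    public static bool TryParse(string s, out byte[] result)
    {
        result = null;
        // Same null/empty policy as the other snippets in this thread.
        if (s == null || s.Length < 1)
        {
            return false;
        }

        // Pass 1: the counts match only if every char is ASCII.
        if (CountingUtf8.GetByteCount(s) != s.Length)
        {
            return false;
        }

        // Pass 2: plain ASCII encoding; the last byte stays 0 as the terminator.
        byte[] buffer = new byte[s.Length + 1];
        Encoding.ASCII.GetBytes(s, 0, s.Length, buffer, 0);
        result = buffer;
        return true;
    }
}
```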

Personally I'd go with the first option unless you have reason to believe it's too slow.

Answer 3

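The GetBytes method throws an exception when the encoding's EncoderFallback specifies that it should throw.

Create an encoding object that uses an EncoderReplacementFallback to avoid exceptions on unencodable characters.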

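// Clone Encoding.ASCII to get a writable instance, then set the encoder (not decoder) fallback
Encoding encodingWithFallback = (Encoding) Encoding.ASCII.Clone();
encodingWithFallback.EncoderFallback = EncoderFallback.ReplacementFallback;
encodingWithFallback.GetBytes("Hɘ££o wor£d!");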

This version mimics the TryParse methods of the primitive .NET value types:

bool TryEncodingToASCII(string s, out byte[] result)
{
    if (s == null || Regex.IsMatch(s, @"[^\x00-\x7F]")) // If a single non-ASCII character is found, return false.
    {
        result = null;
        return false;
    }
    result = Encoding.ASCII.GetBytes(s); // Convert the string to ASCII bytes.
    return true;
}

Converting a string to ASCII without exceptions (like TryParse)
