Question

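I have a string containing a number in a non-ASCII format, e.g. Unicode BENGALI DIGIT ONE (U+09E7): "১"

How can I parse it as an integer in .NET?

Note: I tried using int.Parse() with a Bengali culture, passing "bn-BD" as the IFormatProvider. It doesn't work.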

Answer 1

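You can create a new string that is identical to the old one except that the native digits are replaced with Latin decimal digits. This can be done reliably by looping through the characters and checking char.IsDigit(char) for each one. If it returns true, convert the character with char.GetNumericValue(char).ToString().

Like this: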

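using System.Text;

static class DigitHelper
{
    // Returns a copy of text in which every Unicode decimal digit
    // is replaced by the corresponding ASCII digit.
    public static string ConvertNativeDigits(this string text)
    {
        if (text == null)
            return null;
        if (text.Length == 0)
            return string.Empty;
        StringBuilder sb = new StringBuilder(text.Length);
        foreach (char character in text)
        {
            // char.IsDigit is true only for decimal digits, so
            // GetNumericValue yields a whole number from 0 to 9 here.
            if (char.IsDigit(character))
                sb.Append(char.GetNumericValue(character));
            else
                sb.Append(character);
        }
        return sb.ToString();
    }
}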


int value = int.Parse(bengaliNumber.ConvertNativeDigits());

Answer 2

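It looks like this is not possible using built-in functionality:

The only Unicode digits that the .NET Framework parses as decimals are the ASCII digits 0 through 9, specified by the code values U+0030 through U+0039.

...

The attempt to parse the Unicode code values for Fullwidth digits, Arabic-Indic digits, and Bengali digits fails and throws an exception.

(emphasis mine)

Quite strange, since CultureInfo("bn-BD").NumberFormat.NativeDigits does contain them.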
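
Since NativeDigits is exposed even though int.Parse ignores it, one possible workaround is to do the substitution yourself. The following is only a sketch: NativeDigitParser and ParseWithNativeDigits are names made up for illustration, and it assumes the array passed in holds exactly ten entries, with index d holding the native digit for the value d (which is how NativeDigits is laid out).

```csharp
using System;
using System.Globalization;
using System.Text;

static class NativeDigitParser
{
    // Hypothetical helper (not part of .NET): replaces each native digit with
    // the ASCII digit at the same index, then defers to int.Parse.
    public static int ParseWithNativeDigits(string text, string[] nativeDigits)
    {
        if (text == null)
            throw new ArgumentNullException(nameof(text));
        if (nativeDigits == null || nativeDigits.Length != 10)
            throw new ArgumentException("Expected ten native digits.", nameof(nativeDigits));

        var sb = new StringBuilder(text);
        for (int d = 0; d < 10; d++)
        {
            // Entry d is the native digit string for the value d.
            sb.Replace(nativeDigits[d], ((char)('0' + d)).ToString());
        }

        // After substitution only ASCII digits remain, which int.Parse accepts.
        return int.Parse(sb.ToString(), NumberStyles.Integer, CultureInfo.InvariantCulture);
    }
}
```

Typical use would be to pass new CultureInfo("bn-BD").NumberFormat.NativeDigits as the second argument. Whether that array actually contains the Bengali digits depends on the culture data available to the runtime, so treat that part as an assumption to verify on your platform.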

Answer 3

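I came across this question while looking for a similar answer, but none of the answers here quite matched what I needed, so I wrote the following. It handles signs correctly and fails faster when given a very long string. It does not, however, ignore grouping characters such as , ' or ’, though that could easily be added if someone wants it (I didn't):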

public static int ParseIntInternational(this string str)
{
  int result = 0;
  bool neg = false;
  bool seekingSign = true; // Accept sign at beginning only.
  bool done = false; // Accept whitespace at beginning end or between sign and number.
                     // If we see whitespace once we've seen a number, we're "done" and
                     // further digits should fail.
  for(int i = 0; i != str.Length; ++i)
  {
    if(char.IsWhiteSpace(str, i))
    {
      if(!seekingSign)
        done = true;
    }
    else if(char.IsDigit(str, i))
    {
      if(done)
        throw new FormatException();
      seekingSign = false;
      result = checked(result * 10 + (int)char.GetNumericValue(str, i));
    }
    else if(seekingSign)
      switch(str[i])
      {
        case '﬩': case '+':
          //do nothing: Sign unchanged.
          break;
        case '-': case '−':
          neg = !neg;
          break;
        default:
          throw new FormatException();
      }
    else throw new FormatException();
  }
  if(seekingSign)
    throw new FormatException();
  return neg ? -result : result;
}
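
For anyone who does want grouping characters ignored, one way to add that is to strip them before parsing. This is a rough sketch rather than part of the method above: GroupingHelper and StripGrouping are hypothetical names, and the set of characters is just the three mentioned in this answer.

```csharp
using System;
using System.Text;

static class GroupingHelper
{
    // Assumed set of grouping characters: comma, apostrophe, and
    // right single quotation mark (U+2019). Extend as needed.
    private static readonly char[] GroupingChars = { ',', '\'', '\u2019' };

    // Hypothetical helper: returns str with every grouping character removed.
    public static string StripGrouping(this string str)
    {
        if (str == null)
            throw new ArgumentNullException(nameof(str));

        var sb = new StringBuilder(str.Length);
        foreach (char c in str)
        {
            // Keep everything that is not in the grouping set.
            if (Array.IndexOf(GroupingChars, c) < 0)
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

The result can then be handed to whichever parser you use. Note that this is deliberately lenient: it does not check that separators appear in sensible positions, so input like ",,1," would also be accepted once stripped.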

Parsing a non-ASCII (Unicode) number string as an integer in .NET
