IEEE floating-point addition

Time: 2016-10-31 14:57:25

Tags: c# floating-point

Instead of

float a = 32.342 , b = 193.132
float total = a + b

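how can I convert them to their 32-bit representations and explicitly perform the addition on those 32 bits?

1 answer:

Answer 0: (score: 0)

If I understand you correctly (you want to convert the floats into byte arrays and then add those arrays together), you can implement something like this: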

  // Requires: using System; using System.Linq;

  // Initial floats
  float a = 32.342f, b = 193.132f; // do not forget the "f" suffix
  float total = a + b;

  // Floats as byte[4] arrays
  byte[] aArray = BitConverter.GetBytes(a);
  byte[] bArray = BitConverter.GetBytes(b);
  // Kept so we can compare actual float addition with array summation
  byte[] totalArray = BitConverter.GetBytes(total);

  // Add the arrays directly: convert them into Int32 values,
  // add them up and, finally, convert the sum back to an array.
  // Reverse().ToArray(): we have to take endianness into account; on a
  // little-endian machine this makes the first array byte the most significant.
  // unchecked: let the integer sum wrap around instead of throwing on overflow.
  int c = unchecked(BitConverter.ToInt32(aArray.Reverse().ToArray(), 0) +
                    BitConverter.ToInt32(bArray.Reverse().ToArray(), 0));
  byte[] cArray = BitConverter.GetBytes(c).Reverse().ToArray();
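If the goal is to look at the 32 bits as IEEE 754 fields rather than as four bytes, a minimal sketch along the following lines may help. The class `FloatBits` and its method names are hypothetical and not part of the original answer; only `BitConverter.GetBytes`, `BitConverter.ToUInt32` and plain bit operations are used.

```csharp
using System;

public static class FloatBits
{
    // Raw 32-bit pattern of a float. GetBytes and ToUInt32 both follow the
    // machine's byte order, so no Reverse() is needed for this round trip.
    public static uint ToBits(float value) =>
        BitConverter.ToUInt32(BitConverter.GetBytes(value), 0);

    // Bit 31: sign (0 = positive, 1 = negative).
    public static uint Sign(uint bits) => bits >> 31;

    // Bits 30..23: exponent, stored with a bias of 127.
    public static int BiasedExponent(uint bits) => (int)((bits >> 23) & 0xFF);

    // Bits 22..0: fraction; normal values have an implicit leading 1.
    public static uint Fraction(uint bits) => bits & 0x7FFFFF;
}
```

For `32.342f` this gives the pattern `0x42015E35`: sign 0, biased exponent 132 (that is, 2^5) and fraction `0x015E35`.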

Visualization (let's have a look at all these bits):

  private static String ToReport(byte[] data) {
    return String.Join(" ", data.Select(x => Convert.ToString(x, 2).PadLeft(8, '0')));
  }

  ...

  String text = String.Join(Environment.NewLine,
    $"a:     {a,7} {ToReport(aArray)}",
    $"b:     {b,7} {ToReport(bArray)}",
    $"a + b:         {ToReport(cArray)}",
    $"total: {total,7} {ToReport(totalArray)}");

  Console.Write(text);

Result:

  a:      32.342 00110101 01011110 00000001 01000010
  b:     193.132 11001011 00100001 01000001 01000011
  a + b:         00000000 01111111 01000010 10000101 // array + array
  total: 225.474 01011000 01111001 01100001 01000011 // float + float
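
The two sums differ because adding the raw integers also adds the exponent fields, whereas floating-point addition aligns the exponents, adds the significands and rounds the result. The sketch below shows that procedure on the 32-bit patterns. It is a simplified illustration under stated assumptions, not a full IEEE 754 implementation: it handles only positive, normal, finite inputs (no zero, subnormals, NaN, infinity inputs or negative numbers), and the names `SoftFloat`, `ToBits`, `FromBits` and `AddPositiveNormal` are hypothetical.

```csharp
using System;

public static class SoftFloat
{
    public static uint ToBits(float value) =>
        BitConverter.ToUInt32(BitConverter.GetBytes(value), 0);

    public static float FromBits(uint bits) =>
        BitConverter.ToSingle(BitConverter.GetBytes(bits), 0);

    // Adds two positive, normal, finite floats given as bit patterns.
    // Assumption: inputs outside that range are NOT handled by this sketch.
    public static uint AddPositiveNormal(uint x, uint y)
    {
        // For positive floats the bit patterns sort like the values,
        // so after the swap x holds the larger operand.
        if (x < y) { uint t = x; x = y; y = t; }

        int expX = (int)(x >> 23) & 0xFF;
        int expY = (int)(y >> 23) & 0xFF;

        // Restore the implicit leading 1 and keep 3 extra low bits for rounding.
        ulong sigX = (ulong)((x & 0x7FFFFF) | 0x800000) << 3;
        ulong sigY = (ulong)((y & 0x7FFFFF) | 0x800000) << 3;

        // Align the smaller operand; shifted-out bits collapse into a sticky bit.
        int shift = expX - expY;
        if (shift > 27)
        {
            sigY = 1; // everything shifted out, only the sticky bit remains
        }
        else if (shift > 0)
        {
            ulong lost = sigY & ((1UL << shift) - 1);
            sigY = (sigY >> shift) | (lost != 0 ? 1UL : 0UL);
        }

        ulong sum = sigX + sigY;
        int exp = expX;

        // Carry out of the significand: renormalize, preserving the sticky bit.
        if ((sum & (1UL << 27)) != 0)
        {
            sum = (sum >> 1) | (sum & 1);
            exp++;
        }

        // Round to nearest, ties to even, using the 3 extra bits.
        ulong low = sum & 7;
        sum >>= 3;
        if (low > 4 || (low == 4 && (sum & 1) != 0))
        {
            sum++;
            if (sum == (1UL << 24)) { sum >>= 1; exp++; } // rounding carried out
        }

        if (exp >= 255) return 0x7F800000; // overflow to +infinity
        return ((uint)exp << 23) | ((uint)sum & 0x7FFFFF);
    }
}
```

With the values from the answer, `SoftFloat.AddPositiveNormal(SoftFloat.ToBits(a), SoftFloat.ToBits(b))` yields `0x43617958`, the same pattern as `total` (shown above as the bytes `58 79 61 43` in array order), and `SoftFloat.FromBits` turns it back into a `float`.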