Convert a float-formatted char[] to float

Time: 2018-07-29 20:07:16

Tags: c# performance parsing floating-point

I have a char[] salary that contains data from a string. I want to convert the char[] salary to a float, but the approach I tried seems to be very slow:

float ff = float.Parse(new string(salary));

According to Visual Studio's Performance Profiler, this line accounts for too much of the processing time.

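So I would like to know whether there is a faster way to do this, since performance matters here. The char[] is formatted like this:

[ '1', '3', '2', ',', '2', '9' ]

It is basically a JSON-like float, converted so that each digit (and the comma) occupies one element of the char[].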

EDIT:

I have reformatted the code, and it seems the performance hit is actually in the conversion from char[] to string, not in the parsing from string to float.

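4 Answers:

Answer 0: (score: 5)

Since this question has changed from "What is the fastest way to parse a float?" to "What is the fastest way to get a string from a char[]?", I wrote some benchmarks with BenchmarkDotNet to compare the various methods. My finding is that, if you already have a char[], you can't get any faster than passing it to the string(char[]) constructor, as you are already doing.

You say that your input file is "read into a byte[], then the part of the byte[] that represents the float is extracted into the char[]." Since you have the bytes that make up the float text isolated in a byte[], perhaps you can improve performance by skipping the intermediate char[]. Assuming you have something equivalent to...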

byte[] floatBytes = new byte[] { 0x31, 0x33, 0x32, 0x2C, 0x32, 0x39 }; // "132,29"

...you can use Encoding.GetString()...

string floatString = Encoding.ASCII.GetString(floatBytes);

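...which is roughly 1.6 times as fast (see the Scaled column in the results below) as passing the result of Encoding.GetChars() to the string(char[]) constructor...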

char[] floatChars = Encoding.ASCII.GetChars(floatBytes);
string floatString = new string(floatChars);

You'll find those benchmarks listed last for each runtime in my results...

BenchmarkDotNet=v0.11.0, OS=Windows 10.0.17134.165 (1803/April2018Update/Redstone4)
Intel Core i7 CPU 860 2.80GHz (Max: 2.79GHz) (Nehalem), 1 CPU, 8 logical and 4 physical cores
Frequency=2732436 Hz, Resolution=365.9738 ns, Timer=TSC
.NET Core SDK=2.1.202
  [Host] : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT
  Clr    : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3131.0
  Core   : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT


                                               Method | Runtime |       Categories |      Mean | Scaled |
----------------------------------------------------- |-------- |----------------- |----------:|-------:|
                         String_Constructor_CharArray |     Clr | char[] => string |  13.51 ns |   1.00 |
                                        String_Concat |     Clr | char[] => string | 192.87 ns |  14.27 |
 StringBuilder_Local_AppendSingleChar_DefaultCapacity |     Clr | char[] => string |  60.74 ns |   4.49 |
   StringBuilder_Local_AppendSingleChar_ExactCapacity |     Clr | char[] => string |  60.26 ns |   4.46 |
   StringBuilder_Local_AppendAllChars_DefaultCapacity |     Clr | char[] => string |  51.27 ns |   3.79 |
     StringBuilder_Local_AppendAllChars_ExactCapacity |     Clr | char[] => string |  49.51 ns |   3.66 |
                 StringBuilder_Field_AppendSingleChar |     Clr | char[] => string |  51.14 ns |   3.78 |
                   StringBuilder_Field_AppendAllChars |     Clr | char[] => string |  32.95 ns |   2.44 |
                                                      |         |                  |           |        |
                       String_Constructor_CharPointer |     Clr |  void* => string |  29.28 ns |   1.00 |
                      String_Constructor_SBytePointer |     Clr |  void* => string |  89.21 ns |   3.05 |
                   UnsafeArrayCopy_String_Constructor |     Clr |  void* => string |  42.82 ns |   1.46 |
                                                      |         |                  |           |        |
                                   Encoding_GetString |     Clr | byte[] => string |  37.33 ns |   1.00 |
                 Encoding_GetChars_String_Constructor |     Clr | byte[] => string |  60.83 ns |   1.63 |
                     SafeArrayCopy_String_Constructor |     Clr | byte[] => string |  27.55 ns |   0.74 |
                                                      |         |                  |           |        |
                         String_Constructor_CharArray |    Core | char[] => string |  13.27 ns |   1.00 |
                                        String_Concat |    Core | char[] => string | 172.17 ns |  12.97 |
 StringBuilder_Local_AppendSingleChar_DefaultCapacity |    Core | char[] => string |  58.68 ns |   4.42 |
   StringBuilder_Local_AppendSingleChar_ExactCapacity |    Core | char[] => string |  59.85 ns |   4.51 |
   StringBuilder_Local_AppendAllChars_DefaultCapacity |    Core | char[] => string |  40.62 ns |   3.06 |
     StringBuilder_Local_AppendAllChars_ExactCapacity |    Core | char[] => string |  43.67 ns |   3.29 |
                 StringBuilder_Field_AppendSingleChar |    Core | char[] => string |  54.49 ns |   4.11 |
                   StringBuilder_Field_AppendAllChars |    Core | char[] => string |  31.05 ns |   2.34 |
                                                      |         |                  |           |        |
                       String_Constructor_CharPointer |    Core |  void* => string |  22.87 ns |   1.00 |
                      String_Constructor_SBytePointer |    Core |  void* => string |  83.11 ns |   3.63 |
                   UnsafeArrayCopy_String_Constructor |    Core |  void* => string |  35.30 ns |   1.54 |
                                                      |         |                  |           |        |
                                   Encoding_GetString |    Core | byte[] => string |  36.19 ns |   1.00 |
                 Encoding_GetChars_String_Constructor |    Core | byte[] => string |  58.99 ns |   1.63 |
                     SafeArrayCopy_String_Constructor |    Core | byte[] => string |  27.81 ns |   0.77 |

...from running this code (requires the BenchmarkDotNet assembly and compiling with /unsafe)...

using System;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;
using BenchmarkDotNet.Attributes;

namespace StackOverflow_51584129
{
    [CategoriesColumn()]
    [ClrJob()]
    [CoreJob()]
    [GroupBenchmarksBy(BenchmarkDotNet.Configs.BenchmarkLogicalGroupRule.ByCategory)]
    public class StringCreationBenchmarks
    {
        private static readonly Encoding InputEncoding = Encoding.ASCII;

        private const string InputString = "132,29";
        private static readonly byte[] InputBytes = InputEncoding.GetBytes(InputString);
        private static readonly char[] InputChars = InputString.ToCharArray();
        private static readonly sbyte[] InputSBytes = InputBytes.Select(Convert.ToSByte).ToArray();

        private GCHandle _inputBytesHandle;
        private GCHandle _inputCharsHandle;
        private GCHandle _inputSBytesHandle;

        private StringBuilder _builder;

        [Benchmark(Baseline = true)]
        [BenchmarkCategory("char[] => string")]
        public string String_Constructor_CharArray()
        {
            return new string(InputChars);
        }

        [Benchmark(Baseline = true)]
        [BenchmarkCategory("void* => string")]
        public unsafe string String_Constructor_CharPointer()
        {
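            var pointer = (char*) _inputCharsHandle.AddrOfPinnedObject();

            // NOTE: string(char*) reads up to the first '\0', and InputChars has
            // no terminator, so this relies on zeroed memory after the array.
            return new string(pointer);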
        }

        [Benchmark()]
        [BenchmarkCategory("void* => string")]
        public unsafe string String_Constructor_SBytePointer()
        {
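            var pointer = (sbyte*) _inputSBytesHandle.AddrOfPinnedObject();

            // NOTE: string(sbyte*) reads up to the first zero byte, and InputSBytes
            // has no terminator, so this relies on zeroed memory after the array.
            return new string(pointer);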
        }

        [Benchmark()]
        [BenchmarkCategory("char[] => string")]
        public string String_Concat()
        {
            return string.Concat(InputChars);
        }

        [Benchmark()]
        [BenchmarkCategory("char[] => string")]
        public string StringBuilder_Local_AppendSingleChar_DefaultCapacity()
        {
            var builder = new StringBuilder();

            foreach (var c in InputChars)
                builder.Append(c);

            return builder.ToString();
        }

        [Benchmark()]
        [BenchmarkCategory("char[] => string")]
        public string StringBuilder_Local_AppendSingleChar_ExactCapacity()
        {
            var builder = new StringBuilder(InputChars.Length);

            foreach (var c in InputChars)
                builder.Append(c);

            return builder.ToString();
        }

        [Benchmark()]
        [BenchmarkCategory("char[] => string")]
        public string StringBuilder_Local_AppendAllChars_DefaultCapacity()
        {
            var builder = new StringBuilder().Append(InputChars);

            return builder.ToString();
        }

        [Benchmark()]
        [BenchmarkCategory("char[] => string")]
        public string StringBuilder_Local_AppendAllChars_ExactCapacity()
        {
            var builder = new StringBuilder(InputChars.Length).Append(InputChars);

            return builder.ToString();
        }

        [Benchmark()]
        [BenchmarkCategory("char[] => string")]
        public string StringBuilder_Field_AppendSingleChar()
        {
            _builder.Clear();

            foreach (var c in InputChars)
                _builder.Append(c);

            return _builder.ToString();
        }

        [Benchmark()]
        [BenchmarkCategory("char[] => string")]
        public string StringBuilder_Field_AppendAllChars()
        {
            return _builder.Clear().Append(InputChars).ToString();
        }

        [Benchmark(Baseline = true)]
        [BenchmarkCategory("byte[] => string")]
        public string Encoding_GetString()
        {
            return InputEncoding.GetString(InputBytes);
        }

        [Benchmark()]
        [BenchmarkCategory("byte[] => string")]
        public string Encoding_GetChars_String_Constructor()
        {
            var chars = InputEncoding.GetChars(InputBytes);

            return new string(chars);
        }

        [Benchmark()]
        [BenchmarkCategory("byte[] => string")]
        public string SafeArrayCopy_String_Constructor()
        {
            var chars = new char[InputString.Length];

            for (int i = 0; i < InputString.Length; i++)
                chars[i] = Convert.ToChar(InputBytes[i]);

            return new string(chars);
        }

        [Benchmark()]
        [BenchmarkCategory("void* => string")]
        public unsafe string UnsafeArrayCopy_String_Constructor()
        {
            fixed (char* chars = new char[InputString.Length])
            {
                var bytes = (byte*) _inputBytesHandle.AddrOfPinnedObject();

                for (int i = 0; i < InputString.Length; i++)
                    chars[i] = Convert.ToChar(bytes[i]);

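                // NOTE: string(char*) reads up to the first '\0'; every element of
                // chars is overwritten, so this relies on zeroed memory after the array.
                return new string(chars);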
            }
        }

        [GlobalSetup(Targets = new[] { nameof(StringBuilder_Field_AppendAllChars), nameof(StringBuilder_Field_AppendSingleChar) })]
        public void SetupStringBuilderField()
        {
            _builder = new StringBuilder();
        }

        [GlobalSetup(Target = nameof(UnsafeArrayCopy_String_Constructor))]
        public void SetupBytesHandle()
        {
            _inputBytesHandle = GCHandle.Alloc(InputBytes, GCHandleType.Pinned);
        }

        [GlobalCleanup(Target = nameof(UnsafeArrayCopy_String_Constructor))]
        public void CleanupBytesHandle()
        {
            _inputBytesHandle.Free();
        }

        [GlobalSetup(Target = nameof(String_Constructor_CharPointer))]
        public void SetupCharsHandle()
        {
            _inputCharsHandle = GCHandle.Alloc(InputChars, GCHandleType.Pinned);
        }

        [GlobalCleanup(Target = nameof(String_Constructor_CharPointer))]
        public void CleanupCharsHandle()
        {
            _inputCharsHandle.Free();
        }

        [GlobalSetup(Target = nameof(String_Constructor_SBytePointer))]
        public void SetupSByteHandle()
        {
            _inputSBytesHandle = GCHandle.Alloc(InputSBytes, GCHandleType.Pinned);
        }

        [GlobalCleanup(Target = nameof(String_Constructor_SBytePointer))]
        public void CleanupSByteHandle()
        {
            _inputSBytesHandle.Free();
        }

        public static void Main(string[] args)
        {
            BenchmarkDotNet.Running.BenchmarkRunner.Run<StringCreationBenchmarks>();
        }
    }
}
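
As a footnote to the byte[] suggestion above: if the float text sits inside a larger byte[], the Encoding.GetString(byte[], int, int) overload can decode just that range, so no separate array has to be extracted first. This is only a minimal sketch; the class and method names, and the idea that the caller already knows the offset and length of the float text, are assumptions that do not come from the question.

```csharp
using System;
using System.Text;

public static class ByteSliceDecoding
{
    // Hypothetical helper: 'offset' and 'count' describe where the float text
    // sits inside the larger input buffer (assumed to be known by the caller).
    public static string DecodeAscii(byte[] buffer, int offset, int count)
    {
        // Decodes only the requested range, so neither an intermediate byte[]
        // nor an intermediate char[] is allocated before the string is created.
        return Encoding.ASCII.GetString(buffer, offset, count);
    }
}
```

For example, with a buffer holding the ASCII text "salary=132,29;", DecodeAscii(buffer, 7, 6) yields the string "132,29", which can then be passed to float.Parse().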

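Answer 1: (score: 3)

On the float-parsing side, there are some gains to be had depending on which overload of float.Parse() you call and what you pass to it. I ran some benchmarks comparing these overloads (note that I changed the decimal separator from ',' to '.' so that I could specify CultureInfo.InvariantCulture).

For example, calling an overload that takes an IFormatProvider is good for about a 10% performance increase. Specifying NumberStyles.Float ("lax") for the NumberStyles parameter changes performance by about one percentage point in either direction and, making some assumptions about the input data, specifying only NumberStyles.AllowDecimalPoint ("strict") gains a few more points. (The float.Parse(string) overload uses NumberStyles.Float | NumberStyles.AllowThousands.)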
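
To make the "strict" and "lax" variants concrete for the question's original "132,29" input, here is a small sketch. It assumes that ',' is the decimal separator and supplies that through a custom NumberFormatInfo instead of a named culture; the class and member names are made up for illustration.

```csharp
using System;
using System.Globalization;

public static class ParseOverloads
{
    // Assumption: the input uses ',' as the decimal separator, as in "132,29".
    private static readonly NumberFormatInfo CommaDecimal = new NumberFormatInfo
    {
        NumberDecimalSeparator = ",",
        NumberGroupSeparator = "."
    };

    // "Strict": only digits and a decimal separator are accepted, so a sign,
    // an exponent, or surrounding whitespace causes a FormatException.
    public static float ParseStrict(string text)
    {
        return float.Parse(text, NumberStyles.AllowDecimalPoint, CommaDecimal);
    }

    // "Lax": additionally accepts leading/trailing whitespace, a leading sign
    // and an exponent.
    public static float ParseLax(string text)
    {
        return float.Parse(text, NumberStyles.Float, CommaDecimal);
    }
}
```

Both methods accept "132,29". Something like " -1,5e2 " is accepted only by ParseLax, which is the kind of assumption about the input that the strict variant trades for a little speed.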

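On the subject of making assumptions about your input data, if you know the text you are working with has certain characteristics (single-byte character encoding, no invalid numbers, no negatives, no exponents, no need to handle NaN or positive/negative infinity, etc.), you might do well to parse directly from the bytes and forgo any unneeded special-case handling and error checking. I included a very simple implementation in my benchmarks, and it was able to get a float from a byte[] roughly 12 to 19 times faster than float.Parse(string) could get a float from a string (Scaled 0.08 on Core and 0.05 on Clr in the results below).

Here are my benchmark results...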

BenchmarkDotNet=v0.11.0, OS=Windows 10.0.17134.165 (1803/April2018Update/Redstone4)
Intel Core i7 CPU 860 2.80GHz (Max: 2.79GHz) (Nehalem), 1 CPU, 8 logical and 4 physical cores
Frequency=2732436 Hz, Resolution=365.9738 ns, Timer=TSC
.NET Core SDK=2.1.202
  [Host] : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT
  Clr    : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3131.0
  Core   : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT


                                                        Method | Runtime |       Mean | Scaled |
-------------------------------------------------------------- |-------- |-----------:|-------:|
                                           float.Parse(string) |     Clr | 145.098 ns |   1.00 |
                        'float.Parse(string, IFormatProvider)' |     Clr | 134.191 ns |   0.92 |
                     'float.Parse(string, NumberStyles) [Lax]' |     Clr | 145.884 ns |   1.01 |
                  'float.Parse(string, NumberStyles) [Strict]' |     Clr | 139.417 ns |   0.96 |
    'float.Parse(string, NumberStyles, IFormatProvider) [Lax]' |     Clr | 133.800 ns |   0.92 |
 'float.Parse(string, NumberStyles, IFormatProvider) [Strict]' |     Clr | 127.413 ns |   0.88 |
                       'Custom byte-to-float parser [Indexer]' |     Clr |   7.657 ns |   0.05 |
                    'Custom byte-to-float parser [Enumerator]' |     Clr | 566.440 ns |   3.90 |
                                                               |         |            |        |
                                           float.Parse(string) |    Core | 154.369 ns |   1.00 |
                        'float.Parse(string, IFormatProvider)' |    Core | 138.668 ns |   0.90 |
                     'float.Parse(string, NumberStyles) [Lax]' |    Core | 155.644 ns |   1.01 |
                  'float.Parse(string, NumberStyles) [Strict]' |    Core | 150.221 ns |   0.97 |
    'float.Parse(string, NumberStyles, IFormatProvider) [Lax]' |    Core | 142.591 ns |   0.92 |
 'float.Parse(string, NumberStyles, IFormatProvider) [Strict]' |    Core | 135.000 ns |   0.87 |
                       'Custom byte-to-float parser [Indexer]' |    Core |  12.673 ns |   0.08 |
                    'Custom byte-to-float parser [Enumerator]' |    Core | 584.236 ns |   3.78 |

...from running this code (requires the BenchmarkDotNet assembly)...

using System;
using System.Globalization;
using BenchmarkDotNet.Attributes;

namespace StackOverflow_51584129
{
    [ClrJob()]
    [CoreJob()]
    public class FloatParsingBenchmarks
    {
        private const string InputString = "132.29";
        private static readonly byte[] InputBytes = System.Text.Encoding.ASCII.GetBytes(InputString);

        private static readonly IFormatProvider ParsingFormatProvider = CultureInfo.InvariantCulture;
        private const NumberStyles LaxParsingNumberStyles = NumberStyles.Float;
        private const NumberStyles StrictParsingNumberStyles = NumberStyles.AllowDecimalPoint;
        private const char DecimalSeparator = '.';

        [Benchmark(Baseline = true, Description = "float.Parse(string)")]
        public float SystemFloatParse()
        {
            return float.Parse(InputString);
        }

        [Benchmark(Description = "float.Parse(string, IFormatProvider)")]
        public float SystemFloatParseWithProvider()
        {
            return float.Parse(InputString, CultureInfo.InvariantCulture);
        }

        [Benchmark(Description = "float.Parse(string, NumberStyles) [Lax]")]
        public float SystemFloatParseWithLaxNumberStyles()
        {
            return float.Parse(InputString, LaxParsingNumberStyles);
        }

        [Benchmark(Description = "float.Parse(string, NumberStyles) [Strict]")]
        public float SystemFloatParseWithStrictNumberStyles()
        {
            return float.Parse(InputString, StrictParsingNumberStyles);
        }

        [Benchmark(Description = "float.Parse(string, NumberStyles, IFormatProvider) [Lax]")]
        public float SystemFloatParseWithLaxNumberStylesAndProvider()
        {
            return float.Parse(InputString, LaxParsingNumberStyles, ParsingFormatProvider);
        }

        [Benchmark(Description = "float.Parse(string, NumberStyles, IFormatProvider) [Strict]")]
        public float SystemFloatParseWithStrictNumberStylesAndProvider()
        {
            return float.Parse(InputString, StrictParsingNumberStyles, ParsingFormatProvider);
        }

        [Benchmark(Description = "Custom byte-to-float parser [Indexer]")]
        public float CustomFloatParseByIndexing()
        {
            // FOR DEMONSTRATION PURPOSES ONLY!
            // This code has been written for and only tested with
            // parsing the ASCII string "132.29" in byte form
            var currentIndex = 0;
            var boundaryIndex = InputBytes.Length;
            char currentChar;
            var wholePart = 0;

            while (currentIndex < boundaryIndex && (currentChar = (char) InputBytes[currentIndex++]) != DecimalSeparator)
            {
                var currentDigit = currentChar - '0';

                wholePart = 10 * wholePart + currentDigit;
            }

            var fractionalPart = 0F;
            var nextFractionalDigitScale = 0.1F;

            while (currentIndex < boundaryIndex)
            {
                currentChar = (char) InputBytes[currentIndex++];
                var currentDigit = currentChar - '0';

                fractionalPart += currentDigit * nextFractionalDigitScale;
                nextFractionalDigitScale *= 0.1F;
            }

            return wholePart + fractionalPart;
        }

        [Benchmark(Description = "Custom byte-to-float parser [Enumerator]")]
        public float CustomFloatParseByEnumerating()
        {
            // FOR DEMONSTRATION PURPOSES ONLY!
            // This code has been written for and only tested with
            // parsing the ASCII string "132.29" in byte form
            var wholePart = 0;
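            // Array.GetEnumerator() returns a non-generic IEnumerator, so Current
            // yields each byte as a boxed object (hence the double cast below).
            var enumerator = InputBytes.GetEnumerator();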

            while (enumerator.MoveNext())
            {
                var currentChar = (char) (byte) enumerator.Current;

                if (currentChar == DecimalSeparator)
                    break;

                var currentDigit = currentChar - '0';
                wholePart = 10 * wholePart + currentDigit;
            }

            var fractionalPart = 0F;
            var nextFractionalDigitScale = 0.1F;

            while (enumerator.MoveNext())
            {
                var currentChar = (char) (byte) enumerator.Current;
                var currentDigit = currentChar - '0';

                fractionalPart += currentDigit * nextFractionalDigitScale;
                nextFractionalDigitScale *= 0.1F;
            }

            return wholePart + fractionalPart;
        }

        public static void Main()
        {
            BenchmarkDotNet.Running.BenchmarkRunner.Run<FloatParsingBenchmarks>();
        }
    }
}
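
The indexer-based parser above is hard-wired to the benchmark input. As a rough sketch of how the same idea might be packaged for reuse, the hypothetical helper below takes the buffer range and the separator byte as parameters and returns false when it meets an unexpected byte. Note that it deliberately swaps in a different accumulation scheme (an integer mantissa divided by a power of ten) instead of summing float fractions as the benchmark code does; it still assumes unsigned input with no exponent and at most 18 digits, and it has not been benchmarked.

```csharp
using System;

public static class AsciiFloatParser
{
    // Hypothetical helper (not from the answer): parses unsigned text such as
    // "132.29" or "132,29" from buffer[offset .. offset + count).
    public static bool TryParse(byte[] buffer, int offset, int count,
                                byte decimalSeparator, out float result)
    {
        result = 0F;

        long mantissa = 0;        // all digits, ignoring the separator
        double divisor = 1.0;     // 10^(number of fractional digits)
        int digitCount = 0;
        bool seenSeparator = false;

        for (int i = offset; i < offset + count; i++)
        {
            byte current = buffer[i];

            if (current == decimalSeparator && !seenSeparator)
            {
                seenSeparator = true;
                continue;
            }

            int digit = current - '0';

            // Assumption: at most 18 digits, so the mantissa fits in a long.
            if (digit < 0 || digit > 9 || ++digitCount > 18)
                return false; // unexpected byte or too many digits

            mantissa = 10 * mantissa + digit;

            if (seenSeparator)
                divisor *= 10.0;
        }

        if (digitCount == 0)
            return false; // empty input or a lone separator

        result = (float) (mantissa / divisor);
        return true;
    }
}
```

With the question's bytes for "132,29", calling AsciiFloatParser.TryParse(bytes, 0, bytes.Length, (byte) ',', out value) returns true and sets value to approximately 132.29.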

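Answer 2: (score: 2)

An interesting topic for working out optimization details at home :) Good health to you all.

My goal was to convert an ASCII CSV matrix to a float matrix as fast as possible in C#. For this purpose, it turns out that calling string.Split() on each row and converting every term separately also introduces overhead. To overcome this, I modified BACON's solution so that it parses a whole row of floats, to be used like this: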

  var falist = new List<float[]>();
  for (int row=0; row<sRowList.Count; row++)
  {
    var sRow = sRowList[row];
    falist.Add(CustomFloatParseRowByIndexing(nTerms, sRow.ToCharArray(), '.'));
  }

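Below is the code of my row parser variant. These are the benchmark results for converting a 40x31 matrix 1000 times:

Benchmark0: Split row and Parse each term to convert to float matrix dT = 704 ms

Benchmark1: Split row and TryParse each term to convert to float matrix dT = 640 ms

Benchmark2: Split row and CustomFloatParseByIndexing to convert terms to float matrix dT = 211 ms

Benchmark3: Use CustomFloatParseRowByIndexing to convert row to float matrix dT = 120 ms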

public float[] CustomFloatParseRowByIndexing(int nItems, char[] InputBytes, char DecimalSeparator)
{
// Convert semicolon-separated floats from InputBytes into nItems float[] result.
// Constraints are:
//   - no scientific notation or .x allowed
//   - every row has exactly nItems values
//   - semicolon delimiter after each value
//   - terms 'u'  or 'undef' or 'undefined' allowed for bad values
//   - minus sign allowed
//   - leading space allowed
//   - all terms must comply

// FOR DEMO PURPOSE ONLY
// based on BACON on Stackoverflow, modified to read nItems delimited float values
// https://stackoverflow.com/questions/51584129/convert-a-float-formated-char-to-float

var currentIndex = 0;
var boundaryIndex = InputBytes.Length;
bool termready;
float[] result = new float[nItems];
int cItem = 0;
while (currentIndex < boundaryIndex)
{
    termready = false;
    if ((char)InputBytes[currentIndex] == ' ') { currentIndex++; continue; }
    char currentChar;
    var wholePart = 0;
    float sgn = 1;
    while (currentIndex < boundaryIndex && (currentChar = (char)InputBytes[currentIndex++]) != DecimalSeparator)
    {
        if (currentChar == 'u')
        {
            while ((char)InputBytes[currentIndex++] != ';') ;
            result[cItem++] = -9999.0f;
            continue;
        }
        else
        if (currentChar == ' ')
        {                       
            continue;
        }
        else
        if (currentChar == ';')
        {
            termready = true;
            break;
        }
        else
        if (currentChar == '-') sgn = -1;
        else
        {
            var currentDigit = currentChar - '0';
            wholePart = 10 * wholePart + currentDigit;
        }
    }
    var fractionalPart = 0F;
    var nextFractionalDigitScale = 0.1F;
    if (!termready)
        while (currentIndex < boundaryIndex)
        {
            currentChar = (char)InputBytes[currentIndex++];
            if (currentChar == ';')
            {
                termready = true;
                break;
            }
            var currentDigit = currentChar - '0';
            fractionalPart += currentDigit * nextFractionalDigitScale;
            nextFractionalDigitScale *= 0.1F;
        }
    if (termready)
    {
        result[cItem++] = sgn * (wholePart + fractionalPart);
    }
}
return result;
}
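
For comparison, the Split-based baselines (Benchmark0 and Benchmark1) are not shown in this answer. The sketch below is only a guess at what such a baseline could look like, under the same row format (a ';' after every value, '.' as the decimal separator, 'u'/'undef'/'undefined' for bad values); the class and method names are hypothetical.

```csharp
using System;
using System.Globalization;

public static class SplitRowParser
{
    // Hypothetical baseline: split the row, then TryParse each term.
    public static float[] ParseRow(string row, int nItems)
    {
        // Every value is followed by ';', so drop the empty entry after the
        // last delimiter. (This would also drop empty terms in the middle.)
        string[] terms = row.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
        var result = new float[nItems];

        for (int i = 0; i < nItems && i < terms.Length; i++)
        {
            // NumberStyles.Float allows the leading space and minus sign.
            if (!float.TryParse(terms[i], NumberStyles.Float, CultureInfo.InvariantCulture, out result[i]))
                result[i] = -9999.0f; // same sentinel the row parser above uses for 'u' terms
        }

        return result;
    }
}
```

Each call allocates the string[] from Split plus one string per term, which is the overhead the single-pass row parser above avoids.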

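Answer 3: (score: 1)

After some experimenting and tests from this:

The fastest way to get a string from a char[] is to use new string.

Also note that, according to Microsoft's article, TryParse is the fastest way to parse a float when the input may be invalid. So keep that in mind.

TryParse takes only 0.5% of the execution time, whereas Parse takes 18% and Convert takes 14%.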
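
The difference shows up with invalid input because float.Parse() reports failure by throwing a FormatException, whereas float.TryParse() just returns false. A minimal sketch of the TryParse pattern follows; the helper name and the fallback parameter are illustrative assumptions, and '.' is assumed as the decimal separator.

```csharp
using System;
using System.Globalization;

public static class SafeFloatParsing
{
    // Hypothetical helper: returns 'fallback' instead of throwing when the
    // text is not a valid float.
    public static float ParseOrDefault(string text, float fallback)
    {
        float value;

        return float.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out value)
            ? value
            : fallback;
    }
}
```

ParseOrDefault("abc", -1f) returns -1 without raising an exception, while a valid input such as "132.29" is parsed as usual.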