Why can one method find the nth occurrence of a character in a string faster than another?

Posted: 2012-07-24 16:15:41

Tags: c# string benchmarking

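I noticed a few questions about finding the nth occurrence of a character in a string. Out of curiosity (I also have several uses for this in an application, but mostly out of curiosity), I coded and benchmarked two of these methods in Visual Studio 2010, and I'm wondering why method 1 (FindNthOccurrence) is so much slower than method 2 (IndexOfNth). The only reasons I can think of are: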

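  1. There is a problem with my benchmarking code.
  2. There is a problem with my algorithm.
  3. IndexOf is a built-in .NET method and is therefore already optimized.

I'm leaning toward #2, but I'd still like to know. Here is the relevant code.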

Code

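    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    using System.Text;

    class Program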
        {
            static void Main(string[] args)
            {
                char searchChar = 'a';
                Random r = new Random(UnixTimestamp());
    
                // Generate sample data
                int numSearches = 100000, inputLength = 100;
                List<String> inputs = new List<String>(numSearches);
                List<int> nth = new List<int>(numSearches);
                List<int> occurrences = new List<int>(numSearches);
                for (int i = 0; i < numSearches; i++)
                {
                    inputs.Add(GenerateRandomString(inputLength, "abcdefghijklmnopqrstuvwxyz"));
                    nth.Add(r.Next(1, 4));
                }
    
                // Timing of FindNthOccurrence
                Stopwatch timeFindNth = Stopwatch.StartNew();
                for (int i = 0; i < numSearches; i++)
                    occurrences.Add(FindNthOccurrence(inputs[i], searchChar, nth[i]));
                timeFindNth.Stop();
    
                Console.WriteLine(String.Format("FindNthOccurrence: {0} / {1}",
                                                timeFindNth.ElapsedMilliseconds, timeFindNth.ElapsedTicks));
    
                // Cleanup
                occurrences.Clear();
    
                // Timing of IndexOfNth
                Stopwatch timeIndexOf = Stopwatch.StartNew();
                for (int i = 0; i < numSearches; i++)
                    occurrences.Add(IndexOfNth(inputs[i], searchChar, nth[i]));
                timeIndexOf.Stop();
                Console.WriteLine(String.Format("IndexOfNth: {0} / {1}",
                                                timeIndexOf.ElapsedMilliseconds, timeIndexOf.ElapsedTicks));
    
                Console.Read();
            }
    
            static int FindNthOccurrence(String input, char c, int n)
            {
                int len = input.Length;
                int occurrences = 0;
                for (int i = 0; i < len; i++)
                {
                    if (input[i] == c)
                    {
                        occurrences++;
                        if (occurrences == n)
                            return i;
                    }
                }
                return -1;
            }
    
            static int IndexOfNth(String input, char c, int n)
            {
                int occurrence = 0;
                int pos = input.IndexOf(c, 0);
                while (pos != -1)
                {
                    occurrence++;
                    if (occurrence == n)
                        return pos;
                    pos = input.IndexOf(c, pos + 1);
                }
                return -1;
            }
    
            // Helper methods
            static String GenerateRandomString(int length, String legalCharacters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
            {
                if (length < 0) throw new ArgumentOutOfRangeException("length", "length cannot be less than zero.");
                if (string.IsNullOrEmpty(legalCharacters))
                    throw new ArgumentException("legalCharacters may not be empty.", "legalCharacters");
    
                const int byteSize = 0x100;
                var legalCharSet = new HashSet<char>(legalCharacters).ToArray();
                if (byteSize < legalCharSet.Length)
                    throw new ArgumentException(String.Format("legalCharacters may contain no more than {0} distinct characters.", byteSize), "legalCharacters");
    
                // Guid.NewGuid and System.Random are not particularly random. By using a
                // cryptographically-secure random number generator, the caller is always
                // protected, regardless of use.
                using (var rng = new System.Security.Cryptography.RNGCryptoServiceProvider())
                {
                    StringBuilder result = new StringBuilder();
                    var buf = new byte[128];
                    while (result.Length < length)
                    {
                        rng.GetBytes(buf);
                        for (var i = 0; i < buf.Length && result.Length < length; ++i)
                        {
                            // Divide the byte into legalCharSet-sized groups. If the
                            // random value falls into the last group and the last group is
                            // too small to choose from the entire legalCharSet, ignore
                            // the value in order to avoid biasing the result.
                            var outOfRangeStart = byteSize - (byteSize % legalCharSet.Length);
                            if (outOfRangeStart <= buf[i]) continue;
                            result.Append(legalCharSet[buf[i] % legalCharSet.Length]);
                        }
                    }
                    return result.ToString();
                }
            }
    
            static int UnixTimestamp()
            {
                TimeSpan ts = (System.DateTime.UtcNow - new System.DateTime(1970, 1, 1, 0, 0, 0));
                return (int)ts.TotalSeconds;
            }
        }
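Before reading anything into the timings, it is worth confirming that the two methods actually return the same index, since a wrong result would make the comparison meaningless (reason #2 above). The sketch below copies both method bodies from the question; the sample strings and the `AllSamplesAgree` helper are hypothetical additions for illustration.

```csharp
using System;

// Sanity check: do FindNthOccurrence and IndexOfNth return the same index?
// Both method bodies are copied from the question; the samples and the
// AllSamplesAgree helper are illustrative additions.
static class NthOccurrenceCheck
{
    public static int FindNthOccurrence(String input, char c, int n)
    {
        int len = input.Length;
        int occurrences = 0;
        for (int i = 0; i < len; i++)
        {
            if (input[i] == c)
            {
                occurrences++;
                if (occurrences == n)
                    return i;
            }
        }
        return -1;
    }

    public static int IndexOfNth(String input, char c, int n)
    {
        int occurrence = 0;
        int pos = input.IndexOf(c, 0);
        while (pos != -1)
        {
            occurrence++;
            if (occurrence == n)
                return pos;
            // startIndex == input.Length is allowed and simply returns -1
            pos = input.IndexOf(c, pos + 1);
        }
        return -1;
    }

    // Compares both methods on a few edge cases: empty string, no match,
    // a single match, and n larger than the number of matches.
    public static bool AllSamplesAgree()
    {
        string[] samples = { "banana", "abcabc", "", "zzz", "a" };
        foreach (string s in samples)
            for (int n = 1; n <= 4; n++)
                if (FindNthOccurrence(s, 'a', n) != IndexOfNth(s, 'a', n))
                    return false;
        return true;
    }

    static void Main()
    {
        Console.WriteLine(AllSamplesAgree() ? "Both methods agree." : "Mismatch found.");
    }
}
```

For "banana" both methods return 1, 3 and 5 for n = 1, 2, 3 and -1 for n = 4, so any timing difference is not caused by one of them doing less work because of a bug.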
    

Sample output

Every run produces timings similar to these (milliseconds / elapsed ticks):

    FindNthOccurrence: 27 / 79716
    IndexOfNth: 12 / 36492
    

2 answers:

Answer 0 (score: 1)

I'm sure you're running a Debug build. Switch to a Release build; both methods then take about the same amount of time.
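A minimal sketch of a fairer timing harness along the lines of this answer, assuming the default Visual Studio configurations (where the Debug configuration defines the `DEBUG` symbol). The names `FairBenchmark` and `Measure` and the round counts are hypothetical. It warns when the code is probably unoptimized, runs each action once untimed so JIT compilation is excluded, and reports the best of several rounds. Note that starting even a Release build with F5 attaches the debugger, which by default can suppress JIT optimizations, so Ctrl+F5 (Start Without Debugging) is the safer way to time it.

```csharp
using System;
using System.Diagnostics;

// Sketch of a fairer timing harness (names and counts are illustrative).
static class FairBenchmark
{
    // True when compiled with the DEBUG symbol, which Visual Studio's
    // Debug configuration defines by default.
    public static bool IsDebugBuild()
    {
#if DEBUG
        return true;
#else
        return false;
#endif
    }

    // Runs the action once untimed so JIT compilation is excluded, then
    // returns the fastest of `rounds` timed runs, in Stopwatch ticks.
    public static long Measure(Action action, int rounds)
    {
        action(); // warm-up run
        long best = long.MaxValue;
        for (int r = 0; r < rounds; r++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            best = Math.Min(best, sw.ElapsedTicks);
        }
        return best;
    }

    static void Main()
    {
        if (IsDebugBuild() || Debugger.IsAttached)
            Console.WriteLine("Warning: Debug build or debugger attached; timings may not reflect optimized code.");

        string input = new string('b', 99) + "a"; // the only match is the last character
        int sink = 0; // accumulates results so the loops' work is actually used

        long manual = Measure(() =>
        {
            for (int k = 0; k < 100000; k++)
                for (int i = 0; i < input.Length; i++)
                    if (input[i] == 'a') { sink += i; break; }
        }, 5);

        long indexOf = Measure(() =>
        {
            for (int k = 0; k < 100000; k++)
                sink += input.IndexOf('a');
        }, 5);

        Console.WriteLine("Manual loop: {0} ticks, IndexOf: {1} ticks (best of 5, sink={2})",
                          manual, indexOf, sink);
    }
}
```

Taking the best of several rounds rather than a single run reduces noise from the first-call JIT cost and from other activity on the machine.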

Answer 1 (score: 1)

Browsing the source of System.String with Reflector, the IndexOf overload being called appears to be defined as:

    public extern int IndexOf(char value, int startIndex, int count);

So it calls into internal unmanaged code, which may explain the observed speedup. You're unlikely to do much better with managed code.
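The `extern` declaration above reflects .NET Framework. As I understand it, on .NET Core 2.1 and later character search goes through managed, vectorized helpers instead, and the same "repeat IndexOf from just past the last match" loop can be written over `ReadOnlySpan<char>`. The sketch below assumes such a runtime (it will not compile against the question's Visual Studio 2010 / .NET 4 setup); `SpanNth` is a hypothetical name.

```csharp
using System;

// Sketch for .NET Core 2.1 or later: IndexOfNth over ReadOnlySpan<char>.
// Each iteration searches the remaining span, then slices past the match.
static class SpanNth
{
    public static int IndexOfNth(string input, char c, int n)
    {
        if (n < 1) return -1;
        ReadOnlySpan<char> span = input.AsSpan();
        int offset = 0; // index in `input` where `span` currently starts
        while (true)
        {
            int rel = span.IndexOf(c); // MemoryExtensions.IndexOf
            if (rel < 0) return -1;
            if (--n == 0) return offset + rel;
            offset += rel + 1;
            span = span.Slice(rel + 1); // may become empty; that is valid
        }
    }

    static void Main()
    {
        Console.WriteLine(IndexOfNth("banana", 'a', 2)); // prints 3
        Console.WriteLine(IndexOfNth("banana", 'a', 4)); // prints -1
    }
}
```

Slicing the span instead of passing a start index keeps each search call simple and avoids allocating substrings.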
