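I want to find the maximum frequency of a common digit in a contiguous subsequence of an integer array

Time: 2018-06-02 06:36:13

Tags: c# arrays algorithm data-structures dynamic

A digit subsequence of array A is a subsequence of integers in which every pair of consecutive integers has at least one digit in common.

I keep a dictionary keyed by the characters '0' to '9', with a running count for each. Then I loop over every value in the integer array, take each of its digits, and check that digit's count in my dictionary.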

public static void Main(string[] args)
{
    Dictionary<char, int> dct = new Dictionary<char, int>
    {
        { '0', 0 },
        { '1', 0 },
        { '2', 0 },
        { '3', 0 },
        { '4', 0 },
        { '5', 0 },
        { '6', 0 },
        { '7', 0 },
        { '8', 0 },
        { '9', 0 }
    };

    string[] arr = Console.ReadLine().Split(' ');
    for (int i = 0; i < arr.Length; i++)
    {
        string str = string.Join("", arr[i].Distinct());
        for (int j = 0; j < str.Length; j++)
        {
            int count = dct[str[j]];
            if (count == i || (i > 0 && arr[i - 1].Contains(str[j])))
            {
                count++;
                dct[str[j]] = count;
            }
            else dct[str[j]] = 1;
        }
    }
    string s = dct.Aggregate((l, r) => l.Value > r.Value ? l : r).Key.ToString();
    Console.WriteLine(s);
}

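For example, for 12 23 231 the answer is 2, because it occurs 3 times.

The array can contain 10^18 elements.

Can someone help me find an optimal solution? This algorithm is not suitable for large amounts of data in the array.

6 answers:

Answer 0 (score: 9)

All of the posted answers are wrong, because they all ignore the most important part of the question: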

  

The array can contain 10^18 elements.

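Is this array being read from disk? Suppose each element is two bytes; that is two million one-terabyte drives just for the array. I don't think that is going to fit into memory. You have to use a streaming solution.

How long will the streaming solution take? If you can process a billion array items per second, which seems reasonable, then your program will take 32 years to execute.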
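
The two estimates above can be checked with a few lines of arithmetic. This is only a sketch of the calculation; the class and member names (`ScaleEstimate`, `Terabytes`, `Years`) are made up for illustration, and the inputs are the answer's own assumptions (10^18 elements, 2 bytes each, one billion elements per second).

```csharp
using System;

public static class ScaleEstimate
{
    // Assumptions taken from the answer, not measured values.
    public const double Elements = 1e18;
    public const double BytesPerElement = 2;
    public const double ElementsPerSecond = 1e9;

    // Total storage in terabytes (1 TB = 10^12 bytes).
    public static double Terabytes() => Elements * BytesPerElement / 1e12;

    // Total processing time in years (365.25 days per year).
    public static double Years() =>
        Elements / ElementsPerSecond / (365.25 * 24 * 3600);

    public static void Main()
    {
        Console.WriteLine($"Storage: {Terabytes():N0} TB");
        Console.WriteLine($"Time:    {Years():F1} years");
    }
}
```

Under those assumptions the storage comes to 2 million terabytes and the running time to a little under 32 years, which matches the figures quoted.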

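Your requirements are unrealistic, so the problem cannot be solved with the resources of a single person. You will need the resources of a large corporation or a nation to attack this problem, and you will need a lot of money for hardware acquisition and management.

The linear algorithm is trivial; the whole problem is the size of the data. Start building your data center somewhere with cheap power and friendly tax laws, because you are going to be importing a lot of disks.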

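Answer 1 (score: 0)

You shouldn't iterate over the array elements one by one; just merge the whole string array into one string and iterate over its characters.

12 23 231 -> "1223231", then loop over it and count.

It should be fast enough, O(n), and it needs only 10 entries in your dictionary. How "fast" do you really need it to be?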
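
A minimal sketch of what this answer describes, assuming the numbers are already available as strings. The names `JoinedCount` and `CountDigits` are hypothetical. Note that this counts every digit character, so a number such as 22 contributes twice and runs of consecutive numbers are not tracked; it may therefore not match the definition given in the question, even though it happens to agree on the sample input.

```csharp
using System;
using System.Linq;

public static class JoinedCount
{
    // Counts every digit character in the concatenated input.
    public static int[] CountDigits(string[] numbers)
    {
        var counts = new int[10];
        foreach (char c in string.Concat(numbers))
        {
            if (c >= '0' && c <= '9')
                counts[c - '0']++;
        }
        return counts;
    }

    public static void Main()
    {
        var counts = CountDigits(new[] { "12", "23", "231" });
        int best = Array.IndexOf(counts, counts.Max());
        Console.WriteLine($"{best} occurs {counts[best]} times"); // prints "2 occurs 3 times"
    }
}
```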

Answer 2 (score: 0)

I did not use an array, and I'm not sure whether you have to use one. If not, check this solution.

static void Main(string[] args)
    {
        List<char> numbers = new List<char>();
        Dictionary<char, int> dct = new Dictionary<char, int>()
        {
            { '0',0 },
            { '1',0 },
            { '2',0 },
            { '3',0 },
            { '4',0 },
            { '5',0 },
            { '6',0 },
            { '7',0 },
            { '8',0 },
            { '9',0 },
        };

        string option;

        do
        {
            Console.Write("Enter number: ");
            string number = Console.ReadLine();
            numbers.AddRange(number);

            Console.Write("Enter 'X' if u want to finish work: ");
            option = Console.ReadLine();

        } while (option.ToLower() != "x");

        foreach(char c in numbers)
        {
            if(dct.ContainsKey(c))
            {
                dct[c]++;
            }
        }

        foreach(var keyValue in dct)
        {
            Console.WriteLine($"Char {keyValue.Key} was used {keyValue.Value} times");
        }

        Console.ReadKey(true);
    }

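Answer 3 (score: 0)

OK, I'll give you 3 versions.

Basically, I just load a list of random integers as strings, at whatever the scale is, and run it on both Core and Framework to compare. Each test is run 10 times and averaged.

Mine1

Uses Distinct
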

public unsafe class Mine : Benchmark<List<string>, char>
{
   protected override char InternalRun()
   {
      var result = new int[10];
      var asd = Input.Select(x => new string(x.Distinct().ToArray())).ToList();
      var raw = string.Join("", asd);

      fixed (char* pInput = raw)
      {
         var len = pInput + raw.Length;
         for (var p = pInput; p < len; p++)
         {
            result[*p - 48]++;
         }
      }

      return (char)(result.ToList().IndexOf(result.Max()) + '0');
   }
}

Mine2

Basically, this uses a second array to solve the problem

public unsafe class Mine2 : Benchmark<List<string>, char>
{
   protected override char InternalRun()
   {
      var result = new int[10];
      var current = new int[10];
      var raw = string.Join(" ", Input);

      fixed (char* pInput = raw)
      {
         var len = pInput + raw.Length;
         for (var p = pInput; p < len; p++)
            if (*p != ' ')
               current[*p - 48] = 1;
            else
               for (var i = 0; i < 10; i++)
               {
                  result[i] += current[i];
                  current[i] = 0;
               }

      }

      return (char)(result.ToList().IndexOf(result.Max()) + '0');
   }
}

Mine3

No joins or string allocations

public unsafe class Mine3 : Benchmark<List<string>, char>
{
   protected override char InternalRun()
   {
      var result = new int[10];

      foreach (var item in Input)
         fixed (char* pInput = item)
         {
            var current = new int[10];
            var len = pInput + item.Length;

            for (var p = pInput; p < len; p++)
               current[*p - 48] = 1;

            for (var i = 0; i < 10; i++)
            {
               result[i] += current[i];
               current[i] = 0;
            }
         }


      return (char)(result.ToList().IndexOf(result.Max()) + '0');
   }
}

Results .NET Framework 4.7.1

Mode            : Release
Test Framework  : .Net Framework 4.7.1
Benchmarks runs : 10 times (averaged)

Scale : 10,000
Name     |   Average |   Fastest | StDv |     Cycles | Pass |        Gain
--------------------------------------------------------------------------
Mine3    |  0.533 ms |  0.431 ms | 0.10 |  1,751,372 | Base |      0.00 %
Mine2    |  0.994 ms |  0.773 ms | 0.38 |  3,100,896 | Yes  |    -86.63 %
Mine     |  8.122 ms |  7.012 ms | 1.29 | 27,480,083 | Yes  | -1,424.78 %
Original | 20.729 ms | 16.044 ms | 4.56 | 65,316,558 | No   | -3,791.47 %


Scale : 100,000
Name     |    Average |    Fastest |  StDv |      Cycles | Pass |        Gain
------------------------------------------------------------------------------
Mine3    |   4.766 ms |   4.475 ms |  0.34 |  16,140,716 | Base |      0.00 %
Mine2    |   8.424 ms |   7.890 ms |  0.33 |  28,577,416 | Yes  |    -76.76 %
Mine     |  96.650 ms |  93.066 ms |  3.35 | 327,615,266 | Yes  | -1,927.94 %
Original | 163.342 ms | 154.070 ms | 12.61 | 550,038,934 | No   | -3,327.32 %


Scale : 1,000,000
Name     |      Average |      Fastest |  StDv |        Cycles | Pass |        Gain
------------------------------------------------------------------------------------
Mine3    |    49.827 ms |    48.600 ms |  1.19 |   169,162,589 | Base |      0.00 %
Mine2    |   106.334 ms |    97.641 ms |  6.53 |   359,773,719 | Yes  |   -113.41 %
Mine     | 1,051.600 ms | 1,000.731 ms | 35.75 | 3,511,515,189 | Yes  | -2,010.51 %
Original | 1,640.385 ms | 1,588.431 ms | 65.50 | 5,538,915,638 | No   | -3,192.18 %

Results .NET Core 2.0

Mode            : Release
Test Framework  : Core 2.0
Benchmarks runs : 10 times (averaged)

Scale : 10,000
Name     |   Average |   Fastest | StDv |     Cycles | Pass |        Gain
--------------------------------------------------------------------------
Mine3    |  0.476 ms |  0.353 ms | 0.12 |  1,545,995 | Base |      0.00 %
Mine2    |  0.554 ms |  0.551 ms | 0.00 |  1,883,570 | Yes  |    -16.23 %
Mine     |  7.585 ms |  5.875 ms | 1.27 | 25,580,339 | Yes  | -1,492.28 %
Original | 21.380 ms | 16.263 ms | 6.46 | 65,741,909 | No   | -4,388.14 %


Scale : 100,000
Name     |    Average |    Fastest |  StDv |      Cycles | Pass |        Gain
------------------------------------------------------------------------------
Mine3    |   3.946 ms |   3.685 ms |  0.25 |  13,409,181 | Base |      0.00 %
Mine2    |   6.203 ms |   5.796 ms |  0.33 |  21,042,340 | Yes  |    -57.21 %
Mine     |  72.975 ms |  68.599 ms |  4.13 | 246,471,960 | Yes  | -1,749.41 %
Original | 161.400 ms | 145.664 ms | 19.37 | 544,703,761 | Yes  | -3,990.40 %


Scale : 1,000,000
Name     |      Average |      Fastest |  StDv |        Cycles | Pass |        Gain
------------------------------------------------------------------------------------
Mine3    |    41.036 ms |    38.928 ms |  2.47 |   139,045,736 | Base |      0.00 %
Mine2    |    71.283 ms |    68.777 ms |  2.49 |   241,525,269 | Yes  |    -73.71 %
Mine     |   749.250 ms |   720.809 ms | 27.79 | 2,479,171,863 | Yes  | -1,725.84 %
Original | 1,517.240 ms | 1,477.321 ms | 48.94 | 5,142,422,700 | No   | -3,597.35 %

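Summary

String allocation, Join, and Distinct all hurt performance. If you need more performance, you could probably break the list into workloads and process them in parallel.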
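
One way the parallel split might look, as a rough sketch rather than a benchmarked implementation. It uses PLINQ's `Aggregate` overload with a per-partition seed factory; the names `ParallelDigitCount` and `Count` are hypothetical. It computes the same quantity as Mine3 (for each digit, how many input strings contain it), which is not the longest run of consecutive numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelDigitCount
{
    // For each digit 0-9, counts how many input strings contain it at least once.
    public static int[] Count(IEnumerable<string> input)
    {
        return input.AsParallel().Aggregate(
            () => new int[10],                 // one accumulator per partition
            (acc, item) =>
            {
                int seen = 0;                  // bit i set => digit i is present
                foreach (char c in item)
                {
                    if (c >= '0' && c <= '9')
                        seen |= 1 << (c - '0');
                }
                for (int i = 0; i < 10; i++)
                    acc[i] += (seen >> i) & 1;
                return acc;
            },
            (a, b) =>                          // merge two partition results
            {
                for (int i = 0; i < 10; i++)
                    a[i] += b[i];
                return a;
            },
            acc => acc);
    }
}
```

Because each partition owns its accumulator and the totals are only merged at the end, no locking is needed. Whether this beats the single-threaded Mine3 depends on the input size and would have to be measured.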

Answer 4 (score: 0)

Certainly not an efficient solution, but this will work.

public class Program
{
    public static int arrLength = 0;
    public static string[] arr;
    public static Dictionary<char, int> dct = new Dictionary<char, int>();

    public static void Main(string[] args)
    {
        dct.Add('0', 0);
        dct.Add('1', 0);
        dct.Add('2', 0);
        dct.Add('3', 0);
        dct.Add('4', 0);
        dct.Add('5', 0);
        dct.Add('6', 0);
        dct.Add('7', 0);
        dct.Add('8', 0);
        dct.Add('9', 0);

        arr = Console.ReadLine().Split(' ');
        arrLength = arr.Length;
        foreach (string str in arr)
        {
            char[] ch = str.ToCharArray();
            ch = ch.Distinct<char>().ToArray();
            foreach (char c in ch)
            {
                Exists(c, Array.IndexOf(arr, str));
            }
        }

        int val = dct.Values.Max();
        foreach(KeyValuePair<char,int> v in dct.Where(x => x.Value == val))
        {
            Console.WriteLine("Common digit {0} with frequency {1} ",v.Key,v.Value+1);
        }
        Console.ReadLine();
    }

    public static bool Exists(char c, int pos)
    {
        int count = 0;
        if (pos == arrLength - 1)
            return false;

        for (int i = pos; i < arrLength - 1; i++)
        {
            if (arr[i + 1].ToCharArray().Contains(c))
            {
                count++;
                if (count > dct[c])
                    dct[c] = count;
            }
            else
                break;
        }
        return true;
    }
}

Answer 5 (score: 0)

As somebody else pointed out, if you have 10^18 numbers then this is going to be a lot more data than you can fit into memory. So you need a streaming solution. You also don't want to spend a lot of time on memory allocation or converting strings to character arrays, calling functions to de-duplicate digits, etc. Ideally, you need a solution that looks at each character once.

The memory requirement of the program below is very small: just two small arrays of long integers.

The algorithm I developed maintains two arrays of counts per digit. One is the maximum number of consecutive occurrences of a digit, and the other is the most recent count of consecutive occurrences.

The code itself reads the file character-by-character, accumulating digits until it encounters a character that is not a digit, then it updates the current counts array for each digit encountered. If the current count exceeds the maximum count, then the max count for that digit is updated. If a digit doesn't appear in a number, then its current count is reset to 0.

The occurrence of individual digits in a number is maintained by setting bits in the digits variable. That way, a number like 1221 will not count the digits twice.

using (var input = File.OpenText("filename"))
{
    var maxCounts = new long[]{0,0,0,0,0,0,0,0,0,0};
    var latestCounts = new long[]{0,0,0,0,0,0,0,0,0,0};
    char prevChar = ' ';

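    int digits = 0;
    bool done = false;
    while (!done)
    {
        // Read() returns -1 at end of stream; treat that as one last separator
        // so a final number with no trailing delimiter is still counted.
        int next = input.Read();
        done = next < 0;
        char c = done ? ' ' : (char)next;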

        // If the character is a digit, set the corresponding bit
        if (char.IsDigit(c))
        {
            digits |= (1 << (c-'0'));
            prevChar = c;
            continue;
        }

        // test here to prevent resetting counts when there are multiple non-digit
        // characters between numbers.
        if (!char.IsDigit(prevChar))
        {
            continue;
        }
        prevChar = c;

        // digits has a bit set for every digit
        // that occurred in the number.
        // Update the counts arrays.

        // For each of the first 10 bits, update the corresponding count.
        for (int i = 0; i < 10; ++i)
        {
            if ((digits & 1) == 1)
            {
                ++latestCounts[i];
                if (latestCounts[i] > maxCounts[i])
                {
                    maxCounts[i] = latestCounts[i];
                }
            }
            else
            {
                latestCounts[i] = 0;
            }
            // Shift the next bit into place.
            digits >>= 1;
        }
        digits = 0;
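    }

    // Report the digit with the longest run of consecutive numbers containing it.
    int best = 0;
    for (int i = 1; i < 10; ++i)
    {
        if (maxCounts[i] > maxCounts[best]) best = i;
    }
    Console.WriteLine($"Digit {best} appears in {maxCounts[best]} consecutive numbers");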
}

This code minimizes the processing required, but the program's running time will be dominated by the speed at which you can read the file. There are optimizations you can make to increase the input speed, but ultimately you're limited to your system's data transfer speed.
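
As one possible input-side optimization, the same algorithm can read the input in large blocks instead of one character per call. The following is a sketch under that assumption, not a tuned implementation; `StreamingRuns`, `LongestRuns`, `Flush`, and the `bufferSize` parameter are names invented for this example.

```csharp
using System;
using System.IO;

public static class StreamingRuns
{
    // Returns, for each digit 0-9, the longest run of consecutive numbers
    // that all contain that digit. Input is read in blocks of bufferSize chars.
    public static long[] LongestRuns(TextReader reader, int bufferSize = 1 << 16)
    {
        var maxCounts = new long[10];
        var latestCounts = new long[10];
        var buffer = new char[bufferSize];
        int digits = 0;          // bit i set => digit i seen in the current number
        bool inNumber = false;
        int read;

        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int k = 0; k < read; k++)
            {
                char c = buffer[k];
                if (c >= '0' && c <= '9')
                {
                    digits |= 1 << (c - '0');
                    inNumber = true;
                }
                else if (inNumber)
                {
                    // First separator after a number: fold it into the counts.
                    Flush(digits, latestCounts, maxCounts);
                    digits = 0;
                    inNumber = false;
                }
            }
        }

        // Handle a final number that has no trailing separator.
        if (inNumber) Flush(digits, latestCounts, maxCounts);
        return maxCounts;
    }

    private static void Flush(int digits, long[] latestCounts, long[] maxCounts)
    {
        for (int i = 0; i < 10; i++)
        {
            if (((digits >> i) & 1) == 1)
            {
                if (++latestCounts[i] > maxCounts[i]) maxCounts[i] = latestCounts[i];
            }
            else
            {
                latestCounts[i] = 0;
            }
        }
    }
}
```

For the sample input, `StreamingRuns.LongestRuns(new StringReader("12 23 231"))` gives 3 for digit 2, 2 for digit 3, and 1 for digit 1. For a real file, a `StreamReader` over the file can be passed in instead; the best buffer size depends on the storage system and would need to be measured.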