Question

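I need help designing an algorithm for the following problem: there is a row of numbers, each of which appears a different number of times, and I need to find the number that appears most often and how many times it appears. For example: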

1-1-5-1-3-7-2-1-8-9-1-2

That would be 1, which appears 5 times.

The algorithm should be fast (that's my problem). Any ideas?

Answer 1

What you are looking for is called the mode. You can sort the array and then look for the longest run of repeated values.

Answer 2

You could keep a hash table and store the count of each element in it, e.g.

h[1] = 5
h[5] = 1
...

Answer 3

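You can't do better than linear time, since you need to look at every number at least once.

If you know the numbers lie within a certain range, you can use an extra array to tally the occurrences of each number; otherwise you need a hash table, which is slightly slower.

Both approaches need extra space, and you have to loop over the counts again at the end to get the result.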
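The extra-array variant could look roughly like the sketch below. The function name mode_in_range and the max_value parameter are made up for illustration, and the code assumes a non-empty input whose values all lie in [0, max_value]:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns {value, count} for the most frequent value.
// Assumes `input` is non-empty and every value lies in [0, max_value].
// On ties, the smallest value wins.
std::pair<int, int> mode_in_range(const std::vector<int>& input, int max_value)
{
    // One counter per possible value: O(max_value) extra space.
    std::vector<int> counts(static_cast<std::size_t>(max_value) + 1, 0);
    for (int x : input)
        ++counts[x];

    // Second pass, over the counters, to pick the largest one.
    int best = 0;
    for (int v = 1; v <= max_value; ++v)
        if (counts[v] > counts[best])
            best = v;
    return {best, counts[best]};
}
```

For the row in the question, calling it with max_value = 9 should report the value 1 with a count of 5.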

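Unless you really have a huge number of values and absolutely need O(n) running time, you can simply sort the array. Then walk through the numbers, keeping the current number and its count in two variables and comparing them against the best seen so far. That saves a lot of space in exchange for a little time.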

Answer 4

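There is an algorithm that solves your problem in linear time (linear in the number of items in the input). The idea is to use a hash table that associates each value in the input with a count of how many times that value has been seen. You will have to profile against your expected input to see whether this meets your needs.

Note that this takes O(n) extra space. If that is unacceptable, you might consider sorting the input as others have suggested. That solution is O(n log n) in time and, with an in-place sort, O(1) in space.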

Here is a C++ implementation using std::unordered_map (std::tr1::unordered_map on pre-C++11 compilers):

#include <iostream>
#include <unordered_map>

typedef std::unordered_map<int, int> map;

int main() {
    map m;

    // Count how many times each value occurs.
    int a[12] = {1, 1, 5, 1, 3, 7, 2, 1, 8, 9, 1, 2};
    for(int i = 0; i < 12; i++) {
        int key = a[i];
        map::iterator it = m.find(key);
        if(it == m.end()) {
            m.insert(map::value_type(key, 1));
        }
        else {
            it->second++;
        }
    }

    // Find the value with the largest count.
    int count = 0;
    int value = 0;
    for(map::iterator it = m.begin(); it != m.end(); it++) {
        if(it->second > count) {
            count = it->second;
            value = it->first;
        }
    }

    std::cout << "Value: " << value << std::endl;
    std::cout << "Count: " << count << std::endl;
}

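The algorithm uses the input integers as keys in a hash table to count how many times each integer occurs. So the key (pun intended) to the algorithm is building this hash table: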

int key = a[i];
map::iterator it = m.find(key);
if(it == m.end()) {
    m.insert(map::value_type(key, 1));
}
else {
    it->second++;
}

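So here we look at the i-th element of the input list and check whether we have seen it before. If we haven't, we add a new entry to the hash table for this integer with an initial count of 1, indicating that this is the first time we have seen it. Otherwise, we increment the counter associated with this value.

Once the table is built, it's just a matter of running through the entries to find the one that occurs most often: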

int count = 0;
int value = 0;
for(map::iterator it = m.begin(); it != m.end(); it++) {
    if(it->second > count) {
        count = it->second;
        value = it->first;
    }
}

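There is currently no logic to handle the case where two different values occur the same number of times and that count is the largest among all values. You can handle that yourself according to your needs.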
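One possible way to handle ties is to return every value that reaches the maximum count. The following is only a sketch; the name all_modes is invented here, and it uses std::unordered_map directly rather than the typedef above:

```cpp
#include <algorithm>
#include <unordered_map>
#include <vector>

// Hypothetical helper: returns every value whose count equals the maximum,
// sorted ascending. An empty input yields an empty result.
std::vector<int> all_modes(const std::vector<int>& input)
{
    std::unordered_map<int, int> counts;
    int best = 0;
    for (int x : input)
        best = std::max(best, ++counts[x]);   // count and track the maximum

    std::vector<int> modes;
    for (const auto& entry : counts)
        if (entry.second == best)
            modes.push_back(entry.first);

    // Hash table iteration order is unspecified, so sort for a stable result.
    std::sort(modes.begin(), modes.end());
    return modes;
}
```

Whether you want all tied values, the first one seen, or the smallest one depends on your use case; this version simply reports them all.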

Answer 5

Here is a simple one that is O(n log n):

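Sort the vector @ O(n log n)
Create vars: int MOST = 0, VAL, CURRENT = 0, PREV = none
for ELEMENT in LIST:
    if ELEMENT != PREV:
        CURRENT = 0        # a new run starts, so restart the counter
        PREV = ELEMENT
    CURRENT += 1
    if CURRENT >= MOST:
        MOST = CURRENT
        VAL = ELEMENT
return (VAL, MOST)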
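In C++ the same idea might be sketched as follows. The function name mode_by_sorting is made up, the input is assumed to be non-empty, and the vector is taken by value so the caller's data stays unsorted (sort in place instead if you want to avoid the copy):

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns {value, count} of the longest run after sorting.
// Assumes `values` is non-empty. On ties, the smallest value wins.
std::pair<int, int> mode_by_sorting(std::vector<int> values)
{
    std::sort(values.begin(), values.end());   // O(n log n)

    int best_value = values[0];
    int best_count = 0;
    int run_count = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        // A new run starts whenever the value changes.
        if (i > 0 && values[i] != values[i - 1])
            run_count = 0;
        ++run_count;
        if (run_count > best_count) {
            best_count = run_count;
            best_value = values[i];
        }
    }
    return {best_value, best_count};
}
```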

Answer 6

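There are a few ways:

The general approach is "sort and find the longest run", which is O(n log n). Quicksort is usually the fastest general-purpose sort on average, but its worst case is O(n^2). You can also use heapsort; it is noticeably slower in the average case, but its asymptotic complexity stays O(n log n) even in the worst case.

If you have some information about the numbers, you can use some tricks. If the numbers come from a limited range, you can use the counting part of the counting sort algorithm. It is O(n).

If that isn't the case, there are other sorting algorithms that run in linear time, but none of them is universal.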

Answer 7

Edit: removed the unused comparison function.

Here is a Python 3.1 implementation:

#Python 3.1
lst = [1,1,5,1,3,7,2,1,8,9,1,2]

dct = {}
for i in lst:
    if i in dct:
        dct[i] += 1
    else:
        dct[i] = 1

mx = max(dct.keys(), key=lambda k: dct[k])

print('Value {0} appears {1} times.'.format(mx, dct[mx]))

>>> 
Value 1 appears 5 times.

Answer 8

A generic C++ solution:

#include <algorithm>
#include <iterator>
#include <map>
#include <utility>

template<class T, class U>
struct less_second
{
    bool operator()(const std::pair<T, U>& x, const std::pair<T, U>& y) const
    {
        return x.second < y.second;
    }
};

template<class Iterator>
std::pair<typename std::iterator_traits<Iterator>::value_type, int>
most_frequent(Iterator begin, Iterator end)
{
    typedef typename std::iterator_traits<Iterator>::value_type vt;
    std::map<vt, int> frequency;
    for (; begin != end; ++begin) ++frequency[*begin];
    return *std::max_element(frequency.begin(), frequency.end(),
                             less_second<vt, int>());
}

#include <iostream>

int main()
{
    int array[] = {1, 1, 5, 1, 3, 7, 2, 1, 8, 9, 1, 2};
    std::pair<int, int> result = most_frequent(array, array + 12);
    std::cout << result.first << " appears " << result.second << " times.\n";
}

A Haskell solution:

import qualified Data.Map as Map
import Data.List (maximumBy)
import Data.Function (on)

count = foldl step Map.empty where
    step frequency x = Map.alter next x frequency
    next  Nothing    = Just 1
    next (Just n)    = Just (n+1)

most_frequent = maximumBy (compare `on` snd) . Map.toList . count

example = most_frequent [1, 1, 5, 1, 3, 7, 2, 1, 8, 9, 1, 2]

A shorter Haskell solution, with help from Stack Overflow:

import qualified Data.Map as Map
import Data.List (maximumBy)
import Data.Function (on)

most_frequent = maximumBy (compare `on` snd) . Map.toList .
                Map.fromListWith (+) . flip zip (repeat 1)

example = most_frequent [1, 1, 5, 1, 3, 7, 2, 1, 8, 9, 1, 2]

Answer 9

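The best time complexity you can get is O(n). You have to look at every element, because the last one could be the one that determines the mode.

The solution depends on whether time or space matters more.

If space matters more, you can sort the list and then find the longest run of consecutive equal elements.

If time matters more, you can iterate through the list, keeping a count of the occurrences of each element (e.g. a hash of element -> count). While doing this, keep track of the element with the largest count, switching when necessary.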
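A minimal sketch of that single-pass variant is shown below. The name mode_single_pass is invented for illustration; for an empty input it returns {0, 0}, and on ties the value that reaches the maximum count first is kept:

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: counts occurrences and tracks the leader in one pass,
// so no second loop over the table is needed. Returns {value, count}.
std::pair<int, int> mode_single_pass(const std::vector<int>& input)
{
    std::unordered_map<int, int> counts;
    int best_value = 0;
    int best_count = 0;
    for (int x : input) {
        int c = ++counts[x];          // operator[] starts missing keys at 0
        if (c > best_count) {         // switch leaders only on a strict improvement
            best_count = c;
            best_value = x;
        }
    }
    return {best_value, best_count};
}
```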

If you also know that the mode is a majority element (i.e. more than n / 2 of the elements in the array have that value), then you can get O(n) time and O(1) space.
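The answer does not name a technique for the majority case; one well-known option is the Boyer-Moore majority vote, sketched below under the assumption that a majority element really exists. The function name majority_candidate is made up, and if no majority exists the returned value is meaningless, so a second counting pass would be needed to verify it:

```cpp
#include <vector>

// Hypothetical helper implementing the Boyer-Moore majority vote.
// Assumes some value occurs more than n/2 times; returns that value.
int majority_candidate(const std::vector<int>& input)
{
    int candidate = 0;
    int balance = 0;   // occurrences of candidate minus occurrences of others
    for (int x : input) {
        if (balance == 0)
            candidate = x;                     // adopt a new candidate
        balance += (x == candidate) ? 1 : -1;  // vote for or against it
    }
    return candidate;
}
```

Note that the row in the question does not qualify: 1 appears 5 times out of 12, which is not more than half.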

Answer 10

Python 2.6

>>> from collections import defaultdict
>>> lst = [1,1,5,1,3,7,2,1,8,9,1,2]
>>> d = defaultdict(int)
>>> for i in lst:
...     d[i] += 1
...
>>> max_occurring = max((v, k) for k, v in d.items())
>>> print "%d occurs %d times" % (max_occurring[1], max_occurring[0])
1 occurs 5 times

Answer 11

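The solution below gives you the count of every number. It is better in time and space than using a map. If you need the number that occurs most often, though, it is no better than the previous ones.

Edit: this approach only works for unsigned numbers, starting from 1 (as written, the code handles the single digits 1 to 9).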

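#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::string row = "1,1,5,1,3,7,2,1,8,9,1,2";
    // One counter per digit 1..9: arr[d - 1] counts digit d.
    // The vector zero-initializes itself and frees its memory automatically.
    std::vector<int> arr(9, 0);
    for (std::size_t i = 0; i < row.size(); i++)
    {
        // Only the digits 1-9 are counted; separators and '0' are skipped.
        if (row[i] >= '1' && row[i] <= '9')
        {
            int val = row[i] - '0';
            arr[val - 1]++;
        }
    }

    for (std::size_t i = 0; i < arr.size(); i++)
        std::cout << i + 1 << "-->" << arr[i] << std::endl;
}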

Answer 12

Since this is homework, I think it's fine to offer a solution in a different language.

In Smalltalk, the following would be a good starting point:

SequenceableCollection>>mode
  | aBag maxCount mode |

  aBag := Bag new
            addAll: self;
            yourself.
  aBag valuesAndCountsDo: [ :val :count |
    (maxCount isNil or: [ count > maxCount ])
      ifTrue: [ mode := val.
                maxCount := count ]].

  ^mode

Algorithm for finding the number that occurs most often in a row - C++
