Question

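You are given an array of size n and a constant k (any constant).

You can assume the array holds ints (although it could hold any type).

Describe an algorithm that finds whether there is an element that repeats itself at least n/k times; if there is one, return it. Do so in linear time (O(n)).

The catch: do this (even in pseudocode) using constant memory and running over the array only twice.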

Answer 1

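I'm not 100% sure, but it sounds like you want to solve the Britney Spears problem: finding the items that make up a certain fraction of a sample using constant memory.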

Here is a statement of the problem in English, with a sketch of the solution:

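... from a 2002 article by Erik D. Demaine of MIT and Alejandro López-Ortiz and J. Ian Munro of the University of Waterloo in Canada. Demaine and his colleagues have extended the algorithm to cover a more general problem: given a stream of length n, identify a set of size m that includes all the elements occurring with a frequency greater than n/(m+1). (In the case of m = 1, this reduces to the majority problem.) The extended algorithm requires m registers for the candidate elements as well as m counters. The basic scheme of operation is analogous to that of the majority algorithm. When a stream element matches one of the candidates, the corresponding counter is incremented; when there is no match to any candidate, all of the counters are decremented; if a counter is at 0, the associated candidate is replaced by a new element from the stream.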
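To make the m = 1 case concrete, here is a minimal sketch of the majority algorithm the quote refers to, with one register and one counter plus a second pass to verify the candidate. The function name `majorityElement` and the `out` parameter are assumptions made for this illustration, not names from the article.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true and stores the value in `out` when some
// value occurs more than a.size()/2 times; returns false otherwise.
bool majorityElement(const std::vector<int>& a, int& out)
{
    int candidate = 0;        // the single register
    std::size_t counter = 0;  // the single counter

    // Pass 1: a match increments the counter, a mismatch decrements it,
    // and a counter at 0 lets the current element replace the candidate.
    for (int x : a)
    {
        if (counter == 0)
        {
            candidate = x;
            counter = 1;
        }
        else if (x == candidate)
            ++counter;
        else
            --counter;
    }

    // Pass 2: the surviving candidate is only a guess, so count it exactly.
    std::size_t actual = 0;
    for (int x : a)
        if (x == candidate)
            ++actual;

    if (actual * 2 > a.size())
    {
        out = candidate;
        return true;
    }
    return false;
}
```

The sketch reads the array exactly twice and keeps two scalars, which matches the constraints stated in the question for k = 2.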

Answer 2

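1) Create a temporary array of size (k-1) to store elements and their counts (the output elements are going to be among these k-1 elements).
2) Traverse the input array and update temp[] (add/remove an element or increase/decrease a count) for every traversed element. The array temp[] stores the (k-1) potential candidates at every step. This step takes O(nk) time.
3) Iterate through the final (k-1) potential candidates (stored in temp[]). For every element, check whether its actual count is more than n/k. This step takes O(nk) time.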

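The main step is step 2: how do we maintain the (k-1) potential candidates at every point? The procedure used in step 2 is like the famous game Tetris. We treat each number as a Tetris piece that falls into our temporary array temp[]. Our task is to try to keep equal numbers stacked in the same column (the count in the temporary array is incremented).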

Consider k = 4, n = 9 
Given array: 3 1 2 2 2 1 4 3 3 

i = 0
         3 _ _
temp[] has one element, 3 with count 1

i = 1
         3 1 _
temp[] has two elements, 3 and 1 with 
counts 1 and 1 respectively

i = 2
         3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 1 and 1 respectively.

i = 3
         - - 2 
         3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 1 and 2 respectively.

i = 4
         - - 2 
         - - 2 
         3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 1 and 3 respectively.

i = 5
         - - 2 
         - 1 2 
         3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 2 and 3 respectively. 
Now the question arises, what to do when temp[] is full and we see a new element – we remove the bottom row from stacks of elements, i.e., we decrease count of every element by 1 in temp[]. We ignore the current element.

i = 6
         - - 2 
         - 1 2 
temp[] has two elements, 1 and 2 with
counts as 1 and 2 respectively.

i = 7
         - - 2 
         3 1 2 
temp[] has three elements, 3, 1 and 2 with
counts as 1, 1 and 2 respectively.

i = 8          
         3 - 2
         3 1 2 
temp[] has three elements, 3, 1 and 2 with
counts as 2, 1 and 2 respectively.

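Finally, we have at most k-1 numbers in temp[]. The elements in temp[] are {3, 1, 2}. Note that the counts in temp[] are no longer meaningful as frequencies; they were needed only in step 2. Now we need to check whether the actual counts of the elements in temp[] are more than n/k (9/4). The elements 3 and 2 have counts of more than 9/4, so we print 3 and 2.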

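Note that the algorithm does not miss any output element. There are two possibilities: the many occurrences are together, or they are spread across the array. If the occurrences are together, the count will be high and will not drop to 0. If the occurrences are spread out, the element will come back into temp[]. Following is a C++ implementation of the above algorithm.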

// A C++ program to print elements with count more than n/k
#include<iostream>
#include<vector>
using namespace std;

// A structure to store an element and its current count
struct eleCount
{
    int e;  // Element
    int c;  // Count
};

// Prints elements with more than n/k occurrences in arr[] of
// size n. If there are no such elements, then it prints nothing.
void moreThanNdK(int arr[], int n, int k)
{
    // k must be greater than 1 to get some output
    if (k < 2)
       return;

    /* Step 1: Create a temporary array (contains element
       and count) of size k-1. A vector is used because a
       variable-length array is not standard C++; its slots
       are value-initialized, so every element and count
       starts as 0 */
    vector<eleCount> temp(k-1);

    /* Step 2: Process all elements of input array */
    for (int i = 0; i < n; i++)
    {
        int j;

        /* If arr[i] is already present in
           the element count array, then increment its count */
        for (j=0; j<k-1; j++)
        {
            if (temp[j].e == arr[i])
            {
                 temp[j].c += 1;
                 break;
            }
        }

        /* If arr[i] is not present in temp[] */
        if (j == k-1)
        {
            int l;

            /* If there is position available in temp[], then place 
              arr[i] in the first available position and set count as 1*/
            for (l=0; l<k-1; l++)
            {
                if (temp[l].c == 0)
                {
                    temp[l].e = arr[i];
                    temp[l].c = 1;
                    break;
                }
            }

            /* If all the position in the temp[] are filled, then 
               decrease count of every element by 1. temp[] has only
               k-1 slots, so the loop must stop at k-1 */
            if (l == k-1)
                for (l=0; l<k-1; l++)
                    temp[l].c -= 1;
        }
    }

    /*Step 3: Check actual counts of potential candidates in temp[]*/
    for (int i=0; i<k-1; i++)
    {
        // Skip empty slots: they hold no candidate, and checking
        // them could report the same value twice
        if (temp[i].c == 0)
            continue;

        // Calculate actual count of elements 
        int ac = 0;  // actual count
        for (int j=0; j<n; j++)
            if (arr[j] == temp[i].e)
                ac++;

        // If actual count is more than n/k, then print it
        if (ac > n/k)
           cout << "Number:" << temp[i].e
                << " Count:" << ac << endl;
    }
}

/* Driver program to test above function */
int main()
{
    cout << "First Test\n";
    int arr1[] = {4, 5, 6, 7, 8, 4, 4};
    int size = sizeof(arr1)/sizeof(arr1[0]);
    int k = 3;
    moreThanNdK(arr1, size, k);

    cout << "\nSecond Test\n";
    int arr2[] = {4, 2, 2, 7};
    size = sizeof(arr2)/sizeof(arr2[0]);
    k = 3;
    moreThanNdK(arr2, size, k);

    cout << "\nThird Test\n";
    int arr3[] = {2, 7, 2};
    size = sizeof(arr3)/sizeof(arr3[0]);
    k = 2;
    moreThanNdK(arr3, size, k);

    cout << "\nFourth Test\n";
    int arr4[] = {2, 3, 3, 2};
    size = sizeof(arr4)/sizeof(arr4[0]);
    k = 3;
    moreThanNdK(arr4, size, k);

    return 0;
}

Answer 3

There are two common (theoretical) approaches to solving this problem in O(n).

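I) The first idea is the simplest.

Step 1) While there are more than k distinct elements, choose k distinct elements and erase them all.

Step 2) Test the frequency of all k distinct remaining elements.

Proof of correctness: note that the while step will be executed at most n/k - 1 times. Suppose there is an element that repeats itself at least n/k times. In the worst case it is chosen in all n/k - 1 iterations, and it is still in the final array afterwards; once tested, it will be found.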
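The counting argument behind this proof can be written out as follows. This is only a sketch: the symbols d (number of executions of the while step), f (frequency of the element x) and r (copies of x left after step 1) are introduced here for illustration and do not appear in the answer.

```latex
\begin{align*}
d \cdot k &\le n - 1 \;\Longrightarrow\; d < \frac{n}{k}
  && \text{(every round starts with at least $k+1$ elements and removes $k$)} \\
r &\ge f - d > 0 \quad \text{whenever } f \ge \frac{n}{k}
  && \text{(a round removes at most one copy of $x$)}
\end{align*}
```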

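Implementation: step 1 can be implemented by keeping an associative array (mapping a key to a value) of size k-1 (constant). Sweep the array from left to right. If you find an element that is already in the map, increase its counter by 1. If the element is not in the map and the map is not full yet (fewer than k-1 elements), add the new element with an initial count of 1. If the map is full, subtract 1 from the counter of each element, and if any element reaches 0, remove it from the map. In the end, the elements in this map are equivalent to the remaining elements you need to test. If your map becomes empty in the last iteration, you need to test all the elements it held before the erase, to cover the case where the frequency is exactly n/k.

Complexity: even with the worst approach to this map, O(n * k) = O(n), since k is constant.

Step 2 can be implemented by counting the frequency of all (at most) k-1 elements. Complexity: O(k * n) = O(n).

Overall complexity: O(n) + O(n) = O(n). (There is a small detail that differs from the implementation, a difference of 1 element. This is because in the pseudocode we also want to cover the case of a frequency of exactly n/k repetitions; if not, we could allow one more iteration where there are exactly k distinct elements, not necessarily more than k.)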
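One possible way to cover a frequency of exactly n/k, sketched here as an assumption rather than as this answer's own method, is to keep k counters instead of k-1. Each decrement round then discards k+1 array elements, so there are fewer than n/k rounds and any element occurring at least n/k times still has a positive counter at the end. The function name `atLeastNOverK` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: returns every value that occurs at least n/k times.
std::vector<int> atLeastNOverK(const std::vector<int>& a, int k)
{
    std::vector<int> result;
    if (a.empty() || k < 1)
        return result;

    const std::size_t capacity = static_cast<std::size_t>(k);
    std::map<int, std::size_t> counters;  // candidate -> counter, at most k entries

    // Pass 1: maintain at most k candidates.
    for (int x : a)
    {
        auto it = counters.find(x);
        if (it != counters.end())
            ++it->second;
        else if (counters.size() < capacity)
            counters[x] = 1;
        else
        {
            // Map is full: decrement every counter and drop those reaching 0.
            for (auto c = counters.begin(); c != counters.end();)
            {
                if (--c->second == 0)
                    c = counters.erase(c);
                else
                    ++c;
            }
        }
    }

    // Pass 2: verify each surviving candidate with an exact count.
    for (const auto& entry : counters)
    {
        const std::size_t actual = static_cast<std::size_t>(
            std::count(a.begin(), a.end(), entry.first));
        if (actual * capacity >= a.size())  // actual >= n/k without rounding
            result.push_back(entry.first);
    }
    return result;
}
```

The comparison `actual * capacity >= a.size()` avoids integer division, so a frequency of exactly n/k is accepted even when n is not a multiple of k.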

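II) The second algorithm uses the linear-time selection algorithm http://en.wikipedia.org/wiki/Selection_algorithm and the partition algorithm, which also runs in linear time. Using them, you can break your array into k-1 buckets in O(n), with the invariant that any element in the i-th bucket is less than or equal to any element in the j-th bucket for j > i. Note, however, that the elements are not sorted within each bucket.

Now you use the fact that each bucket has n/(k-1) elements, and that you are looking for an element that repeats itself at least n/k times, where (n/k) > n/(2*(k-1)) (this holds for k > 2). That is enough to apply the majority theorem, which states that if an element is the majority (more frequent than the number of elements divided by 2), then it is also the median of the array. You can get the median using the selection algorithm again.

So you just test all the medians and all the pivots of the partitions. You need to test the pivots because they may split equal values across two different buckets. There are k-1 + k values to test, for a complexity of O((2*k - 1) * n) = O(n).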
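The selection idea can be illustrated with a simplified variant. This is not the bucket-and-median procedure described above: it relies on the related observation that a value occurring more than n/k times fills a run of at least n/k + 1 consecutive positions in sorted order, so it must sit at one of the positions 0, step, 2*step, ... with step = n/k + 1. The sketch assumes `std::nth_element` as the selection routine (linear on average) and uses the hypothetical name `frequentBySelection`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the values occurring more than n/k times.
// The vector is taken by value because std::nth_element reorders it.
std::vector<int> frequentBySelection(std::vector<int> a, int k)
{
    std::vector<int> result;
    if (a.empty() || k < 2)
        return result;

    const std::size_t n = a.size();
    const std::size_t threshold = n / static_cast<std::size_t>(k);
    const std::size_t step = threshold + 1;  // minimum run length of a qualifying value

    // At most k positions are probed, one selection per position.
    for (std::size_t pos = 0; pos < n; pos += step)
    {
        // Place the element that sorting would put at index pos.
        std::nth_element(a.begin(), a.begin() + pos, a.end());
        const int candidate = a[pos];

        // Verify with an exact count and avoid reporting a value twice.
        const std::size_t actual = static_cast<std::size_t>(
            std::count(a.begin(), a.end(), candidate));
        if (actual > threshold &&
            std::find(result.begin(), result.end(), candidate) == result.end())
            result.push_back(candidate);
    }
    return result;
}
```

Unlike the counter-based approaches, this variant reorders its input (here a copy), so it does not meet the constant-memory constraint unless the caller allows the array to be permuted in place.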

Answer 4

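A simple O(n) algorithm would be to keep a hash map from each number found to the number of instances found. Using a hash map is important for maintaining O(n). Then a final pass over the map will reveal the answers. This pass is also O(n), since the worst case is that every element appears only once, so the map is the same size as the original array.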
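A minimal sketch of this hash-map approach, assuming `std::unordered_map` as the map and the hypothetical function name `frequentByHashMap`. As the answer itself notes, the map can grow as large as the array, so this does not satisfy the constant-memory constraint from the question.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical helper: returns every value that occurs at least n/k times,
// in unspecified order.
std::vector<int> frequentByHashMap(const std::vector<int>& a, int k)
{
    std::vector<int> result;
    if (a.empty() || k < 1)
        return result;

    // Pass over the array: count the instances of every number.
    std::unordered_map<int, std::size_t> counts;
    for (int x : a)
        ++counts[x];

    // Pass over the map: keep the numbers that reach the threshold.
    for (const auto& entry : counts)
        if (entry.second * static_cast<std::size_t>(k) >= a.size())
            result.push_back(entry.first);

    return result;
}
```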

Answer 5

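I don't know whether you are restricted in which additional data structures you can use.

What about creating a hashmap with an 'elements' <-> count mapping? Insertion is O(log N). Lookup is O(1). For each element, look it up in the hash table and insert it with a count of 1 if it does not exist. If it exists, check whether count < (n/k). It will stay O(n).

EDIT:

I forgot about the constant-memory restriction. Is preallocating a hash map with entries for N elements permissible?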

Answer 6

Here is my implementation of Jerky's algorithm described above:

#include <map>
#include <vector>
#include <iostream>
#include <algorithm>

std::vector<int> repeatingElements(const std::vector<int>& a, int k)
{
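    // k must be at least 2 for n/k to be a meaningful threshold;
    // a non-positive k would also break the size comparison below
    if (a.empty() || k < 2)
        return std::vector<int>();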

    std::map<int, int> candidateMap; //value, count;

    for (int i = 0; i < a.size(); i++)
    {
        if (candidateMap.find(a[i]) != candidateMap.end())
        {
            candidateMap[a[i]]++;
        }
        else
        {
            if (candidateMap.size() < k-1)
            {
                candidateMap[a[i]] = 1;
            }
            else
            {
                for (std::map<int, int>::iterator iter = candidateMap.begin();
                     iter != candidateMap.end();)
                {
                    (iter->second)--;

                    if (iter->second == 0)
                    {
                        iter = candidateMap.erase(iter);
                    }
                    else
                    {
                        iter++;
                    }
                }   
            }
        }
    }

    std::vector<int> ret;

    for (std::map<int, int>::iterator iter = candidateMap.begin();
         iter != candidateMap.end(); iter++)
    {
        int candidate = iter->first;

        if (std::count(a.begin(), a.end(), candidate) > (a.size() / k))
        {
            ret.push_back(candidate);
        }
    }

    return ret;
}

int main()
{
    std::vector<int> a = { 1, 1, 4, 2, 2, 3, 3 };   
    int k = 4;

    std::vector<int> repeating_elements = repeatingElements(a, k);

    for (int elem : repeating_elements)
    {
        std::cout << "Repeating more than n/" << k << " : " << elem << std::endl;
    }

    return 0;
}

The output is:

Repeating more than n/4 : 1
Repeating more than n/4 : 2
Repeating more than n/4 : 3

Find whether there is an element that repeats itself n/k times
