Handling duplicate values while merging k sorted arrays

Time: 2015-11-12 10:56:25

Tags: c++ algorithm vector heap priority-queue

I am trying to merge k sorted arrays of structs into a single sorted array. I know the algorithm that uses a min-heap to merge the arrays, and I am using priority_queue in C++ to implement the heap. My code is shown below.

struct Num {
    int key;
    int val;
};

// Struct used in priority queue.
struct HeapNode
{
    Num num;              // Holds one element.
    int vecNum;           //Array number from which the element is fetched.
    int vecSize;          // Holds the size of the array.
    int next;             // Holds the index of the next element to fetch.
};

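// Struct used to compare nodes in a priority queue.
// std::priority_queue keeps the "largest" element on top, so the comparison
// uses > to make the smallest (key, val) pair come out first (min-heap).
struct CompareHeapNode
{
    bool operator()(const HeapNode& x, const HeapNode& y) const
    {
        return (x.num.key > y.num.key) || ( (x.num.key == y.num.key)&&(x.num.val > y.num.val) );
    }
};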

vector<vector<Num>> v;
priority_queue< HeapNode, vector<HeapNode>, CompareHeapNode> p_queue;

//Insert the first element of the individual arrays into the heap.

while(!p_queue.empty())  
{  
    HeapNode x = p_queue.top();
    cout << x.num.key << ' ' << x.num.val << '\n';
    p_queue.pop();

    if(x.next != x.vecSize) {
        HeapNode hq = {v[x.vecNum][x.next], x.vecNum, x.vecSize, x.next + 1};
        p_queue.push(hq);
    }  
}

Let's consider 3 sorted arrays as shown below.

Array1:             Array2:         Array3:
0 1                 0 10            0 0
1 2                 2 22            1 2
2 4                 3 46            2 819
3 7                 4 71            3 7321

The problem is that some elements can be common to several arrays, as shown above, so duplicate values appear in the merged output. Is there a way to handle duplicate keys?

1 answer:

Answer 0 (score: 0)

So your question is whether there is a way to check if the value you are about to insert into the output list is already in it.

One solution is to use a hash table (unordered_set). Before inserting, check whether the element already exists in it. If it does not, insert the element into both the output list and the hash table.
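
A minimal sketch of the hash-table check, assuming a "duplicate" means the same (key, val) pair and that int is 32 bits wide. The helpers packNum and appendIfUnseen are hypothetical names introduced here for illustration; they are not from the question's code.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

struct Num {
    int key;
    int val;
};

// Hypothetical helper: packs (key, val) into a single 64-bit value so the
// pair can be stored in std::unordered_set without writing a custom hash.
// Assumes int is 32 bits wide.
inline std::uint64_t packNum(const Num& n) {
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(n.key)) << 32) |
           static_cast<std::uint32_t>(n.val);
}

// Appends n to out only if an identical (key, val) pair has not been seen.
// Returns true when the element was appended, false when it was a duplicate.
inline bool appendIfUnseen(std::vector<Num>& out,
                           std::unordered_set<std::uint64_t>& seen,
                           const Num& n) {
    // insert() reports through .second whether the value was newly added.
    if (seen.insert(packNum(n)).second) {
        out.push_back(n);
        return true;
    }
    return false;
}
```

In the merge loop, the line that prints or stores the popped element would call appendIfUnseen instead. This costs extra memory proportional to the number of distinct elements.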

But you can do better. Since you are merging sorted arrays, the output is also sorted, so any duplicates will be adjacent in the output. Before inserting, compare the value with the last element of the output and skip it if they are equal.
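
The last-element check can be sketched as follows. This is one possible arrangement, not the asker's exact code: it assumes each input array is sorted by (key, val), treats a duplicate as an identical (key, val) pair, and uses a hypothetical function name mergeUnique. The HeapNode here drops vecSize and reads the size from the vector instead.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

struct Num {
    int key;
    int val;
};

struct HeapNode {
    Num num;             // Element currently in the heap.
    std::size_t vecNum;  // Index of the array the element came from.
    std::size_t next;    // Index of the next element to fetch from that array.
};

// Orders nodes so that the smallest (key, val) pair is on top (min-heap).
struct CompareHeapNode {
    bool operator()(const HeapNode& x, const HeapNode& y) const {
        if (x.num.key != y.num.key) return x.num.key > y.num.key;
        return x.num.val > y.num.val;
    }
};

// Merges k sorted arrays and skips elements equal to the last one emitted.
std::vector<Num> mergeUnique(const std::vector<std::vector<Num>>& v) {
    std::priority_queue<HeapNode, std::vector<HeapNode>, CompareHeapNode> pq;

    // Seed the heap with the first element of every non-empty array.
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (!v[i].empty()) pq.push(HeapNode{v[i][0], i, 1});
    }

    std::vector<Num> out;
    while (!pq.empty()) {
        HeapNode x = pq.top();
        pq.pop();

        // The output is sorted, so a duplicate can only match out.back().
        if (out.empty() || out.back().key != x.num.key ||
            out.back().val != x.num.val) {
            out.push_back(x.num);
        }

        // Refill the heap from the array the popped element came from.
        if (x.next < v[x.vecNum].size()) {
            pq.push(HeapNode{v[x.vecNum][x.next], x.vecNum, x.next + 1});
        }
    }
    return out;
}
```

If only the key should be unique, the comparison against out.back() can be reduced to the key alone; in that case the element with the smallest val for each key is the one that is kept, because it is popped first.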