Algorithm to find the k smallest numbers in an array, in the same order, using O(1) auxiliary space

Date: 2018-07-10 12:48:43

Tags: arrays algorithm sorting

For example, if the array is arr[] = {4,2,6,1,5} and k = 3, the output should be 4 2 1.

5 answers:

Answer 0 (score: 3)

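This can be done in O(nk) steps and O(1) space.

First, find the k-th smallest number in kn steps. Find the minimum and store it in a local variable min. Then find the second smallest number, i.e. the smallest number greater than min, and store it in min. Continue this way for i from 1 to k, with each step being a linear search through the array.

With this value, scan the array and print all elements less than or equal to min. This final step is linear.

Be careful if the array contains duplicate values. In that case, if duplicates of the minimum are found in one pass, i has to be incremented more than once. Besides the min variable, we also need a count variable. It is reset to zero on every iteration of the main loop and incremented each time a duplicate of the minimum is found.

In the final scan through the array, we print all values less than min and at most count values exactly equal to min. If the last pass overshot k, count is first reduced by i - k.

The algorithm in C would go like this: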

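#include <stdio.h>

/* Prints the k smallest elements of arr[0..n-1] in their original order. */
void print_k_smallest(const int arr[], int n, int k)
{
  int min = 0, local_min = 0;
  int have_min = 0;  /* becomes 1 once the first minimum is known */
  int count = 0;     /* elements equal to local_min in the current pass */
  int i, j;

  if (n <= 0 || k <= 0) return;
  if (k > n) k = n;

  /* Each pass finds the next distinct value above min: O(n) per pass. */
  i = 0;
  while (i < k) {
    count = 0;
    for (j = 0; j < n; j++) {
      if (have_min && arr[j] <= min) continue;  /* counted in an earlier pass */
      if (count == 0 || arr[j] < local_min) {
        local_min = arr[j];
        count = 1;
      }
      else if (arr[j] == local_min) {
        count++;
      }
    }
    min = local_min;
    have_min = 1;
    i += count;  /* duplicates of min count separately toward k */
  }

  /* The last pass may overshoot k: keep only as many copies of min as needed. */
  if (i > k) {
    count = count - (i - k);
  }

  /* Final linear scan: print in the original order. */
  for (i = 0, j = 0; i < n; i++) {
    if (arr[i] < min) {
      printf("%d ", arr[i]);
    }
    else if (arr[i] == min && j < count) {
      printf("%d ", arr[i]);
      j++;
    }
  }
  printf("\n");
}

The have_min flag records whether a minimum has been found yet, and count == 0 marks that local_min is still empty. As a result, no sentinel values such as -infinity and +infinity are needed. This matters because a sentinel that can also occur in the array (for example INT_MIN, or arr[0]) would let elements that were already counted be counted again.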

Answer 1 (score: 1)

Single-pass solution with O(k) space (see below for O(1) space).

The order of the items is preserved (i.e. it is stable).

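// C version; results must have room for k ints
#include <string.h>

// Copies the k smallest values of arr[0..n-1] into results[0..k-1],
// keeping their original order. Returns the number of values copied.
size_t k_smallest(const int *arr, size_t n, size_t k, int *results)
{
    if (n <= k) {                         // special case: everything qualifies
        memcpy(results, arr, n * sizeof *arr);
        return n;
    }
    if (k == 0)
        return 0;

    // init
    memcpy(results, arr, k * sizeof *arr);

    // position of max_val in results (the later one on ties)
    size_t max_pos = 0;
    for (size_t j = 1; j < k; j++)
        if (results[j] >= results[max_pos])
            max_pos = j;

    for (size_t i = k; i < n; i++) {
        if (arr[i] < results[max_pos]) {
            // remove largest in results: move the remaining up
            memmove(&results[max_pos], &results[max_pos + 1],
                    (k - 1 - max_pos) * sizeof *results);
            results[k - 1] = arr[i];      // add arr[i] at end of results
            // new max of results
            max_pos = 0;
            for (size_t j = 1; j < k; j++)
                if (results[j] >= results[max_pos])
                    max_pos = j;
        }
    }
    return k;
}

// for larger k you'd want some optimization to get the new max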

Example:

4 6 2 3 1 5

4 6 2   // init
4 2 3   // remove 6, add 3 at end
2 3 1   // remove 4, add 1 at end

// or the original:

4 2 6 1 5

4 2 6   // init
4 2 1   // remove 6, add 1 -- if max is last, just replace

Optimization:

If a few extra bytes are allowed, this can be optimized for larger k:

create an array size k of objects {value, position_in_list}

keep the items sorted on value:
    new value: drop last element, insert the new at the right location
    new max is the last element

sort the end result on position_in_list

for really large k use binary search to locate the insertion point
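Below is a minimal C sketch of this sorted-buffer variant. It assumes int values and 1 <= k <= n. The names item, upper_bound, by_pos and k_smallest_sorted_buffer are illustrative.

The sketch works in three steps:
- It inserts each candidate with a binary search (upper bound, so equal values stay in arrival order).
- When the buffer is full, it drops the last (largest) element.
- At the end, it sorts the buffer by position with qsort.

The memmove on insertion still costs O(k), so only the search itself becomes logarithmic.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One candidate: its value and its index in the input array. */
typedef struct {
    int value;
    size_t pos;
} item;

/* First index in buf[0..len) whose value is greater than v, so a new
   item is placed after existing equal values. */
static size_t upper_bound(const item *buf, size_t len, int v)
{
    size_t lo = 0, hi = len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (buf[mid].value <= v)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* qsort comparator: ascending by original position. */
static int by_pos(const void *a, const void *b)
{
    size_t pa = ((const item *)a)->pos, pb = ((const item *)b)->pos;
    return (pa > pb) - (pa < pb);
}

/* Writes the k smallest values of arr[0..n-1] to out[0..k-1] in their
   original order; buf is caller-provided scratch space for k items. */
void k_smallest_sorted_buffer(const int *arr, size_t n, size_t k,
                              item *buf, int *out)
{
    assert(k >= 1 && k <= n);
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        if (len == k && arr[i] >= buf[k - 1].value)
            continue;                  /* not smaller than the current max */
        if (len == k)
            len--;                     /* drop the last (largest) element */
        size_t at = upper_bound(buf, len, arr[i]);
        memmove(&buf[at + 1], &buf[at], (len - at) * sizeof *buf);
        buf[at] = (item){ arr[i], i };
        len++;
    }
    qsort(buf, k, sizeof *buf, by_pos); /* restore the original order */
    for (size_t i = 0; i < k; i++)
        out[i] = buf[i].value;
}
```

With the question's array {4,2,6,1,5} and k = 3, out becomes 4 2 1.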

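O(1) space

If we are allowed to overwrite the data, the same algorithm can be used with one change: instead of a separate array[k], use the first k elements of the list. The init step can then be skipped.

If the data must be preserved, see my second answer, which performs well for large k with O(1) space.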
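A minimal C sketch of the in-place variant, assuming the caller may overwrite arr and 1 <= k <= n. The names k_smallest_in_place and max_index are illustrative. Only arr[0..k-1] is written, so the rest of the array is left as it was, and the extra space is a few scalars. The worst case is still O(nk) because of the shift and the max rescan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Index of the largest value in a[0..k-1]; on ties the later index,
   so earlier occurrences of the maximum are kept. */
static size_t max_index(const int *a, size_t k)
{
    size_t m = 0;
    for (size_t j = 1; j < k; j++)
        if (a[j] >= a[m])
            m = j;
    return m;
}

/* Rearranges arr so that arr[0..k-1] holds the k smallest values of
   arr[0..n-1] in their original relative order. Overwrites the input. */
void k_smallest_in_place(int *arr, size_t n, size_t k)
{
    assert(k >= 1 && k <= n);
    /* arr[0..k-1] already holds the first k items: no init step needed. */
    size_t m = max_index(arr, k);
    for (size_t i = k; i < n; i++) {
        if (arr[i] < arr[m]) {
            int v = arr[i];
            /* remove the largest: shift the later results up one slot */
            memmove(&arr[m], &arr[m + 1], (k - 1 - m) * sizeof *arr);
            arr[k - 1] = v;            /* append the new value */
            m = max_index(arr, k);
        }
    }
}
```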

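Answer 2 (score: 0)

Baseline (complexity at most 3n-2 for k=3):

  • Find the minimum M1 and its position P1, starting from the end of the list (store it in out[2])

  • Redo from P1 to find M2 at P2 (store it in out[1])

  • Redo from P2 to find M3 (store it in out[0])

This can certainly be improved.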

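Answer 3 (score: 0)

First, find the K-th smallest number in the array.

https://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array-set-2-expected-linear-time/

The link above shows how to use randomized quickselect to find the k-th smallest element in O(n) average time.

Once you have the K-th smallest element, loop through the array and print all elements less than or equal to it. If several elements equal the K-th smallest value, this prints more than K numbers. In that case the duplicates have to be counted, as in answer 0.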

    int small = kthSmallest(array, k); // hypothetical helper: K-th smallest number, e.g. via quickselect
    for (int i = 0; i < array.length; i++) {
        if (array[i] <= small) {
            System.out.print(array[i] + " ");
        }
    }
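Below is a C sketch of the same idea. It assumes int values and 1 <= k <= n. The names quickselect and k_smallest_quickselect are illustrative. The Lomuto partition with a rand() pivot is one common choice, not necessarily the one on the linked page.

Two points are worth noting:
- Partitioning reorders its input, so this sketch runs quickselect on a malloc'd copy. It therefore uses O(n) extra space rather than O(1).
- It keeps only as many elements equal to the K-th value as are needed, so duplicates cannot push the output past K numbers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Randomized quickselect (Lomuto partition): returns the value that would
   be at index k (0-based) if a[0..n-1] were sorted. Reorders a. */
static int quickselect(int *a, size_t n, size_t k)
{
    size_t lo = 0, hi = n - 1;
    while (lo < hi) {
        size_t p = lo + (size_t)rand() % (hi - lo + 1);
        swap(&a[p], &a[hi]);
        size_t store = lo;
        for (size_t i = lo; i < hi; i++)
            if (a[i] < a[hi])
                swap(&a[i], &a[store++]);
        swap(&a[store], &a[hi]);       /* pivot now at its final index */
        if (k == store)
            return a[store];
        if (k < store)
            hi = store - 1;
        else
            lo = store + 1;
    }
    return a[lo];
}

/* Writes the k smallest values of arr[0..n-1] to out[0..k-1] in their
   original order. Returns 0 if the scratch copy cannot be allocated. */
int k_smallest_quickselect(const int *arr, size_t n, size_t k, int *out)
{
    assert(k >= 1 && k <= n);
    int *copy = malloc(n * sizeof *copy);
    if (copy == NULL)
        return 0;
    memcpy(copy, arr, n * sizeof *copy);
    int kth = quickselect(copy, n, k - 1);
    free(copy);

    size_t below = 0;                  /* elements strictly below kth */
    for (size_t i = 0; i < n; i++)
        if (arr[i] < kth)
            below++;
    size_t equal_left = k - below;     /* copies of kth still to output */
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        if (arr[i] < kth) {
            out[w++] = arr[i];
        } else if (arr[i] == kth && equal_left > 0) {
            out[w++] = arr[i];
            equal_left--;
        }
    }
    return 1;
}
```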

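Answer 4 (score: 0)

A solution with O(1) space for large k (e.g. 100,000) that needs only a few passes over the list.

In my first answer I presented a single-pass solution using O(k) space. It can become a single-pass O(1) space solution if we are allowed to overwrite the data.

For data that cannot be overwritten, ciamej provided an O(1) solution that needs up to k passes over the data. It works great.

However, for a large list (n) and a large k we may want a faster solution. For example, with n=100,000,000 (distinct values) and k=100,000, we would have to check 10 trillion items, with a branch on each one, plus an extra pass to collect the items.

To reduce the number of passes over n, we can build a small range histogram. The histogram needs some storage, but O(1) means constant space (i.e. not dependent on n or k), so I think this is allowed. The histogram can be as small as an array of 2 uint32 values. Its size should be a power of 2 so that we can use bit masks.

To keep the example below simple, we'll use a list of 16-bit positive integers and a uint32[256] histogram. The method works with uint32[2] as well.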

First, find the k-th smallest number - only 2 passes required:

uint32 hist[256];

First pass: group (count) by multiples of 256 - no branching besides the loop
    loop:
        hist[(arr[i] & 0xff00) >> 8]++;   // parentheses needed: >> binds tighter than &

Now we have a count for each range and can calculate which bucket our k is in.
Save the total count up to that bucket and reset the histogram.

Second pass: fill the histogram again,
    now indexing by the lower 8 bits (arr[i] & 0xff), and only for the numbers belonging in that range.
    The range check can also be done with a mask.

After this last pass, every slot in the histogram represents a single value,
    so we can easily calculate where our k-th number is.

If the count in that slot (which represents our max value after restoring
    with the previous mask) is higher than one, we'll have to remember that
    when printing out the numbers.
    This is explained in ciamej's post, so I won't repeat it here.

---

With hist[4] (2 bits per pass) and a list of 32-bit integers we would need 16 passes.

The algorithm can easily be adjusted for signed integers.

Example:

k = 7

uint32_t hist[256];  // can be as small as hist[2]

uint16_t arr[]:

88
258
4
524
620
45
440
112
380
580
88
178

Fill histogram with:
    hist[(arr[i] & 0xff00) >> 8]++;

hist         count
0 (0-255)      6
1 (256-511)    3 -> k
2 (512-767)    3
...

k is in hist[1] -> (256-511)

Clear histogram and fill with range (256-511):

Fill histogram with:
    if ((arr[i] & 0xff00) == 0x0100)   // parentheses needed: == binds tighter than &
        hist[arr[i] & 0xff]++;

Numbers in this range are:

258 & 0xff =   2
440 & 0xff = 184
380 & 0xff = 124

hist         count
0              0
1              0
2              1 -> k
...            0
124            1
...            0
184            1
...            0

k - 6 (first pass) = 1
k is in hist[2], which is 2 + 256 = 258

Loop through arr[] to display the numbers <= 258 in preserved order.

Take care of possible duplicates of the highest number (hist[2] > 1 in this case).
    We can easily calculate how many of those we have to print.
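Below is a minimal C sketch of the two passes above. It assumes a uint16_t list and 1 <= k <= n. The names kth_smallest_u16 and k_smallest_u16 are illustrative.

After the second pass, the running count before equals the number of elements strictly smaller than the k-th value. This is what makes the duplicate handling easy: exactly k - before copies of that value are printed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the k-th smallest (1-based) value of arr[0..n-1] using two passes
   and a fixed 256-entry histogram; *n_below receives how many elements are
   strictly smaller than the returned value. */
uint16_t kth_smallest_u16(const uint16_t *arr, size_t n, size_t k,
                          size_t *n_below)
{
    uint32_t hist[256] = {0};
    size_t hi = 0, lo = 0, before = 0;
    assert(k >= 1 && k <= n);

    /* Pass 1: count by high byte (note the parentheses around the mask). */
    for (size_t i = 0; i < n; i++)
        hist[(arr[i] & 0xff00) >> 8]++;
    while (before + hist[hi] < k)      /* find the bucket holding k */
        before += hist[hi++];

    /* Pass 2: reset, then count by low byte, only inside bucket hi. */
    for (size_t b = 0; b < 256; b++)
        hist[b] = 0;
    for (size_t i = 0; i < n; i++)
        if ((size_t)(arr[i] & 0xff00) == (hi << 8))
            hist[arr[i] & 0xff]++;
    while (before + hist[lo] < k)      /* each slot is now a single value */
        before += hist[lo++];

    *n_below = before;
    return (uint16_t)((hi << 8) | lo);
}

/* Copies the k smallest values to out[0..k-1] in their original order,
   keeping only as many copies of the k-th value as are needed. */
void k_smallest_u16(const uint16_t *arr, size_t n, size_t k, uint16_t *out)
{
    size_t below;
    uint16_t kth = kth_smallest_u16(arr, n, k, &below);
    size_t equal_left = k - below;
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        if (arr[i] < kth) {
            out[w++] = arr[i];
        } else if (arr[i] == kth && equal_left > 0) {
            out[w++] = arr[i];
            equal_left--;
        }
    }
}
```

With the example list and k = 7, kth_smallest_u16 returns 258 with 6 smaller elements, and k_smallest_u16 produces 88 258 4 45 112 88 178.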

Further optimization:

If we can expect k to be in the lower range, we can optimize even further by using log2 values instead of fixed ranges:

There is a single CPU instruction to count the leading zero bits (or one bits)
    so we don't have to call a standard log() function
    but can call an intrinsic function instead.

This would require hist[65] for a list with 64-bit (positive) integers.

We would then have something like:

        hist[ 64 - n_leading_zero_bits ]++;

This way the ranges we have to use in the following passes would be smaller.
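A sketch of that first pass in C, assuming 64-bit unsigned values and GCC or Clang, which provide the __builtin_clzll intrinsic. That intrinsic is undefined for 0, so zero gets its own bucket. The names bit_length and log2_bucket_of_kth are illustrative. Bucket b >= 1 holds the values in [2^(b-1), 2^b - 1], so later passes only have to split that one range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bucket index = bit length of x: 0 for x == 0, otherwise
   64 - (number of leading zero bits), i.e. 1..64. */
static unsigned bit_length(uint64_t x)
{
    return x == 0 ? 0u : 64u - (unsigned)__builtin_clzll(x);
}

/* First pass of the log2 variant: returns the bucket that holds the k-th
   smallest value and stores in *before how many values lie in lower buckets. */
unsigned log2_bucket_of_kth(const uint64_t *arr, size_t n, size_t k,
                            size_t *before)
{
    uint64_t hist[65] = {0};
    assert(k >= 1 && k <= n);
    for (size_t i = 0; i < n; i++)
        hist[bit_length(arr[i])]++;

    unsigned b = 0;
    size_t seen = 0;
    while (seen + hist[b] < k)
        seen += hist[b++];
    *before = seen;
    return b;
}
```

For the 12 example numbers above and k = 7, it returns bucket 9 (256-511) with 6 values below it.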