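Finding the median of two given sorted arrays of the same length is a well-known and easy problem (it has been asked here many times before); it can be solved with a simple recursive algorithm.
My question is how to find the median when the two arrays have different lengths (that is, without merging them mergesort-style and then picking the middle).
Also, how can the median of k sorted arrays of the same length be found? Is there an efficient algorithm?
I tried to answer the last question myself but did not find a good solution. Thanks!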
Answer 0: (score: 0)
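If you pick a value from one of the arrays and binary-search for it in the other array, you will know how many values in each array lie above and below the chosen value. That is enough to tell you how many values in the combination of the two arrays lie above and below it.
So you can do a binary chop on the first array to find which of its values is closest to the overall median, and a binary chop on the second array to find which of its values is closest to the overall median. One of the two arrays must contain the overall median.
In the worst case this costs two outer binary chops, where each guess costs you an inner binary chop, so O(log^2(n)).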
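A few ideas might at least give a practical speedup:
1) When doing the inner binary chop, you do not necessarily need to find an exact match. Once the interval in which a match would lie is narrow enough to tell whether the chosen value is above or below the target median, you can return any position in that interval.
2) You can check whether the interval returned by the previous call of the inner binary chop is a feasible starting point for the current call. If it does not contain the value being searched for, an interval of the same size just to one side or the other might.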
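The nested binary chop described in this answer can be sketched as follows. This is a minimal sketch, not the answerer's own code: the names `lowerBound`, `upperBound`, `kthFromCandidates`, `kthOfUnion` and `medianByNestedChop` are made up for illustration, and averaging the two middle values for an even total length is an assumption the answer does not spell out. Neither speedup idea is applied here.

```javascript
// Inner chop: first index i with arr[i] >= value
// (equivalently, how many elements are strictly below value)
function lowerBound(arr, value) {
  let lo = 0, hi = arr.length
  while (lo < hi) {
    const mid = (lo + hi) >>> 1
    if (arr[mid] < value) lo = mid + 1
    else hi = mid
  }
  return lo
}

// Inner chop: first index i with arr[i] > value
// (equivalently, how many elements are less than or equal to value)
function upperBound(arr, value) {
  let lo = 0, hi = arr.length
  while (lo < hi) {
    const mid = (lo + hi) >>> 1
    if (arr[mid] <= value) lo = mid + 1
    else hi = mid
  }
  return lo
}

// Outer chop over `candidates`: returns the k-th smallest (0-based) value
// of the union if it occurs in `candidates`, otherwise undefined
function kthFromCandidates(candidates, other, k) {
  let lo = 0, hi = candidates.length - 1
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1
    const value = candidates[mid]
    // ranks [below, belowOrEqual) of the union are occupied by `value`
    const below = lowerBound(candidates, value) + lowerBound(other, value)
    const belowOrEqual = upperBound(candidates, value) + upperBound(other, value)
    if (k < below) hi = mid - 1              // guess is too large
    else if (k >= belowOrEqual) lo = mid + 1 // guess is too small
    else return value
  }
  return undefined
}

// One of the two arrays must contain the k-th value, so try both
function kthOfUnion(a, b, k) {
  const fromA = kthFromCandidates(a, b, k)
  return fromA !== undefined ? fromA : kthFromCandidates(b, a, k)
}

// Assumes at least one of the arrays is non-empty
function medianByNestedChop(a, b) {
  const total = a.length + b.length
  const mid = Math.floor(total / 2)
  if (total % 2 === 1) return kthOfUnion(a, b, mid)
  return (kthOfUnion(a, b, mid - 1) + kthOfUnion(a, b, mid)) / 2
}
```

For example, `medianByNestedChop([1, 3, 8], [2, 4, 6, 9, 10])` looks up ranks 3 and 4 of the union (the values 4 and 6) and averages them.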
Answer 1: (score: 0)
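You can find the median of the union of two sorted arrays of different lengths m and n in O(log(min(m, n))) time. Essentially, you search each array for a split point such that the two lower parts together contribute the same number of elements as the two upper parts. This guarantees that there are equally many elements above and below the median.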
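The ideal split point can be found with binary search (because the arrays are sorted, checking whether a guess is too high or too low lets you narrow the search efficiently).
Finding the split point in one array gives you the split point in the other array for free (because you know how many elements are needed to balance the ones taken from the first array).
Once you have found, in each array, the split point that separates "all elements below the median" from "all elements above the median", you can compute the median by inspecting the elements at the boundary (that is, take the middle element if the union has odd length, otherwise average the two elements directly on the boundary).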
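To make the split-point condition concrete, here is a minimal sketch that tests a single candidate cut in isolation. The function name `checkCut` and its return shape are hypothetical, chosen only for illustration, and the convention that the lower parts together hold floor((m + n) / 2) elements is one possible choice.

```javascript
// Given a cut index i in `a` (i elements of `a` go to the lower side),
// derive the matching cut j in `b` and report whether the split is valid,
// i.e. every lower-side element is <= every upper-side element
function checkCut(a, b, i) {
  const half = Math.floor((a.length + b.length) / 2)
  const j = half - i // the cut in `b` follows from the cut in `a`
  if (j < 0 || j > b.length) return { j, valid: false }
  // a missing neighbour is replaced by an infinity so the comparison passes
  const aLeft = i > 0 ? a[i - 1] : -Infinity
  const aRight = i < a.length ? a[i] : Infinity
  const bLeft = j > 0 ? b[j - 1] : -Infinity
  const bRight = j < b.length ? b[j] : Infinity
  return { j, valid: aLeft <= bRight && bLeft <= aRight }
}
```

With `a = [1, 3, 8]` and `b = [2, 4, 6, 9, 10]`, the cut `i = 2` gives `j = 2`: the lower side is {1, 3} and {2, 4}, the upper side is {8} and {6, 9, 10}, and the median is the average of the boundary values 4 and 6. The neighbouring cuts `i = 1` and `i = 3` both fail the check, which is what lets binary search decide which way to move.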
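I translated the Python algorithm from the comments of this leetcode discussion (zzg_zzm's tweak on top of stellari's algorithm) into JavaScript, but I chose more intuitive variable names and added comments.
It has not been tested exhaustively, but it worked for the handful of inputs I tried.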
function findUnionMedianSorted(smallArr, largeArr) {
  // there are an equal number of elements below and above median
  // we need to find partitions on arr1 and arr2 such that arr1 and arr2
  // together contribute an equal number of submedian and supermedian elements

  // because fitness of a partition point is monotonic in its index,
  // we can use binary search to approach optimal partition
  // we use the smaller array as a basis for finding the first partition,
  // since this eliminates situation where small array lacks enough elements to balance the partition

  // global median can then be calculated as:
  // avg(elementBelowMedian, elementAboveMedian)
  // so we must find also the elements that flank the median

  // ensure smallArr is the smaller array
  if (largeArr.length < smallArr.length) {
    return findUnionMedianSorted(largeArr, smallArr)
  }
  const unionArrLen = smallArr.length + largeArr.length
  // indices at which we would consider performing a cut
  let smallArrCutStartIx = 0, smallArrCutEndIx = smallArr.length
  while (smallArrCutStartIx <= smallArrCutEndIx) {
    // cut we are evaluating
    // midpoint of current search space of possible smallArr cuts
    const smallArrCutIx = Math.floor((smallArrCutStartIx + smallArrCutEndIx)/2)
    // the two lower parts together must hold floor(unionArrLen/2) elements,
    // so the cut on largeArr is fully determined by the cut on smallArr
    const largeArrCutIx = Math.floor(unionArrLen/2) - smallArrCutIx

    // smallArr and largeArr both submit a candidate for "what may be the element preceding the median"
    // this is the element preceding that array's cut
    // if there is no such element: we are cutting at an end of the array, so we have no element to offer
    // thus: we use -Infinity/Infinity so that comparisons favor the alternative (candidate from other array)
    // (infinities, unlike MIN_SAFE_INTEGER/MAX_SAFE_INTEGER, also work for inputs outside the safe integer range)
    const smallArrElementBeforeMedian = smallArrCutIx === 0
      ? -Infinity
      : smallArr[smallArrCutIx-1]
    const smallArrElementAfterMedian = smallArrCutIx === smallArr.length
      ? Infinity
      : smallArr[smallArrCutIx]
    const largeArrElementBeforeMedian = largeArrCutIx === 0
      ? -Infinity
      : largeArr[largeArrCutIx-1]
    const largeArrElementAfterMedian = largeArrCutIx === largeArr.length
      ? Infinity
      : largeArr[largeArrCutIx]

    // elements before median must be smaller than elements after median
    // this is already guaranteed within-array (elements are sorted)
    // but we check whether our proposed cut violates this across the two arrays
    if (smallArrElementBeforeMedian > largeArrElementAfterMedian) {
      // our cut on smallArr is at too high an index
      // eliminate all cut locations equal to or greater than the cut index we tried
      smallArrCutEndIx = smallArrCutIx-1
      continue
    }
    if (smallArrElementAfterMedian < largeArrElementBeforeMedian) {
      // our cut on smallArr is at too low an index
      // eliminate all cut locations equal to or less than the cut index we tried
      smallArrCutStartIx = smallArrCutIx+1
      continue
    }

    // both candidates will be present in the union array,
    // but only the smaller one will be directly after the median
    const elementAfterMedian = Math.min(smallArrElementAfterMedian, largeArrElementAfterMedian)
    // does the union array have one middle or two?
    if (unionArrLen % 2 === 1) {
      // odd length; one middle
      // why `elementAfterMedian` and not `elementBeforeMedian`?
      // the lower side of the partition holds floor(unionArrLen/2) elements,
      // so for an odd length the upper side holds one element more;
      // the smallest element of the upper side is therefore the median itself
      return elementAfterMedian
    }
    // both candidates will be present in the union array,
    // but only the larger one will be directly before the median
    const elementBeforeMedian = Math.max(smallArrElementBeforeMedian, largeArrElementBeforeMedian)
    // average the two middles
    return (elementBeforeMedian + elementAfterMedian) / 2
  }
}
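Since the function above has only been tried on a handful of inputs, one way to gain more confidence is to compare it with a brute-force reference on random inputs. The following is a minimal sketch under that assumption; `medianByMerging`, `randomSortedArray` and `findCounterexample` are hypothetical helper names, not part of the original answer.

```javascript
// Reference implementation: concatenate, sort, read off the middle
function medianByMerging(a, b) {
  const merged = [...a, ...b].sort((x, y) => x - y)
  const mid = Math.floor(merged.length / 2)
  return merged.length % 2 === 1
    ? merged[mid]
    : (merged[mid - 1] + merged[mid]) / 2
}

// Random sorted array of small non-negative integers (duplicates are likely)
function randomSortedArray(length) {
  return Array.from({ length }, () => Math.floor(Math.random() * 50))
    .sort((x, y) => x - y)
}

// Runs `medianFn` against the reference on random inputs and returns
// the first input pair on which they disagree, or null if all trials agree
function findCounterexample(medianFn, trials = 1000) {
  for (let t = 0; t < trials; t++) {
    const a = randomSortedArray(Math.floor(Math.random() * 6))
    // b always has at least one element, so the union is never empty
    const b = randomSortedArray(1 + Math.floor(Math.random() * 6))
    if (medianFn(a, b) !== medianByMerging(a, b)) {
      return { a, b }
    }
  }
  return null
}
```

Calling `findCounterexample(findUnionMedianSorted)` then either returns `null` or hands back a concrete pair of arrays to debug with.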
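As for:
"Also, how can the median of k sorted arrays of the same length be found? Is there an efficient algorithm?"
That is big enough to deserve a separate question of its own.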