Question

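Suppose I have a list of Int whose elements are known to be bounded, and the list is known to be no longer than the range of those bounds, so it is entirely possible that it contains no duplicates. What is the fastest way to test whether that is the case?

I know about nubOrd. It is quite fast: we can pass the list through it and see whether it gets shorter. But nubOrd is still not linear.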
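For reference, the nubOrd check described above might look like the following. This is a minimal sketch that assumes nubOrd comes from Data.Containers.ListUtils in the containers package; hasDupsNubOrd is a name made up for illustration.

```haskell
import Data.Containers.ListUtils (nubOrd)

-- | True when the list contains at least one repeated element.
-- nubOrd tracks the elements seen so far in a Set, so this costs
-- O(n log n) rather than O(n).
hasDupsNubOrd :: [Int] -> Bool
hasDupsNubOrd xs = length (nubOrd xs) /= length xs
```

The list is traversed twice (once per length call), which does not change the asymptotic cost.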

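My idea is that we can trade space for time. Imperatively, we would allocate a bit field as wide as our range and then traverse the list, marking the entry that corresponds to each element's value. As soon as we try to flip a bit that is already 1, we return False. That costs only (read + compare + write) * the length of the list. No binary search trees, no nothing.

Is it reasonable to attempt a similar construction in Haskell?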

Answer 1

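You can use the linear-time nub from the discrimination package. Or the linear-time group, which does not require equivalent elements to be adjacent in order to group them, so you can check whether any group has a size other than 1.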
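A minimal sketch of the group-based check, assuming the discrimination package is installed and that Data.Discrimination exports group with the behaviour described above. hasDupsDisc and moreThanOne are names made up for illustration.

```haskell
import Data.Discrimination (group)

-- | True when some equivalence class produced by group has more than
-- one member, i.e. when the list contains a repeated value.
hasDupsDisc :: [Int] -> Bool
hasDupsDisc = any moreThanOne . group
  where
    -- Pattern match instead of calling length, so each class is only
    -- inspected up to its second element.
    moreThanOne (_ : _ : _) = True
    moreThanOne _           = False
```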

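The whole package is built on sidestepping the well-known bounds on comparison-based sorts (and joins, etc.) by using algorithms based on "discrimination" rather than on comparisons. As I understand it, the technique is somewhat like radix sort, but generalised to ADTs.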

Answer 2

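So you have a list of size N, and you know that the elements in the list are in the range min .. min+N-1.

There is a simple linear-time algorithm that requires O(1) space.

First, scan the list to find the minimum and maximum elements.

If (max - min + 1) < N, you know there is a duplicate. Otherwise...

Because the range is N, the minimum item can go at a[0] and the maximum item at a[N-1]. You can map any item to its position in the array simply by subtracting min. You can do an in-place sort in O(N) because you know exactly where every item should go.

Starting at the beginning of the list, take the first element and subtract min to determine where it should go. Go to that position and replace the item that is there. With the displaced item, compute where it should go, replace the item at that position, and so on.

If you ever reach a point where you are trying to place an item at a[x] and the value already there is the one that belongs there (i.e. a[x] == x+min), then you have found a duplicate.

The code to do all this is pretty simple (corrected version):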

min, max = findMinMax()
if (max - min + 1) < N
    // Fewer distinct values than items, so something must repeat.
    return true
end if
currentIndex = 0
while currentIndex < N
    temp = a[currentIndex]
    targetIndex = temp - min
    // Do this until we wrap around to the current index.
    // If the item is already in place, then targetIndex == currentIndex,
    // and we won't enter the loop.
    while targetIndex != currentIndex
        if a[targetIndex] == temp
            // The item at a[targetIndex] is the item that's supposed to be there.
            // The only way that can happen is if the item we have in temp is a duplicate.
            return true
        end if
        save = a[targetIndex]
        a[targetIndex] = temp
        temp = save
        targetIndex = temp - min
    end while
    // At this point, targetIndex == currentIndex.
    // We've wrapped around and need to place the last item.
    // There's no need to check here if a[targetIndex] == temp, because if it did,
    // we would not have entered the loop.
    a[targetIndex] = temp
    ++currentIndex
end while
// Every item reached its own slot, so there are no duplicates.
return false

That's the basic idea.
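Since the question asks about Haskell, the same idea can be sketched with a mutable unboxed array from the array package. This is an illustration under stated assumptions, not a definitive implementation: hasDupsInPlace, outer and chase are names made up here, and the sketch calls error when the value range is wider than the list, because the in-place placement only works when every value has a slot. Copying the list into an array also costs O(N) space, so the O(1) space figure applies only when the data already lives in a mutable array.

```haskell
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newListArray, readArray, writeArray)

-- | Duplicate test that moves each value v to slot (v - lo).
-- Assumption: the value range is no wider than the list length.
hasDupsInPlace :: [Int] -> Bool
hasDupsInPlace [] = False
hasDupsInPlace xs
  | width < n = True  -- fewer distinct values than items
  | width > n = error "hasDupsInPlace: value range exceeds list length"
  | otherwise = runST (newListArray (0, n - 1) xs >>= \arr -> outer arr 0)
  where
    n     = length xs
    lo    = minimum xs
    hi    = maximum xs
    width = hi - lo + 1

    -- Visit every slot in turn and settle the value found there.
    outer :: STUArray s Int Int -> Int -> ST s Bool
    outer arr i
      | i >= n    = return False
      | otherwise = do
          v   <- readArray arr i
          dup <- chase arr i v
          if dup then return True else outer arr (i + 1)

    -- Carry value v to its home slot, pick up the value displaced from
    -- there, and repeat until the chain returns to slot i.
    chase :: STUArray s Int Int -> Int -> Int -> ST s Bool
    chase arr i v
      | t == i    = writeArray arr i v >> return False
      | otherwise = do
          w <- readArray arr t
          if w == v
            then return True  -- home slot already holds v: duplicate
            else writeArray arr t v >> chase arr i w
      where
        t = v - lo
```

Each step of chase puts one value into its final slot, so the total work across all calls stays linear in N.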

Answer 3

For integers (and other Ix-like types), you could use a mutable array, for example from the array package.

Here we can use an STUArray, like this:

import Control.Monad.ST
import Data.Array.ST

-- Walk the list, marking each value as seen; stop at the first repeat.
updateDups_ :: [Int] -> STUArray s Int Bool -> ST s Bool
updateDups_ [] _ = return False
updateDups_ (x:xs) arr = do
    contains <- readArray arr x
    if contains
        then return True
        else writeArray arr x True >> updateDups_ xs arr

withDups_ :: Int -> [Int] -> ST s Bool
withDups_ mx l = newArray (0, mx) False >>= updateDups_ l

withDups :: Int -> [Int] -> Bool
withDups mx ls = runST (withDups_ mx ls)

For example:

Prelude Control.Monad.ST Data.Array.ST> withDups 17 [1,4,2,5]
False
Prelude Control.Monad.ST Data.Array.ST> withDups 17 [1,4,2,1]
True
Prelude Control.Monad.ST Data.Array.ST> withDups 17 [1,4,2,16,2]
True

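So here the first parameter is the maximum value that may appear in the list, and the second parameter is the list of values we want to check.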
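Note that withDups indexes the array directly with each value, so a value outside 0 .. mx makes readArray fail with an index error. A hedged variant that accepts an arbitrary (lo, hi) range and reports out-of-range input as Nothing might look like this; hasDupsIn is a name made up for illustration and is not part of the answer above.

```haskell
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newArray, readArray, writeArray)

-- | Like withDups, but for values in an arbitrary inclusive range.
-- Returns Nothing when some value falls outside (lo, hi).
hasDupsIn :: (Int, Int) -> [Int] -> Maybe Bool
hasDupsIn (lo, hi) xs
  | any outOfRange xs = Nothing
  | otherwise         = Just (runST (newArray (lo, hi) False >>= go xs))
  where
    outOfRange x = x < lo || x > hi

    -- Mark each value as seen; stop at the first one already marked.
    go :: [Int] -> STUArray s Int Bool -> ST s Bool
    go [] _ = return False
    go (y : ys) seen = do
      hit <- readArray seen y
      if hit
        then return True
        else writeArray seen y True >> go ys seen
```

The range check is an extra linear pass over the list, so the overall cost stays linear in the list length plus the width of the range.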

Can I check whether a bounded list contains duplicates in linear time?
