Question

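Today at school, our teacher asked us to implement a duplicate-removal algorithm. It isn't hard, and everyone came up with the following solution (pseudocode):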

for i from 1 to n - 1
    j = i + 1
    while j <= n
        if v[i] == v[j] then
            remove(v, v[j])    // remove(from, what); later elements shift left, n decreases by 1
        else
            j = j + 1
        end if
    end while
next i

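The computational complexity (CC) of this algorithm is n(n-1)/2 comparisons. (We're in high school and haven't covered big O yet, but it seems to be O(n^2).) This solution looks ugly and is certainly slow, so I tried to write something faster: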
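A runnable sketch of the nested-loop approach, assuming a Python list stands in for the array (dedup_naive is a hypothetical name). It counts comparisons so the n(n-1)/2 figure can be checked on input with no duplicates; note that del itself also shifts elements, which this counter does not include.

```python
def dedup_naive(v):
    """Remove later duplicates from list v in place by comparing pairs.

    Returns the number of comparisons made, to check the n(n-1)/2 count.
    """
    comparisons = 0
    i = 0
    while i < len(v):
        j = i + 1
        while j < len(v):
            comparisons += 1
            if v[i] == v[j]:
                del v[j]   # later elements shift left, so j must not advance
            else:
                j += 1
        i += 1
    return comparisons


print(dedup_naive(list("abcde")))  # 5 distinct items: 5*4/2 = 10 comparisons
items = ["a", "b", "a", "c", "b"]
dedup_naive(items)
print(items)                       # ['a', 'b', 'c']
```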

procedure binarySearch(vector, element, *position)
    // this procedure searches for element in vector, returning
    // true if found, false otherwise. *position will contain the
    // element's place (where it is or where it should be)
end procedure

----

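// sorted array of the values seen so far (same type as v, initially empty)
vS = new array[n]

i = 1
while i <= n
    if binarySearch(vS, v[i], &p) = true then
        remove(v, v[i])       // v shrinks by 1; the next element moves into position i
    else
        add(vS, v[i], p)      // adds v[i] in position p of array vS
        i = i + 1
    end if
end while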

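This way, vS contains every element we have already visited. If v[i] is already in this array, it is a duplicate and gets removed. Binary search has a computational complexity of log(n), and the main loop (the second snippet) has a complexity of n. So, if I'm not mistaken, the overall CC is n*log(n).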
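A runnable sketch of this second approach, assuming Python lists stand in for the arrays; binary_search and dedup_with_sorted_seen are hypothetical names. binary_search follows the procedure's contract (found flag plus position), and instead of calling remove on v the sketch builds a new output list. The search costs O(log n) comparisons, but list.insert still has to shift every later element of vs.

```python
def binary_search(vec, element):
    """Return (found, position) for a sorted list vec.

    position is where element is, or where it would be inserted to keep vec
    sorted, mirroring the *position output of the pseudocode procedure.
    """
    lo, hi = 0, len(vec)
    while lo < hi:
        mid = (lo + hi) // 2
        if vec[mid] < element:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(vec) and vec[lo] == element, lo


def dedup_with_sorted_seen(v):
    """Keep the first occurrence of each element of v, returning a new list."""
    vs = []    # plays the role of vS: the values seen so far, kept sorted
    out = []   # build a new list instead of removing from v in place
    for x in v:
        found, p = binary_search(vs, x)   # O(log n) comparisons
        if not found:
            vs.insert(p, x)               # keeps vs sorted, but shifts the tail right
            out.append(x)
    return out


print(dedup_with_sorted_seen(["c", "a", "c", "b", "a"]))  # ['c', 'a', 'b']
```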

Then I had another idea involving a binary tree, but I couldn't quite work it out. Basically, my questions are:

Is my CC calculation right? (If not, why not?)
Is there a faster way?

Thanks

Answer 1

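The simplest solution is to sort the array first: O(n log n) if you can use a standard implementation; otherwise, consider writing a simple randomized quicksort (the code is even on Wikipedia).

Then scan it once more. During that scan, simply drop every element that is equal to the one right before it.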
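A minimal sketch of sort-then-scan in Python, using the built-in sorted() as the "standard implementation" (dedup_sorted is a hypothetical name). After sorting, equal values are adjacent, so one linear pass that compares each value with the last one kept is enough.

```python
def dedup_sorted(v):
    """Return the distinct elements of v in sorted order."""
    s = sorted(v)          # O(n log n); duplicates end up next to each other
    out = []
    for x in s:            # single O(n) scan
        if not out or out[-1] != x:
            out.append(x)  # keep x only if it differs from its predecessor
    return out


print(dedup_sorted(["b", "a", "b", "c", "a"]))  # ['a', 'b', 'c']
```

The original order is lost; if it matters, the hash-set approach keeps first occurrences in place.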

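If you want to do it in (expected) O(n), you can instead use a HashSet of the elements you have already seen. Just iterate over your array once and check, for each element, whether it is in your HashSet.

If it isn't there, add it. If it is there, remove it from the array (to keep the pass linear, copy the kept elements forward rather than deleting in place, because each deletion from an array is itself O(n)).

Note that this needs some extra memory, and hashing adds a constant factor to the running time. So although the time complexity is better, it will only actually run faster once the array exceeds a certain size.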
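A minimal Python sketch of the HashSet idea, using the built-in set as the hash set (dedup_hashset is a hypothetical name). Instead of deleting duplicates one by one, it copies each kept element to a write position and truncates the list once at the end, so the whole pass stays expected O(n).

```python
def dedup_hashset(v):
    """Remove duplicates from list v in place, keeping first occurrences."""
    seen = set()
    write = 0
    for x in v:
        if x not in seen:       # expected O(1) membership test
            seen.add(x)
            v[write] = x        # compact kept elements toward the front
            write += 1
    del v[write:]               # drop the leftover tail in one step
    return v


print(dedup_hashset(["a", "ab", "b", "ab", "a"]))  # ['a', 'ab', 'b']
```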

Answer 2

You can often use a space-time tradeoff: spend more memory to save time.

In this case, you can use a hash table to find the unique words.
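In Python, a dict is a hash table, and since Python 3.7 it is guaranteed to preserve insertion order. A short sketch (unique_words is a hypothetical name) trades the memory for one dict entry per distinct word for a single expected-O(n) pass:

```python
def unique_words(words):
    # dict.fromkeys inserts each word as a key; repeated keys are ignored,
    # and the keys come back in order of first appearance.
    return list(dict.fromkeys(words))


print(unique_words(["to", "be", "or", "not", "to", "be"]))  # ['to', 'be', 'or', 'not']
```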

Answer 3

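add is O(n), because inserting at position p has to shift every element after p one slot to the right, so your CC calculation is wrong. Your algorithm is O(n^2).

Also, how would remove be implemented? It also looks like O(n), so the initial algorithm is O(n^3).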
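One way to see why add is O(n) is to count the element moves a sorted-array insert causes. The sketch below (insertion_shifts is a hypothetical name) uses Python's bisect.bisect_left for the O(log n) search and counts how many elements sit after the insertion point. With input arriving in descending order, every insert lands at position 0, so the moves add up to n(n-1)/2.

```python
from bisect import bisect_left


def insertion_shifts(values):
    """Insert each value into a sorted list and count how many elements had to move."""
    vs, shifts = [], 0
    for x in values:
        p = bisect_left(vs, x)   # O(log n) search, like binarySearch
        shifts += len(vs) - p    # elements after p slide one slot right
        vs.insert(p, x)          # list.insert is O(n) for the same reason
    return shifts


print(insertion_shifts(list(range(1000))))         # ascending input: 0 moves
print(insertion_shifts(list(range(1000, 0, -1))))  # descending input: 1000*999/2 = 499500
```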

Answer 4

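Binary search only works if the array you are searching is sorted. I assume that's not the case here, or you wouldn't loop over the whole array in the inner loop of your original solution.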
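A small demonstration, assuming Python's bisect.bisect_left as the binary search (bsearch_contains is a hypothetical name): on an unsorted list the search can miss an element that is present, while the same call on the sorted list finds it.

```python
from bisect import bisect_left


def bsearch_contains(vec, x):
    """Binary-search membership test; only correct if vec is sorted."""
    p = bisect_left(vec, x)
    return p < len(vec) and vec[p] == x


unsorted = ["c", "a", "b"]
print(bsearch_contains(unsorted, "a"))          # False, although "a" is in the list
print(bsearch_contains(sorted(unsorted), "a"))  # True once the list is sorted
```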

Answer 5

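If the order of the final result doesn't matter, you can split the array into smaller arrays by string length and then remove the duplicates from each of those arrays. For example: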

// You have 
{"a", "ab", "b", "ab", "a", "c", "cd", "cd"}, 

// you break it into 
{"a", "b", "a", "c"} and {"ab", "ab", "cd", "cd"}, 

// remove duplicates from those arrays using the merge method that others have mentioned, 
// and then combine the arrays back together into 
{"a", "b", "c", "ab", "cd"}
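A sketch of the split-by-length idea in Python (dedup_by_length is a hypothetical name). Strings of different lengths can never be equal, so each bucket is deduplicated on its own; here each bucket uses sort-then-scan, and the buckets are concatenated from shortest to longest, which reproduces the example above.

```python
from collections import defaultdict


def dedup_by_length(strings):
    """Bucket strings by length, dedup each bucket, then concatenate (order not preserved)."""
    buckets = defaultdict(list)
    for s in strings:
        buckets[len(s)].append(s)     # only same-length strings can be duplicates
    result = []
    for length in sorted(buckets):    # shortest bucket first
        bucket = sorted(buckets[length])
        for i, s in enumerate(bucket):
            if i == 0 or bucket[i - 1] != s:   # skip consecutive equal strings
                result.append(s)
    return result


print(dedup_by_length(["a", "ab", "b", "ab", "a", "c", "cd", "cd"]))
# ['a', 'b', 'c', 'ab', 'cd']
```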

Answer 6

This is the shortest algorithm, where arrNames and arrScores are parallel arrays and the highest score is kept.

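//iCount is the length of the arrays (1-based); it shrinks as duplicates are removed,
//so while loops are used: a Pascal for loop evaluates its bounds only once
I := 1;
while I <= iCount do
begin
  J := I + 1;
  while J <= iCount do
  begin
    if arrNames[I] = arrNames[J] then
    begin
      // keep the higher score on the surviving entry I
      if arrScores[I] <= arrScores[J] then
        arrScores[I] := arrScores[J];

      // overwrite entry J with the last entry, clear the last slot, shrink the count
      arrScores[J] := arrScores[iCount];
      arrNames[J] := arrNames[iCount];
      arrScores[iCount] := 0;
      arrNames[iCount] := '';
      Dec(iCount);
      // J is not advanced: the entry just moved into J still has to be compared
    end
    else
      Inc(J);
  end;
  Inc(I);
end;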
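For comparison, the same "one entry per name, highest score wins" result can be computed with a hash table instead of nested loops. This is a swapped-in technique, not the answer's method: a Python sketch (best_scores is a hypothetical name) that runs in expected O(n) and returns names in order of first appearance.

```python
def best_scores(names, scores):
    """Return (names, scores) with one entry per name, keeping its highest score."""
    best = {}   # dict used as a hash table: name -> highest score so far
    for name, score in zip(names, scores):
        if name not in best or score > best[name]:
            best[name] = score
    return list(best.keys()), list(best.values())


print(best_scores(["ann", "bob", "ann", "cid", "bob"], [3, 5, 7, 1, 2]))
# (['ann', 'bob', 'cid'], [7, 5, 1])
```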

Answer 7

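def dedup(l):
    # Open-addressing hash table with linear probing; len(l) slots is always
    # enough because l has at most len(l) distinct elements.
    ht = [(None, None) for _ in range(len(l))]  # slots of (hash, index into et)
    et = []                                     # distinct elements, first-seen order
    for e in l:
        h = hash(e)           # compute h first; n depends on it
        n = h % len(ht)
        while True:
            if ht[n][0] is None:                 # empty slot: e is new, store it here
                et.append(e)
                ht[n] = h, len(et) - 1
            if ht[n][0] == h and et[ht[n][1]] == e:
                break                            # e is (now) stored in this slot
            n = (n + 1) % len(ht)                # probe the next slot, wrapping around
    return et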

Best algorithm for removing duplicates in an array of strings
