Question

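While working on a problem, I noticed something odd. Can someone explain the logic behind it? Whenever I add an element to a set, it ends up in sorted position. How is that possible if a set is an unordered data structure?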

Here is an example of what I observed:

>>> a = set([1,3,5])
>>> a
{1, 3, 5}
>>> a.pop()
1
>>> a
{3, 5}
>>> a.add(4)
>>> a
{3, 4, 5}
>>> a.add(6)
>>> a
{3, 4, 5, 6}
>>> a.add(2)
>>> a
{2, 3, 4, 5, 6}
>>>

How I stumbled upon this:

I am trying to solve the following problem: design a data structure that supports insert, remove, and getRandom in O(1) time. Details are at https://leetcode.com/problems/insert-delete-getrandom-o1-duplicates-allowed/description/

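My basic idea is to use a HashMap of number -> list, where the list stores the indices at which that key occurs in a list of values. Alongside it I maintain the list of values (V). The List and the HashMap allow constant-time insertion. The HashMap allows constant-time removal of a value. Constant-time removal from the list can be achieved by swapping the element to be removed with the last element of the list and then removing the last element.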

Basic use cases:

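To insert value 1, append 1 to List (V). Its index is stored in the HashMap under key 1.
To get a random value, pick a random element from List (V).
To remove value 1, pop the last index for 1 from the HashMap, then swap the element at that index with the last element of List (V). Then update the last index of the swapped element in the HashMap and remove the last element of List (V).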
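The swap-with-last removal described in the use cases can be sketched on a plain list as follows. This is only an illustration: `remove_at` is a hypothetical helper name, not part of the solution code, and it ignores the HashMap bookkeeping entirely.

```python
def remove_at(values, index):
    """Remove values[index] in O(1) by moving the last element into its place."""
    last = values.pop()        # O(1): drop the tail of the list
    if index < len(values):    # the removed slot was not the tail itself
        values[index] = last   # fill the hole with the old tail
    return values


values = [10, 20, 30, 40]
remove_at(values, 1)
print(values)  # [10, 40, 30]
```

Note that the relative order of the remaining elements changes, which is why the stored indices of the moved element have to be updated as well.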

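The problem I face is that I need to insert the new index of the swapped element at the correct position in its HashMap list for the algorithm to work.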

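Interestingly, though, when I use a set instead of a list in the HashMap, I don't have to take care of this. The set somehow inserts the element in the right place. I know a set is supposed to be an unordered collection, so why does this happen? Can someone explain this behavior of sets?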

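Below is the code using lists, where I have to use binary search to insert the swapped index at the correct position. Removal here is definitely not O(1).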

    import random
    import bisect
    class RandomizedCollection:

        def __init__(self):
            """
            Initialize your data structure here.
            """
            self.myMap = {}
            self.stack = []

        def insert(self, val):
            """
            Inserts a value to the collection. Returns true if the collection did not already contain the specified element.
            :type val: int
            :rtype: bool
            """
            #print("Inserting",val)
            #print(self.myMap,self.stack)
            tmp = self.myMap.get(val,[])
            if len(tmp) == 0:
                self.stack.append(val)
                tmp.append(len(self.stack)-1)
                self.myMap[val] = tmp
                return True
            else:
                self.stack.append(val)
                tmp.append(len(self.stack)-1)
                self.myMap[val] = tmp
                return False

        def remove(self, val):
            """
            Removes a value from the collection. Returns true if the collection contained the specified element.
            :type val: int
            :rtype: bool
            """
            #print("Removing",val)
            #print(self.myMap,self.stack)
            tmp = self.myMap.get(val,[])
            if len(tmp) > 0:
                if self.stack[-1] != val:
                    idx_to_remove = tmp.pop()
                    last_val = self.stack[-1]
                    #print(idx_to_remove, last_val)

                    self.myMap[last_val].pop() ## removes the last index
                    insert_pos = bisect.bisect_left(self.myMap[last_val],idx_to_remove)
                    self.myMap[last_val].insert(insert_pos,idx_to_remove)

                    self.stack[idx_to_remove],self.stack[-1] = self.stack[-1],self.stack[idx_to_remove]
                    self.stack.pop()
                else:
                    self.stack.pop()
                    tmp.pop()
                return True
            else:
                return False


        def getRandom(self):
            """
            Get a random element from the collection.
            :rtype: int
            """
            return random.choice(self.stack)

Below is similar code using a set. I don't understand why this works.

from collections import defaultdict
import random

class RandomizedCollection:

    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.nums = []
        self.num_map = defaultdict(set)


    def insert(self, val):
        """
        Inserts a value to the collection. Returns true if the collection did not already contain the specified element.
        :type val: int
        :rtype: bool
        """
        self.nums.append(val)
        self.num_map[val].add(len(self.nums) - 1)
        # True only if val was not already present (this is its sole index).
        return len(self.num_map[val]) == 1


    def remove(self, val):
        """
        Removes a value from the collection. Returns true if the collection contained the specified element.
        :type val: int
        :rtype: bool
        """
        if len(self.num_map[val]) == 0:
            return False
        index = self.num_map[val].pop()
        last_index = len(self.nums) - 1
        if not (index == last_index):
            last_val = self.nums[last_index]
            self.nums[index] = last_val
            self.num_map[last_val].remove(last_index)
            self.num_map[last_val].add(index)
        self.nums.pop()
        return True


    def getRandom(self):
        """
        Get a random element from the collection.
        :rtype: int
        """
        return self.nums[random.randint(0, len(self.nums) - 1)]

Answer 1

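Sets in Python do not guarantee any ordering. In fact, it is quite common for order not to be maintained. For example, in my Python 2.7 and 3.5.2 implementations: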

>>> a = set([3,10,19])
>>> a
set([19, 10, 3])
>>> a.add(1)
>>> a
set([19, 1, 10, 3])
>>> a.pop()
19
>>> a.pop()
1
>>> a
set([10, 3])

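Sometimes a set does appear to maintain order, because that is how hash tables work. Typically, a set starts out as a hash table with size=a*len(hash_table) buckets. An element value is inserted into bucket number value % size (in CPython, a small non-negative integer hashes to itself). For small, dense integers, value < size. In that case value is inserted into bucket number value, which is the sorted order of the values.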
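A rough model of that bucket placement, as a sketch only: `ideal_slots` is a hypothetical helper, the table size of 8 is an assumption about a small CPython set, and collision handling (probing) is ignored completely.

```python
def ideal_slots(values, size=8):
    """Map each value to hash(value) % size, ignoring collision handling."""
    return {v: hash(v) % size for v in values}


# Small, dense integers land in buckets that match their sorted order.
print(ideal_slots([1, 3, 5]))    # {1: 1, 3: 3, 5: 5}

# Larger values wrap around: 10 lands before 3, and 19 collides with 3.
print(ideal_slots([3, 10, 19]))  # {3: 3, 10: 2, 19: 3}
```

Iterating a set walks the buckets in order, so the first case looks sorted while the second does not. Where 19 finally ends up depends on the interpreter's probing scheme, which this sketch does not model.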

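So it is not surprising that a set keeps sorted order in some cases, but that is beside the point. The real question is why the RandomizedCollection class works.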

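In fact, both implementations of RandomizedCollection share the same conceptual data structure. self.stack in the list-based variant is read and updated exactly like self.nums in the set-based variant. Both maintain a flat list of all elements in the RandomizedCollection object.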

The list-based implementation relies on self.myMap[val] being sorted in only one place, in remove():

last_val = self.stack[-1]
self.myMap[last_val].pop() ## removes the last index

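Since self.myMap[last_val] is a sorted list of indices, its last element must be the index of the last occurrence of last_val in self.stack. Since last_val was taken from the end of self.stack, the last element of self.myMap[last_val] is guaranteed to point to len(self.stack)-1. This means that popping the last element of both self.stack and self.myMap[last_val] keeps the data structure consistent. Apart from those two lines, keeping self.myMap[val] as a sorted list only adds cost: searching the list for an index is O(log K) and inserting into it is O(K).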
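The invariant in the paragraph above can be checked on a small example. This is a sketch with made-up data; `build_index` is a hypothetical helper that rebuilds the value -> sorted index list map from scratch, not something in the question's code.

```python
from collections import defaultdict


def build_index(stack):
    """Map each value to the ascending list of positions where it occurs."""
    index = defaultdict(list)
    for i, v in enumerate(stack):
        index[v].append(i)  # positions are appended in increasing order
    return index


stack = [4, 7, 4, 9, 7]
index = build_index(stack)

last_val = stack[-1]
# The largest index stored for last_val is the tail position of the stack.
assert index[last_val][-1] == len(stack) - 1

# Popping both ends together keeps the two structures in agreement.
stack.pop()
index[last_val].pop()
print(stack)        # [4, 7, 4, 9]
print(dict(index))  # {4: [0, 2], 7: [1], 9: [3]}
```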

The set-based solution works in a different order. Instead of removing, in O(1), the index that points to the end of the stack via:

self.myMap[last_val].pop() # (Sorted list variant)

it uses the O(1) removal that a set provides to erase the same element:

self.num_map[last_val].remove(last_index)

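The effect is the same. In both cases the map no longer points to the last element on the stack. The set does not need to be sorted; it only needs to delete an element in O(1). Eventually the last element of the stack is removed by a simple self.stack.pop() or self.nums.pop().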

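Sometimes the element that has to be removed is not the last one, yet the code removes the last one anyway. The idea is then to remove the correct element and put the "wrongly" removed element in its place. The sorted-list solution has to do O(K) work to re-insert the index of the removed element: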

insert_pos = bisect.bisect_left(self.myMap[last_val],idx_to_remove)
self.myMap[last_val].insert(insert_pos,idx_to_remove)

With a set this is simple, since no ordering is required:

self.num_map[last_val].add(index)

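Finally, the array holding the actual values (self.nums or self.stack) is updated so that the last element is removed and re-inserted in place of the element that should have been removed in the first place.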

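To sum up, the two data structures are equivalent. In one, the set of indices is maintained as a sorted list; in the other, it is maintained as a set. The code in the sorted-list variant sometimes exploits the fact that the list is sorted in order to remove the largest element in O(1). For a set, however, no such optimization is needed, since removing any element is O(1), and remove() can be used instead of pop().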
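One way to convince yourself that set ordering plays no role is to run random operations and check consistency after each one. The following is a sketch, not the asker's exact class: `SetBackedCollection` is a condensed restatement of the set-based variant, and `is_consistent` is a hypothetical checker that rebuilds the index map from the value list.

```python
import random
from collections import defaultdict


class SetBackedCollection:
    def __init__(self):
        self.nums = []                   # flat list of all values
        self.num_map = defaultdict(set)  # value -> set of positions in nums

    def insert(self, val):
        self.nums.append(val)
        self.num_map[val].add(len(self.nums) - 1)
        return len(self.num_map[val]) == 1

    def remove(self, val):
        if not self.num_map.get(val):
            return False
        index = self.num_map[val].pop()  # any position of val will do
        last_index = len(self.nums) - 1
        if index != last_index:
            last_val = self.nums[last_index]
            self.nums[index] = last_val                # fill the hole
            self.num_map[last_val].remove(last_index)  # removal by value
            self.num_map[last_val].add(index)
        self.nums.pop()
        return True


def is_consistent(c):
    """Check that num_map lists exactly the positions found in nums."""
    rebuilt = defaultdict(set)
    for i, v in enumerate(c.nums):
        rebuilt[v].add(i)
    live = {v: s for v, s in c.num_map.items() if s}  # skip empty sets
    return live == dict(rebuilt)


rng = random.Random(0)
c = SetBackedCollection()
for _ in range(1000):
    val = rng.randrange(5)
    if rng.random() < 0.5:
        c.insert(val)
    else:
        c.remove(val)
    assert is_consistent(c)
```

The check passes whichever index set.pop() happens to return, because every later operation on the set addresses elements by value rather than by position.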

How set ordering works in an O(1) insert, delete, getRandom algorithm in Python
