Question

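I'm trying to teach myself data structures, and I'm implementing a k-d tree in Python. My k-d tree class has a method that searches the tree for points within a given radius of a point: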

def within_radius(self, point, radius, result=[]):
    """
    Find all items in the tree within radius of point
    """
    d = self.discriminator

    if in_circle(point, radius, self.data):
        result.append(self.data)

    # Check whether any of the points in the subtrees could be
    # within the circle
    if point[d] - radius < self.data[d] and self.l_child:
        result.append(self.l_child.within_radius(point, radius, result))

    if point[d] + radius > self.data[d] and self.r_child:
        result.append(self.r_child.within_radius(point, radius, result))

    return result

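It works, but the list it returns is pretty funky, with duplicate values from the recursive calls that pass result along. What's a good way of "accumulating" the values returned from a tree recursion into a list? I've been thinking about it for a while, but I can't really see how.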

Answer 1

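I'm not sure this is the cleanest way to do it, but whenever I write recursion like this, I often add a keyword argument that holds the list to be returned. That way, whenever I modify the list, I'm always modifying the same list: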

def recurse(max, _output=None):
    if _output is None:
        _output = []

    #do some work here, add to your list
    _output.append(max)

    if max <= 0: #Some condition where the recursion stops
        return _output
    else:        #recurse with new arguments so that we'll stop someday...
        return recurse(max-1, _output=_output)

This works because when the stop condition is True, the _output list is returned and passed all the way back up the stack.

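I use an underscore-prefixed variable name to indicate that it is meant to be used only within the function itself. This is a slight extension of the normal way underscore-prefixed names are used (in classes, to mark a variable as "private"), but I think it gets the point across...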

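Note that this isn't very different from your version. However, with your version, result persists between calls, because result = [] is evaluated when the function is created, not when it is called. Also, your version appends the return value (which is the list itself) to the list. That gets quite convoluted once you realize the list ends up holding multiple references to itself...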
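To make the default-argument point concrete, here is a minimal sketch. The names collect_shared and collect_fresh are hypothetical, invented only for illustration; they are not part of the question's code.

```python
def collect_shared(item, result=[]):
    # The default list is created once, when the def statement runs,
    # so every call that omits `result` appends to the same object.
    result.append(item)
    return result


def collect_fresh(item, result=None):
    # Using None as a sentinel gives each top-level call its own list.
    if result is None:
        result = []
    result.append(item)
    return result


first = collect_shared(1)
second = collect_shared(2)
print(first is second)   # True: both names refer to the one default list
print(second)            # [1, 2]

print(collect_fresh(1))  # [1]
print(collect_fresh(2))  # [2]
```

The None sentinel is the same idea as the _output=None argument in the recurse example.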

Answer 2

I agree with mgilson's analysis. list is a mutable type, and list.append works in place. Here is what that means:

There are two kinds of types: mutable and immutable.

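A mutable type stays at the same location in memory even when you change it. For example, list and dict are mutable types. This means that if you create a list and change it in some way, it still lives at the same location in memory. So suppose you create a list called "myList", and say this list is at memory location 0x9000. Then doing myList.append(0) does not change where myList lives in memory. Even if you do myList[0] = 'a', the location does not change; it is still 0x9000.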
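One way to observe this, as a rough sketch: the built-in id() returns a value that stays constant for an object's lifetime (in CPython it happens to be the memory address). The variable names below are illustrative only.

```python
my_list = [1, 2, 3]
identity_before = id(my_list)

my_list.append(0)   # mutate in place
my_list[0] = 'a'    # mutate in place again

print(my_list)                         # ['a', 2, 3, 0]
print(id(my_list) == identity_before)  # True: still the same object
```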

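An immutable type cannot be modified in place; anything that looks like a change actually produces a new object at a different memory location. str and tuple are immutable. This is why you get the following error: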

>>> s = 'as'
>>> s[0] = 'b'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment

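This means that even if you define s = 'as' (and say s is now at memory address 0x5000) and then redefine it as s = 'af', the location of s in memory changes, because s now refers to a different string object.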

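Now, when you reassign a name that refers to a mutable type, the name is bound to a new object, so its location in memory changes as well. For example:

L = [1,2,3]  # say this is at memory location 0x4000
L = [5,6,7]  # the memory location is no longer 0x4000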
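The difference between rebinding a name and mutating an object can be checked with a second reference to the original object. This is only a small sketch; the names original, alias, s and t are made up for the example.

```python
original = [1, 2, 3]
alias = original          # second name for the same list object

original = [5, 6, 7]      # rebinding: `original` now names a new list

print(alias)              # [1, 2, 3]: the old list still exists, unchanged
print(original is alias)  # False: two distinct objects

s = 'as'
t = s
s = s[:1] + 'f'           # builds a new string; the old one is untouched
print(t, s)               # as af
```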

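This is where the "in-place" property of list.append comes into play. "list.append is in-place" means that the new element is added to the existing list without creating a new list. That is why list.append returns None rather than a list, as shown below: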

>>> L = [1,2,3]
>>> ret = L.append(4)
>>> print(L)
[1, 2, 3, 4]
>>> print(ret)
None

However, if you want to create a new list, you can do it like this:

>>> L = [1,2,3]
>>> ret = L + [4]
>>> print(L)
[1, 2, 3]
>>> print(ret)
[1, 2, 3, 4]

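So what happens in your case is that both recursive calls (left and right) append to the same list, and the list returned by each recursive call is then appended to it again. This is why you get duplicate values.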
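A stripped-down sketch of the same pattern shows where the extra entries come from. The Node class and the collect_buggy and collect_fixed functions are hypothetical stand-ins for the k-d tree code, kept small so the effect is easy to see.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def collect_buggy(node, result):
    result.append(node.value)
    if node.left:
        # The call returns `result` itself, so this appends the list to itself.
        result.append(collect_buggy(node.left, result))
    if node.right:
        result.append(collect_buggy(node.right, result))
    return result


def collect_fixed(node, result):
    result.append(node.value)
    if node.left:
        collect_fixed(node.left, result)   # the shared list is already updated
    if node.right:
        collect_fixed(node.right, result)
    return result


tree = Node(2, Node(1), Node(3))

buggy = collect_buggy(tree, [])
print(len(buggy))         # 5: three values plus two references to the list itself
print(buggy[2] is buggy)  # True

print(collect_fixed(tree, []))  # [2, 1, 3]
```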

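You can get around this by doing what mgilson suggests, or, if you're a Lisp fan (this is a very good Lisp problem), you can use the [1,2,3] + [4] principle and do this (untested, but it should work):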

def within_radius(self, point, radius, result=[]):
    """
    Find all items in the tree within radius of point
    """
    d = self.discriminator

    temp = []

    if in_circle(point, radius, self.data):
        temp = [self.data]

    # Check whether any of the points in the subtrees could be
    # within the circle
    if point[d] - radius < self.data[d] and self.l_child:
        temp += self.l_child.within_radius(point, radius, result)

    if point[d] + radius > self.data[d] and self.r_child:
        temp += self.r_child.within_radius(point, radius, result)

    return result+temp

Hope this helps.

Answer 3

Here are a few thoughts:

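If you want to return only unique results, you should probably use a set and convert it to a list when you return. The one catch is that self.data has to be immutable, e.g. a tuple rather than a list.
Because you're threading result through the recursion and also appending the return value of each recursive call to it, you're explicitly adding each hit to the result at least twice. Threading result through the recursion saves you from creating and throwing away data structures, so you can probably just do that and skip the extra append.
As mgilson points out, because of the way Python handles default arguments, setting result to an empty list in the function declaration won't do what you think. Every time you call within_radius without explicitly passing in result, the hits accumulate across all calls, not just within the individual call. (Does that make sense? See this.)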

With all that in mind, I would probably do something like this:

def within_radius(self, point, radius, result=None):
    d = self.discriminator

    result = set() if result is None else result

    if in_circle(point, radius, self.data):
        result.add(self.data)
    if point[d] - radius < self.data[d] and self.l_child:
        self.l_child.within_radius(point, radius, result)
    if point[d] + radius > self.data[d] and self.r_child:
        self.r_child.within_radius(point, radius, result)

    return list(result)
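For experimenting with the accumulator approach end to end, here is a self-contained sketch. Everything around within_radius is an assumption made for the example: the KDNode class, the build helper, a 2-D in_circle that tests for points strictly inside the circle, and the sample points. It uses a plain list rather than a set, so it is a variant of the code above, not a copy of it.

```python
import math


def in_circle(center, radius, candidate):
    # Assumed helper: True when candidate lies strictly inside the circle (2-D).
    return math.hypot(center[0] - candidate[0], center[1] - candidate[1]) < radius


class KDNode:
    def __init__(self, data, discriminator, l_child=None, r_child=None):
        self.data = data                    # a tuple such as (x, y)
        self.discriminator = discriminator  # index of the splitting axis
        self.l_child = l_child
        self.r_child = r_child

    def within_radius(self, point, radius, result=None):
        # A fresh list per top-level call; recursive calls share it.
        if result is None:
            result = []
        d = self.discriminator
        if in_circle(point, radius, self.data):
            result.append(self.data)
        # Descend only into subtrees that could intersect the circle.
        if point[d] - radius < self.data[d] and self.l_child:
            self.l_child.within_radius(point, radius, result)
        if point[d] + radius > self.data[d] and self.r_child:
            self.r_child.within_radius(point, radius, result)
        return result


def build(points, depth=0):
    # Hypothetical builder: split on the median, alternating between x and y.
    if not points:
        return None
    d = depth % 2
    points = sorted(points, key=lambda p: p[d])
    mid = len(points) // 2
    return KDNode(points[mid], d,
                  build(points[:mid], depth + 1),
                  build(points[mid + 1:], depth + 1))


tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
hits = tree.within_radius((6, 3), 2.5)
print(sorted(hits))  # [(5, 4), (7, 2)]
```

Because the default is None, calling tree.within_radius again starts from an empty list instead of accumulating hits from earlier calls.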

Recursively returning a list of values from a tree
