Uniform-cost search in Python

Time: 2017-04-11 19:32:54

Tags: python algorithm search graph

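I have implemented a simple graph data structure in Python, structured as shown below. The code is here only to clarify what the functions and variables mean; they are fairly self-explanatory, so you can skip reading it.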

# Node data structure
class Node: 

    def __init__(self, label):        
        self.out_edges = []
        self.label = label
        self.is_goal = False


    def add_edge(self, node, weight = 0):          
        self.out_edges.append(Edge(node, weight))


# Edge data structure
class Edge:

    def __init__(self, node, weight = 0):          
        self.node = node
        self.weight = weight

    def to(self):                                  
        return self.node


# Graph data structure, utilises classes Node and Edge
class Graph:    

    def __init__(self):                             
        self.nodes = []

    # some other functions here populate the graph, and randomly select three goal nodes.

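Now I am trying to implement uniform-cost search (i.e. BFS with a priority queue, which guarantees a shortest path) starting from a given node v and returning the shortest path (as a list) to one of the three goal nodes. By a goal node, I mean a node whose attribute is_goal is set to True.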

Here is my implementation:

import queue

def ucs(G, v):
    visited = set()                  # set of visited nodes
    visited.add(v)                   # mark the starting vertex as visited
    q = queue.PriorityQueue()        # we store vertices in the (priority) queue as tuples with cumulative cost
    q.put((0, v))                    # add the starting node, this has zero *cumulative* cost   
    goal_node = None                 # this will be set as the goal node if one is found
    parents = {v:None}               # this dictionary contains the parent of each node, necessary for path construction

    while not q.empty():             # while the queue is nonempty
        dequeued_item = q.get()        
        current_node = dequeued_item[1]             # get node at top of queue
        current_node_priority = dequeued_item[0]    # get the cumulative priority for later

        if current_node.is_goal:                    # if the current node is the goal
            path_to_goal = [current_node]           # the path to the goal ends with the current node (obviously)
            prev_node = current_node                # set the previous node to be the current node (this will change with each iteration)

            while prev_node != v:                   # go back up the path using parents, and add to path
                parent = parents[prev_node]
                path_to_goal.append(parent)   
                prev_node = parent

            path_to_goal.reverse()                  # reverse the path
            return path_to_goal                     # return it

        else:
            for edge in current_node.out_edges:     # otherwise, for each adjacent node
                child = edge.to()                   # (avoid calling .to() in future)

                if child not in visited:            # if it is not visited
                    visited.add(child)              # mark it as visited
                    parents[child] = current_node   # set the current node as the parent of child
                    q.put((current_node_priority + edge.weight, child)) # and enqueue it with *cumulative* priority

Now, after extensive testing and comparison with other algorithms, this implementation seemed to work well, until I tried it on this graph:

Graph one

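For whatever reason, ucs(G, v) returned the path H -> I, which costs 0.87, rather than the path H -> F -> I, which costs 0.71 (this path was obtained by running a DFS). The following graph also produced an incorrect path: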

Graph two

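The algorithm returned G -> F instead of G -> E -> F, which was again obtained by the DFS. The only pattern I can see among these rare cases is that the chosen goal node always has a loop. I cannot figure out what is going wrong. Any tips would be much appreciated.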

2 answers:

Answer 0: (score: 1)

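For searches, I usually keep the path to a node as part of its entry in the queue. This is not very memory efficient, but it is cheaper to implement.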
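
A minimal sketch of that idea, not the answerer's own code: each heap entry carries the whole path, and a node is marked as expanded only when it is popped. The function name `ucs_with_paths`, the `tie` counter, the `is_goal` constructor argument, and the individual edge weights are assumptions made for illustration; only the totals 0.87 and 0.71 come from the question.

```python
import heapq
from itertools import count


class Edge:
    def __init__(self, node, weight=0):
        self.node = node
        self.weight = weight


class Node:
    """Stand-in mirroring the question's Node class."""

    def __init__(self, label, is_goal=False):
        self.label = label
        self.is_goal = is_goal
        self.out_edges = []

    def add_edge(self, node, weight=0):
        self.out_edges.append(Edge(node, weight))


def ucs_with_paths(start):
    """Return (cost, path) for the cheapest reachable goal, or None."""
    tie = count()  # unique tie-breaker, so the heap never compares paths
    heap = [(0, next(tie), [start])]
    expanded = set()

    while heap:
        cost, _, path = heapq.heappop(heap)
        node = path[-1]
        if node in expanded:
            continue  # a cheaper entry for this node was already handled
        expanded.add(node)
        if node.is_goal:
            return cost, path
        for edge in node.out_edges:
            if edge.node not in expanded:
                new_path = path + [edge.node]
                heapq.heappush(heap, (cost + edge.weight, next(tie), new_path))
    return None


# Hypothetical weights: H -> F -> I totals 0.71, H -> I costs 0.87.
h, f, i = Node("H"), Node("F"), Node("I", is_goal=True)
h.add_edge(i, 0.87)
h.add_edge(f, 0.30)
f.add_edge(i, 0.41)

cost, path = ucs_with_paths(h)
print([n.label for n in path], round(cost, 2))  # ['H', 'F', 'I'] 0.71
```

Copying the path on every push costs extra memory, which is the trade-off mentioned above; in exchange, no parent bookkeeping is needed.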

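If you want to keep the parent map, remember that it is only safe to update the parent map when the child is at the top of the queue. Only then has the algorithm determined the shortest path to the current node.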

Note: I have not tested this, so feel free to comment if it does not work right away.
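
The parent-map advice in this answer can be sketched as follows. This is an illustration under stated assumptions rather than the answerer's code. The question's `ucs` marks a child as visited and records its parent at enqueue time, so the goal reached first through the expensive direct edge can never be re-reached through a cheaper route. Here the parent travels inside the heap entry and is written to `parents` only when the node is dequeued. The name `ucs_parent_map`, the `tie` counter, the `is_goal` constructor argument, and the edge weights are hypothetical.

```python
import heapq
from itertools import count


class Edge:
    def __init__(self, node, weight=0):
        self.node = node
        self.weight = weight


class Node:
    """Stand-in mirroring the question's Node class."""

    def __init__(self, label, is_goal=False):
        self.label = label
        self.is_goal = is_goal
        self.out_edges = []

    def add_edge(self, node, weight=0):
        self.out_edges.append(Edge(node, weight))


def ucs_parent_map(start):
    """Return the cheapest path (list of nodes) to a goal, or None."""
    tie = count()  # unique tie-breaker, so Node objects are never compared
    heap = [(0, next(tie), start, None)]  # (cost, tie, node, parent)
    parents = {}  # written only when a node reaches the top of the queue

    while heap:
        cost, _, node, parent = heapq.heappop(heap)
        if node in parents:
            continue  # already finalised through a cheaper entry
        parents[node] = parent  # first pop is the cheapest way to reach node
        if node.is_goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = parents[node]
            path.reverse()
            return path
        for edge in node.out_edges:
            if edge.node not in parents:
                entry = (cost + edge.weight, next(tie), edge.node, node)
                heapq.heappush(heap, entry)
    return None


# Hypothetical weights for the second failing case: G -> E -> F beats G -> F.
g, e, f = Node("G"), Node("E"), Node("F", is_goal=True)
g.add_edge(f, 0.9)
g.add_edge(e, 0.2)
e.add_edge(f, 0.3)
f.add_edge(g, 0.1)  # a cycle back from the goal, as observed in the question

print([n.label for n in ucs_parent_map(g)])  # ['G', 'E', 'F']
```

A node may sit in the heap several times with different costs; only the first copy popped is used, and the later ones are skipped by the `node in parents` check.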

Answer 1: (score: 0)

A simple check before expanding a node can save you from repeated visits.
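
One possible reading of that check, sketched under assumptions since the answer gives no code: keep the cheapest known cost per node in a dictionary (called `best` here) and skip any dequeued entry that is more expensive than the recorded cost. The function name `ucs_best_cost`, the `tie` counter, the `is_goal` constructor argument, and the example graph are all hypothetical, and the approach assumes non-negative edge weights.

```python
import heapq
from itertools import count


class Edge:
    def __init__(self, node, weight=0):
        self.node = node
        self.weight = weight


class Node:
    """Stand-in mirroring the question's Node class."""

    def __init__(self, label, is_goal=False):
        self.label = label
        self.is_goal = is_goal
        self.out_edges = []

    def add_edge(self, node, weight=0):
        self.out_edges.append(Edge(node, weight))


def ucs_best_cost(start):
    """Return (cost, path) for the cheapest reachable goal, or None."""
    tie = count()  # unique tie-breaker, so Node objects are never compared
    best = {start: 0}  # cheapest cost found so far for each node
    parents = {start: None}
    heap = [(0, next(tie), start)]

    while heap:
        cost, _, node = heapq.heappop(heap)
        if cost > best[node]:
            continue  # the check: this entry is stale, do not expand it
        if node.is_goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            path.reverse()
            return cost, path
        for edge in node.out_edges:
            new_cost = cost + edge.weight
            if new_cost < best.get(edge.node, float("inf")):
                best[edge.node] = new_cost  # cheaper route found
                parents[edge.node] = node   # overwrite the earlier parent
                heapq.heappush(heap, (new_cost, next(tie), edge.node))
    return None


# Hypothetical graph: A -> B -> C -> D (cost 4) beats A -> C -> D and A -> B -> D.
a, b, c, d = Node("A"), Node("B"), Node("C"), Node("D", is_goal=True)
a.add_edge(b, 1)
a.add_edge(c, 4)
b.add_edge(c, 2)
b.add_edge(d, 5)
c.add_edge(d, 1)

cost, path = ucs_best_cost(a)
print(cost, [n.label for n in path])  # 4 ['A', 'B', 'C', 'D']
```

Unlike a plain visited set filled at enqueue time, this version lets a node be re-queued whenever a strictly cheaper route to it appears.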
