如何找到原始数据的最短路径

时间:2013-06-18 07:43:03

标签: python social-networking networkx

我有一个数据文件,它将分类数据作为列。像

node_id,second_major,gender,major_index,year,dorm,high_school,student_fac
0,0,2,257,2007,111,2849,1
1,0,2,271,2005,0,51195,2
2,0,2,269,2007,0,21462,1
3,269,1,245,2008,111,2597,1
..........................

此数据位于列中。我将其转换为edgelist和nodelist。 边缘列表如:

0   4191
0   949
1   3002
1   4028
1   957
2   2494
2   959
2   3011
3   4243
4   965
5   1478
........
........

要找到节点之间的最短路径,究竟需要做些什么。边缘没有重量。我如何在python中实现这个代码?

1 个答案:

答案 0 :(得分:2)

这是一个经典的广度优先搜索问题,您有一个无向,未加权的图表,并且您希望找到2个顶点之间的最短路径。

关于广度优先搜索的一些有用链接:

您需要注意的一些边缘情况:

  • 源顶点和目标顶点之间没有路径
  • 源和目标是相同的顶点

我认为你的边缘列表是列表或列表列表,例如

[[4191, 949], [3002, 4028, 957], [2494, 959, 3011], [4243, 965], [1478], ...]

或者

{ 0: [4191, 949],
  1: [3002, 4028, 957],
  2: [2494, 959, 3011],
  3: [4243, 965],
  4: [1478], ...}

我已经编写了一些代码来展示广度优先搜索的工作原理:

import sys
import sys
import Queue

def get_shortest_path(par, src, dest):
    '''
    Returns the shortest path as a list of integers
    par - parent information
    src - source vertex
    dest - destination vertex
    '''
    if dest == src:
        return [src]
    else:
        ret = get_shortest_path(par, src, par[dest])
        ret.append(dest)
        return ret

def bfs(edgeList, src, dest):
    '''
    Breadth first search routine. Returns (distance, shortestPath) pair from src to dest. Returns (-1, []) if there is no path from src to dest
    edgeList - adjacency list of graph. Either list of lists or dict of lists
    src - source vertex
    dest - destination vertex
    '''
    vis = set() # stores the vertices that have been visited
    par = {} # stores parent information. vertex -> parent vertex in BFS tree
    distDict = {} # stores distance of visited vertices from the source. This is the number of edges between the source vertex and the given vertex
    q = Queue.Queue()
    q.put((src, 0)) # enqueue (source, distance) pair
    par[src] = -1 # source has no parent
    vis.add(src) # minor technicality, will explain later
    while not q.empty():
        (v,dist) = q.get() # grab vertex in queue
        distDict[v] = dist # update the distance
        if v == dest:
            break # reached destination, done
        nextDist = dist+1
        for nextV in edgeList[v]:
            # visit vertices adjacent to the current vertex
            if nextV not in vis:
                # not yet visited
                par[nextV] = v # update parent of nextV to v
                q.put((nextV, nextDist)) # add into queeu
                vis.add(nextV) # mark as visited
    # obtained shortest path now
    if dest in distDict:
        return (distDict[dest], get_shortest_path(par, src, dest))
    else:
        return (-1, []) # no shortest path

# example run, feel free to remove this
if __name__ == '__main__':
    edgeList = {
        0: [6,],
        1: [2, 7],
        2: [1, 3, 6],
        3: [2, 4, 5],
        4: [3, 8],
        5: [3, 7],
        6: [0, 2],
        7: [1, 5],
        8: [4],
    }
    while True:
        src = int(sys.stdin.readline())
        dest = int(sys.stdin.readline())
        (dist, shortest_path) = bfs(edgeList, src, dest)
        print 'dist =', dist
        print 'shortest_path =', shortest_path