
Time: 2017-09-13 20:09:22

Tags: python pandas numpy graph networkx


Given a dataframe df

Fruit1     Fruit2      Weight
orange     apple       0.2
orange     grape       0.4
orange     pineapple   0.6
orange     banana      0.8
apple      grape       0.9
apple      pineapple   0.3
apple      banana      0.2
grape      pineapple   0.1
pineapple  banana      0.8

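and a constraint on the maximum allowed path length, L,

I would like to return a dataframe with the highest average path between each pair of points (i.e. the path that maximizes the sum of its edge weights divided by its length), where edge weights are given by the Weight column and the path has at most L edges.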


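Suppose we have only 4 points, A, B, C and D, and we are interested in finding the highest average path between A and D.

For L = 3, the highest average path would be max((A->D)/1, (A->B + B->D)/2, (A->C + C->D)/2, (A->B + B->C + C->D)/3, (A->C + C->B + B->D)/3).

For L = 2, it would be max((A->D)/1, (A->B + B->D)/2, (A->C + C->D)/2).

In the case of df, for L = 2, we would get something like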
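To make the definition concrete, here is a minimal brute-force sketch for the 4-point case. The weights are hypothetical (they do not come from the question), and `average_weight` and `max_avg_path` are illustrative names. Every simple path from the source to the target with at most L edges is scored by its total weight divided by its edge count.

```python
from itertools import permutations

# Hypothetical symmetric edge weights for the points A, B, C, D.
weights = {
    frozenset("AB"): 0.9,
    frozenset("BC"): 0.9,
    frozenset("CD"): 0.9,
    frozenset("AD"): 0.4,
    frozenset("AC"): 0.2,
    frozenset("BD"): 0.3,
}

def average_weight(path):
    """Sum of the edge weights along the path divided by the number of edges."""
    edges = list(zip(path, path[1:]))
    return sum(weights[frozenset(edge)] for edge in edges) / len(edges)

def max_avg_path(source, target, nodes, L):
    """Try every simple path from source to target that uses at most L edges."""
    inner = [n for n in nodes if n not in (source, target)]
    candidates = []
    for k in range(L):  # k intermediate nodes give a path with k + 1 edges
        for middle in permutations(inner, k):
            candidates.append((source, *middle, target))
    return max(candidates, key=average_weight)

best_l2 = max_avg_path("A", "D", "ABCD", 2)
best_l3 = max_avg_path("A", "D", "ABCD", 3)
```

With these made-up weights the L = 2 winner is A, B, D (average 0.6), while allowing L = 3 lets the longer path A, B, C, D win (average 0.9), so the bound L can change the answer.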

Fruit1     Fruit2      Weight   MaxAvgPath(L=2)
orange     apple       0.2       [orange, grape, apple]  
orange     grape       0.4       [orange, apple, grape]
orange     pineapple   0.6       [orange, banana, pineapple]
orange     banana      0.8       [orange, banana]
apple      grape       0.9       [apple, grape]
apple      pineapple   0.3       [apple, grape, pineapple]
apple      banana      0.2       [apple, pineapple, banana] 
grape      pineapple   0.1       [grape, orange, pineapple]
grape      banana      0.1       [grape, apple, banana]
pineapple  banana      0.8       [pineapple, banana]


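1 answer:

Answer 0 (score: 2)

Thanks for the clarification. So you are asking for the path with the highest cost/(number of edges) between each pair of nodes in the graph, where paths are limited to an upper bound on the number of connecting edges. The longest path problem is NP-hard, so an efficient solution is only possible because of that restriction (see https://en.wikipedia.org/wiki/Longest_path_problem). I think your limit on the number of edges effectively caps the longest path at length L, so the cost of the search can be reduced to something exponential in L.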
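As a rough illustration of why the edge limit matters, the sketch below counts the candidate paths with a depth-limited search on a hypothetical complete graph of 6 nodes. This is a plain-Python stand-in written for this example, not the library's implementation, and `simple_paths_up_to` is an illustrative name. In a graph with maximum degree d, the number of paths explored is bounded by roughly d^L.

```python
def simple_paths_up_to(adj, source, target, L):
    """Yield every simple path from source to target that uses at most L edges."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        if len(path) - 1 == L:  # edge budget used up, do not extend further
            continue
        for neighbor in adj[node]:
            if neighbor not in path:  # keep the path simple (no repeated nodes)
                stack.append((neighbor, path + [neighbor]))

# Hypothetical complete graph on 6 nodes, stored as adjacency lists.
nodes = range(6)
adj = {u: [v for v in nodes if v != u] for u in nodes}

# Number of candidate paths between nodes 0 and 5 for each limit L.
counts = {L: sum(1 for _ in simple_paths_up_to(adj, 0, 5, L)) for L in range(1, 6)}
```

Here `counts` grows as 1, 5, 17, 41, 65 for L = 1 to 5, so a small L keeps the enumeration cheap even though the unrestricted problem is hard.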




import pandas
import networkx

# path_frame is the dataframe from the question (called df there).
# Create a graph from the dataframe. Older networkx releases named this
# function from_pandas_dataframe; it was later renamed from_pandas_edgelist.
G = networkx.from_pandas_edgelist(path_frame, 'Fruit1', 'Fruit2', edge_attr='Weight')

# Average edge weight along a path: total weight / number of edges
def pathcost(G, path, edge_attr):
    lp = len(path) - 1
    return sum(G[path[n]][path[n+1]][edge_attr] for n in range(lp)) / lp

# Find the highest-average paths between source and target with up to L edges
def maxpath_cond(G, source, target, edge_attr, L=None):
    # all_simple_paths uses a depth-first search; cutoff limits the number of edges
    paths = networkx.all_simple_paths(G, source, target, cutoff=L)
    # Calculate the cost of each path and sort with the best path first
    costs = [(pathcost(G, pth, edge_attr), pth) for pth in paths]
    return sorted(costs, key=lambda x: x[0], reverse=True)

# Iterate through the dataframe and collect the best path for each row
mxs = []
for src, targ in zip(path_frame['Fruit1'], path_frame['Fruit2']):
    mxl = maxpath_cond(G, src, targ, 'Weight', 2)[0]
    mxs.append(mxl[1])
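For an end-to-end check, here is a self-contained sketch of the same approach on the question's sample data. It assumes a networkx release that provides `from_pandas_edgelist`. The helper names `avg_weight` and `best_path`, and attaching the result as a `MaxAvgPath(L=2)` column, are choices made for this illustration.

```python
import networkx as nx
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Fruit1": ["orange", "orange", "orange", "orange", "apple",
               "apple", "apple", "grape", "pineapple"],
    "Fruit2": ["apple", "grape", "pineapple", "banana", "grape",
               "pineapple", "banana", "pineapple", "banana"],
    "Weight": [0.2, 0.4, 0.6, 0.8, 0.9, 0.3, 0.2, 0.1, 0.8],
})

# Undirected graph with the Weight column stored on each edge.
G = nx.from_pandas_edgelist(df, "Fruit1", "Fruit2", edge_attr="Weight")

def avg_weight(G, path, attr="Weight"):
    """Total edge weight along the path divided by its number of edges."""
    edges = list(zip(path, path[1:]))
    return sum(G[u][v][attr] for u, v in edges) / len(edges)

def best_path(G, source, target, L):
    """Highest-average simple path from source to target with at most L edges."""
    paths = nx.all_simple_paths(G, source, target, cutoff=L)
    return max(paths, key=lambda p: avg_weight(G, p))

# One best path per row, attached as a new column.
paths = [best_path(G, s, t, 2) for s, t in zip(df["Fruit1"], df["Fruit2"])]
df["MaxAvgPath(L=2)"] = pd.Series(paths, index=df.index)
```

Using `max` instead of sorting avoids ordering every candidate when only the best one is needed. When two paths have the same average (apple to pineapple has such a tie at 0.5), whichever the search enumerates first is kept.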
