How can I compute nearest neighbors using NetworkX and data from a CSV file?

Asked: 2017-08-07 12:07:32

Tags: python csv graph networkx

I have a CSV file (node.csv) with the following data:

    1           2           3           4           5           6           7           8           9           10
1   0           0.257905291 0.775104118 0.239086843 0.002313744 0.416936603 0.194817214 0.163350301 0.252043807 0.251272559
2   0.346100279 0           0.438892758 0.598885794 0.002263231 0.406685237 0.523850975 0.257660167 0.206302228 0.161385794
3   0.753358102 0.222349243 0           0.407830809 0.001714776 0.507573592 0.169905687 0.139611318 0.187910832 0.326950557
4   0.185342928 0.571302688 0.51784403  0           0.003231018 0.295197533 0.216184462 0.153032751 0.216331326 0.317961522
5   0           0           0           0           0           0           0           0           0           0
6   0.478164621 0.418192795 0.646810223 0.410746629 0.002414973 0           0.609176897 0.203461461 0.157576977 0.636747837
7   0.24894327  0.522914349 0.33948832  0.316240267 0.002335929 0.639377086 0           0.410011123 0.540266963 0.587764182
8   0.234017887 0.320967208 0.285193773 0.258198079 0.003146737 0.224412057 0.411725737 0           0.487081815 0.469526333
9   0.302955306 0.080506624 0.261610132 0.22856311  0.001746979 0.014994905 0.63386228  0.486096957 0           0.664434415
10  0.232675407 0.121596312 0.457715027 0.310618067 0.001872929 0.57556548  0.473562887 0.32185564  0.482351246 0  

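I want to use the NetworkX Python library to find the nearest neighbors of each node in the given network (for example, the neighbors with the largest or the smallest values). Over several iterations, the program should produce output such as "Node 1 neighbors are 2, 3", "Node 2 neighbors are 1, 3", and so on, using an algorithm or built-in function from NetworkX.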

The node positions are (pos.txt):

id  X   Y
1  21.5 23
2  24.5 20
3  19.5 19
4  22.5 15
5  24.5 12
6  19.5 12
7  22.5 8
8  24.5 4
9  21.5 2
10 19.5 5

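First of all, is it possible to create a network/graph using floating-point values smaller than 1? (These values represent the connection rate between nodes; they also represent the probability that a connection succeeds and that a message is passed between the nodes.)
Can anyone help me with this?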

Thanks in advance for your help :)

1 answer:

Answer 0 (score: 1)

Regarding your first question: yes. Assuming the numbers in node.csv are used as edge weights, a simple program like this builds the graph with networkx:

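import csv

import matplotlib.pyplot as plt
import networkx as nx

g = nx.Graph()

# node.csv is the comma-separated matrix shown in the note at the end of this answer;
# its header row supplies the column (target node) ids.
with open("node.csv", "r") as csv_file:
    csv_dict = csv.DictReader(csv_file, skipinitialspace=True, delimiter=",")
    # Row k of the file describes node k, so rows are counted from 1.
    for ini, row in enumerate(csv_dict, start=1):
        for i in row:
            if isinstance(row[i], str):
                # g is undirected: a later entry (j, i) overwrites the weight set by (i, j).
                g.add_edge(ini, int(i), weight=float(row[i]))

# Lay the nodes out automatically, then draw nodes, edges and labels.
pos = nx.spring_layout(g, scale=100.)
nx.draw_networkx_nodes(g, pos)
nx.draw_networkx_edges(g, pos)
nx.draw_networkx_labels(g, pos)
plt.axis('off')
plt.show()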

This produces:

Sample graph
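One caveat about the program above, offered as a sketch rather than as part of the original answer: the matrix in node.csv is not symmetric (row 1, column 3 differs from row 3, column 1), and an undirected `nx.Graph` keeps only one weight per node pair. If both directions matter, a `nx.DiGraph` can hold them separately. The helper name `load_directed`, the inline `SAMPLE` excerpt (the first three rows and columns of the modified node.csv), and the choice to treat 0 as "no link" are assumptions made for illustration.

```python
import csv
import io

import networkx as nx

# Excerpt (first three rows and columns) of the modified node.csv from this answer.
SAMPLE = """1,2,3
0,0.257905291,0.775104118
0.346100279,0,0.438892758
0.753358102,0.222349243,0
"""


def load_directed(handle):
    """Build a DiGraph where row i, column j becomes the edge i -> j."""
    dg = nx.DiGraph()
    reader = csv.reader(handle, skipinitialspace=True)
    header = [int(name) for name in next(reader)]
    dg.add_nodes_from(header)
    # Assumption: the k-th data row describes node k.
    for source, row in enumerate(reader, start=1):
        for target, value in zip(header, row):
            weight = float(value)
            if weight > 0:  # assumption: 0 means "no link", so no edge is added
                dg.add_edge(source, target, weight=weight)
    return dg


dg = load_directed(io.StringIO(SAMPLE))
# Both directions keep their own float weight below 1.
print(dg[1][3]["weight"], dg[3][1]["weight"])
```

To read the real file instead of the excerpt, pass an open file object, for example `load_directed(open("node.csv"))`.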

As for finding the nearest neighbors of, let's say, node 1, still based on the values from node.csv:

# Zero-weight edges get a huge sort key, so they fall to the end instead of being picked as "nearest"
min_weight_neighbors = sorted(g[1].items(), key=lambda e: e[1]["weight"] if e[1]["weight"] != 0 else 1000000000)[:2]

This in turn yields the 2 nodes with the lowest weight:

[(5, {'weight': 0.002313744}), (4, {'weight': 0.185342928})]

Or, if you want the 2 nodes with the largest weight:

sorted(g[1].items(), key=lambda e: e[1]["weight"], reverse=True)[:2] #two nodes with the biggest weight

which yields:

[(3, {'weight': 0.753358102}), (4, {'weight': 0.5342928})]
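To get the "Node 1 neighbors are 2, 3" style of output asked for in the question, the two `sorted(...)` expressions can be wrapped in a small helper and called for every node. The following is a minimal sketch, not a NetworkX built-in: `top_neighbors` is a hypothetical name, and the small graph is filled by hand with the weights that the undirected graph ends up holding for nodes 1 to 4 of the modified node.csv.

```python
import heapq

import networkx as nx

# Weights for nodes 1-4, taken from the modified node.csv in this answer.
g = nx.Graph()
g.add_weighted_edges_from([
    (1, 2, 0.346100279), (1, 3, 0.753358102), (1, 4, 0.5342928),
    (2, 3, 0.222349243), (2, 4, 0.571302688), (3, 4, 0.51784403),
])


def top_neighbors(graph, node, k=2, largest=True):
    """Return the k neighbors of `node` with the largest (or smallest) weight."""
    candidates = [
        (nbr, data["weight"])
        for nbr, data in graph[node].items()
        # Assumption: self-loops and zero weights are not real links.
        if nbr != node and data["weight"] > 0
    ]
    pick = heapq.nlargest if largest else heapq.nsmallest
    return [nbr for nbr, _ in pick(k, candidates, key=lambda item: item[1])]


for node in sorted(g):
    neighbors = top_neighbors(g, node)
    print("Node {} neighbors are {}".format(node, ", ".join(map(str, neighbors))))
```

Passing `largest=False` returns the lowest-weight neighbors instead, and `k` controls how many are reported.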

Note: I modified node.csv slightly:

1,2,3,4,5,6,7,8,9,10
0,0.257905291,0.775104118,0.239086843,0.002313744,0.416936603,0.194817214,0.163350301,0.252043807,0.251272559
0.346100279,0,0.438892758,0.598885794,0.002263231,0.406685237,0.523850975,0.257660167,0.206302228,0.161385794
0.753358102,0.222349243,0,0.407830809,0.001714776,0.507573592,0.169905687,0.139611318,0.187910832,0.326950557
0.5342928,0.571302688,0.51784403,0,0.003231018,0.295197533,0.216184462,0.153032751,0.216331326,0.317961522
0,0,0,0,0,0,0,0,0,0
0.478164621,0.418192795,0.646810223,0.410746629,0.002414973,0,0.609176897,0.203461461,0.157576977,0.636747837
0.24894327,0.522914349,0.33948832,0.316240267,0.002335929,0.639377086,0,0.410011123,0.540266963,0.587764182
0.234017887,0.320967208,0.285193773,0.258198079,0.003146737,0.224412057,0.411725737,0,0.487081815,0.469526333
0.302955306,0.080506624,0.261610132,0.22856311,0.001746979,0.014994905,0.63386228,0.486096957,0,0.664434415
0.232675407,0.121596312,0.457715027,0.310618067,0.001872929,0.57556548,0.473562887,0.32185564,0.482351246,0
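
The question also lists node coordinates in pos.txt. If "nearest" is meant geometrically rather than by edge weight, the neighbors can be ranked by Euclidean distance without any graph algorithm. This is only a sketch under that assumption; `read_positions` and `nearest_by_distance` are hypothetical helper names, and the coordinates are copied inline from the question so the example is self-contained.

```python
import io
import math

# Coordinates copied from pos.txt in the question.
POS_TXT = """id  X   Y
1  21.5 23
2  24.5 20
3  19.5 19
4  22.5 15
5  24.5 12
6  19.5 12
7  22.5 8
8  24.5 4
9  21.5 2
10 19.5 5
"""


def read_positions(handle):
    """Parse whitespace-separated 'id X Y' lines into {id: (x, y)}."""
    next(handle)  # skip the header line
    positions = {}
    for line in handle:
        if line.strip():
            node_id, x, y = line.split()
            positions[int(node_id)] = (float(x), float(y))
    return positions


def nearest_by_distance(positions, node, k=2):
    """Return the k node ids closest to `node` by Euclidean distance."""
    x0, y0 = positions[node]
    others = [
        (math.hypot(x - x0, y - y0), other)
        for other, (x, y) in positions.items()
        if other != node
    ]
    # Ties in distance are broken by the smaller node id.
    return [other for _, other in sorted(others)[:k]]


positions = read_positions(io.StringIO(POS_TXT))
print(nearest_by_distance(positions, 1))  # [2, 3]
print(nearest_by_distance(positions, 2))  # [1, 3]
```

These results match the example output given in the question. The same `positions` dictionary can also be passed as `pos` to the `nx.draw_networkx_*` calls in place of `nx.spring_layout`, so the drawing reflects the real coordinates.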