Python: best way to create a network?

Time: 2018-05-30 10:41:41

Tags: python pandas networkx

I have a DataFrame that contains information about transactions between pairs of companies:

df    
      idA   idB   amount  nameA  nameB
0      4     5     300     xxx    yyy
1      3     7     150     kkk    uuu 
2      3     6     289     kkk    vvv
3      1     4     189     hhh    iii

I want to build a network from it using the networkx package.

import networkx as nx

G = nx.Graph()
for i in df.index:
    # Each row adds both companies as nodes and the transaction as a weighted edge
    G.add_node(df['idA'][i], name=df['nameA'][i])
    G.add_node(df['idB'][i], name=df['nameB'][i])
    G.add_edge(df['idA'][i], df['idB'][i], weight=df['amount'][i])

I would like to know whether there is a more efficient way to do this.

1 Answer:

Answer 0: (score: 4)

Yes, there is. Take a look at this documentation: https://networkx.github.io/documentation/latest/reference/generated/networkx.convert_matrix.from_pandas_edgelist.html

In your case I would do this:

G = nx.from_pandas_edgelist(df, 'idA', 'idB', ['amount'])
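
As a rough sketch of what that call produces on the sample data from the question: the frame is rebuilt by hand here, and the `amount` column is renamed to `weight`. The rename is an assumption, made only so the edge attribute carries the same name as in the question's loop; it is not required by `from_pandas_edgelist`.

```python
import networkx as nx
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "idA": [4, 3, 3, 1],
    "idB": [5, 7, 6, 4],
    "amount": [300, 150, 289, 189],
    "nameA": ["xxx", "kkk", "kkk", "hhh"],
    "nameB": ["yyy", "uuu", "vvv", "iii"],
})

# Assumption: expose "amount" as an edge attribute called "weight"
edges = df.rename(columns={"amount": "weight"})
G = nx.from_pandas_edgelist(edges, source="idA", target="idB", edge_attr="weight")

# Each row becomes one edge; the attribute is stored on the edge
for u, v, data in G.edges(data=True):
    print(u, v, data["weight"])
```

Passing a list such as `['amount']` for `edge_attr`, as in the line above, keeps the original column name instead.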

If you want to add other attributes to the nodes, follow this documentation: https://networkx.github.io/documentation/networkx-1.9/reference/generated/networkx.classes.function.set_node_attributes.html
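
A minimal sketch of that step, assuming the sample frame from the question and an attribute called `name`. The `names` dictionary is a hypothetical helper. Keyword arguments are used on purpose, because the positional order of `set_node_attributes` changed between networkx 1.x (`G, name, values`) and 2.x (`G, values, name`). Note that id 4 appears with two different names in the sample data; in this sketch the `nameB` value wins, which happens to match the result of the question's loop.

```python
import networkx as nx
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "idA": [4, 3, 3, 1],
    "idB": [5, 7, 6, 4],
    "amount": [300, 150, 289, 189],
    "nameA": ["xxx", "kkk", "kkk", "hhh"],
    "nameB": ["yyy", "uuu", "vvv", "iii"],
})

G = nx.from_pandas_edgelist(df, "idA", "idB", ["amount"])

# Map every node id to its company name (later entries overwrite earlier ones)
names = dict(zip(df["idA"], df["nameA"]))
names.update(zip(df["idB"], df["nameB"]))

# Attach the mapping to the nodes under the attribute "name"
nx.set_node_attributes(G, values=names, name="name")

print(nx.get_node_attributes(G, "name"))
```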

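Edit: Sorry, I had not noticed that from_pandas_dataframe was removed as of networkx 2.0, which is why the code above uses from_pandas_edgelist. Many thanks to @tohv, who answered this question here.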

Finally, as I commented, these functions are optimized. Compared with a for loop that does the same job, the difference in speed is considerable.

from random import randint
import pandas as pd
import networkx as nx
from time import time
import numpy as np

df = pd.DataFrame()
df['a'] = [randint(0, 100) for _ in range(10000)]
df['b'] = [randint(0, 100) for _ in range(10000)]

c = 0
runs = []
while c <= 100:
    G=nx.Graph()
    start = time()
    for i in df.index:
        G.add_node(df['a'][i], name = df['a'][i])
        G.add_node(df['b'][i], name = df['b'][i])
        G.add_edge(df['a'][i], df['b'][i])

    run = time() - start
    runs.append(run)
    c += 1

print ('done in:', np.mean(runs), 'seconds')

done in: 0.6191224154859486 seconds

c = 0
runs = []
while c <= 100:
    G=nx.Graph()
    start = time()
    G=nx.from_pandas_edgelist(df, 'a', 'b')
    run = time() - start
    runs.append(run)
    c += 1

print ('done in:', np.mean(runs), 'seconds')

done in: 0.014413160852866598 seconds
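
The same comparison can also be sketched with the standard-library `timeit` module, which takes care of the repetition. The helper names `build_with_loop` and `build_with_edgelist` are hypothetical, and the node `name` attribute is left out of the loop so that both functions build the same graph; the edge-set check confirms that before anything is timed. Timings depend on the machine and library versions, so none are shown.

```python
from random import randint
from timeit import timeit

import networkx as nx
import pandas as pd

df = pd.DataFrame({
    "a": [randint(0, 100) for _ in range(10000)],
    "b": [randint(0, 100) for _ in range(10000)],
})


def build_with_loop(frame):
    """Row-by-row construction, indexing the frame as in the loop benchmark."""
    G = nx.Graph()
    for i in frame.index:
        G.add_edge(frame["a"][i], frame["b"][i])
    return G


def build_with_edgelist(frame):
    """Construction through networkx's pandas helper."""
    return nx.from_pandas_edgelist(frame, "a", "b")


# Sanity check: both approaches should yield the same undirected edges
loop_edges = {frozenset(e) for e in build_with_loop(df).edges()}
helper_edges = {frozenset(e) for e in build_with_edgelist(df).edges()}
same_edges = loop_edges == helper_edges
print("same edges:", same_edges)

repeats = 3
loop_seconds = timeit(lambda: build_with_loop(df), number=repeats) / repeats
helper_seconds = timeit(lambda: build_with_edgelist(df), number=repeats) / repeats
print("loop:", loop_seconds, "seconds per build")
print("from_pandas_edgelist:", helper_seconds, "seconds per build")
```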