Question

My DataFrame represents a list of graph edges and has the following format:

  node1 node2 weight
0     a     c      1
1     b     c      2
2     d     c      3
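
For reproducibility, the frame above can be built as follows. This is only a minimal sketch: the literal values are copied from the table, and the variable name `df` is chosen to match what the answers use.

```python
import pandas as pd

# Edge list from the table above: one row per edge from node1 to node2
df = pd.DataFrame({
    "node1": ["a", "b", "d"],
    "node2": ["c", "c", "c"],
    "weight": [1, 2, 3],
})
print(df)
```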

My goal is to generate the equivalent adjacency matrix:

    a b c d
a   0 0 1 0
b   0 0 2 0
c   0 0 0 0
d   0 0 3 0

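Currently, while building the edge DataFrame, I count the number of nodes, create an NxN DataFrame, and fill in the values manually. What is the pandas way to generate the second DataFrame from the first?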

Answer 1

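Decided to have a little fun with this problem.

You can convert node1 and node2 to the Categorical dtype and then use groupby. Because both columns share the same categories, the grouped result covers every node pair, including pairs with no edge.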

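from functools import partial

import numpy as np
import pandas as pd

# All node labels that appear in either column, sorted
vals = np.unique(df[['node1', 'node2']])
p = partial(pd.Categorical, categories=vals)
df['node1'], df['node2'] = p(df['node1']), p(df['node2'])

# observed=False keeps the node pairs that have no edge, so the result is N x N
(df.groupby(['node1', 'node2'], observed=False)
   .first()
   .fillna(0)       # pairs without an edge come back as NaN
   .weight
   .astype(int)     # replaces the deprecated fillna(..., downcast='infer')
   .unstack())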

node2  a  b  c  d
node1            
a      0  0  1  0
b      0  0  2  0
c      0  0  0  0
d      0  0  3  0

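Another option is to set the values of an underlying array directly, by position.

# Fill a plain NumPy array and wrap it afterwards; writing through df2.values
# is not reliable (it can be a copy, and is read-only under copy-on-write)
f = pd.Index(vals).get_indexer
arr = np.zeros((len(vals), len(vals)), dtype=int)
arr[f(df.node1), f(df.node2)] = df.weight.values
df2 = pd.DataFrame(arr, index=vals, columns=vals)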

print(df2)
   a  b  c  d
a  0  0  1  0
b  0  0  2  0
c  0  0  0  0
d  0  0  3  0
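
One thing to watch with the positional approach: if the same (node1, node2) pair occurs more than once, plain fancy-index assignment keeps only the last weight. If summing parallel edges is what you want, `np.add.at` accumulates instead. A minimal sketch, assuming a hypothetical edge list (the names `edges`, `summed` and `adjacency` are mine, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical edge list in which the pair (a, c) appears twice
edges = pd.DataFrame({
    "node1": ["a", "a", "d"],
    "node2": ["c", "c", "c"],
    "weight": [1, 4, 3],
})

# Sorted node labels and their integer positions
nodes = np.unique(edges[["node1", "node2"]].to_numpy())
index = pd.Index(nodes)
rows = index.get_indexer(edges["node1"])
cols = index.get_indexer(edges["node2"])

# np.add.at performs unbuffered addition, so repeated (row, col) pairs add up
summed = np.zeros((len(nodes), len(nodes)), dtype=np.int64)
np.add.at(summed, (rows, cols), edges["weight"].to_numpy())

adjacency = pd.DataFrame(summed, index=nodes, columns=nodes)
print(adjacency)
```

Whether summing is the right way to combine duplicate edges depends on the graph; taking the maximum or the last value may suit other cases better.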

Answer 2

Use pivot together with reindex

In [20]: vals = np.unique(df[['node1', 'node2']])

In [21]: df.pivot(index='node1', columns='node2', values='weight'
                  ).reindex(columns=vals, index=vals, fill_value=0)
Out[21]:
node2  a  b  c  d
node1
a      0  0  1  0
b      0  0  2  0
c      0  0  0  0
d      0  0  3  0

Or use set_index and unstack

In [27]: (df.set_index(['node1', 'node2'])['weight'].unstack()
            .reindex(columns=vals, index=vals, fill_value=0))
Out[27]:
node2  a  b  c  d
node1
a      0  0  1  0
b      0  0  2  0
c      0  0  0  0
d      0  0  3  0
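
A caveat for both variants: `fill_value=0` in `reindex` only fills the rows and columns that `reindex` newly adds. The example data happens to pivot into a single fully populated column, so nothing is missing there, but with more than one target node the pivoted frame can already contain NaN. A minimal sketch, assuming a hypothetical two-edge list (`edges`, `pivoted` and `adjacency` are illustrative names), where an extra `fillna(0)` closes the gap:

```python
import numpy as np
import pandas as pd

# Hypothetical edge list whose pivot has holes: a -> c and b -> d
edges = pd.DataFrame({
    "node1": ["a", "b"],
    "node2": ["c", "d"],
    "weight": [1, 2],
})
nodes = np.unique(edges[["node1", "node2"]].to_numpy())

# Same recipe as above; (a, d) and (b, c) are NaN after the pivot
# and reindex does not touch them
pivoted = (edges.pivot(index="node1", columns="node2", values="weight")
                .reindex(index=nodes, columns=nodes, fill_value=0))

# Fill the remaining gaps and restore an integer dtype
adjacency = pivoted.fillna(0).astype(int)
print(adjacency)
```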

Convert an edge list to an adjacency matrix
