Question

I have a dataframe that looks like this (the columns hold movie IDs and actor IDs):

    movie  actor  clusterid
0    0      1     2
1    0      2     2
2    1      1     2
3    1      3     2
4    2      2     1
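To make the examples reproducible, the sample frame can be rebuilt like this. This is a sketch that assumes the five printed rows are the complete data and that the IDs are plain integers:

```python
import pandas as pd

# Sample data rebuilt from the printed frame (assumed complete, integer IDs)
df = pd.DataFrame({
    "movie":     [0, 0, 1, 1, 2],
    "actor":     [1, 2, 1, 3, 2],
    "clusterid": [2, 2, 2, 2, 1],
})
print(df)
```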

I want to create a binary co-occurrence matrix from this dataframe that looks like this:

                  actor1  actor2  actor3
clusterid 2 movie0    1      1     0
            movie1    1      0     1
clusterid 1 movie2    0      1     0

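The resulting dataframe should have (i) a MultiIndex (clusterid, movieid) and (ii) binary indicators of which actors appear in each movie, according to my initial dataframe.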

I tried:

df.groupby("movie").agg('count').unstack(fill_value=0)

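Unfortunately, this neither widens the dataframe into actor columns nor computes the counts I want. Can something like this be done easily with built-in pandas functionality?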
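A quick check suggests why the attempt falls short: `agg('count')` only counts the non-null rows per movie in each remaining column, so the information about which actors appeared is lost. Also, because the grouped result has a flat (non-MultiIndex) index, `DataFrame.unstack` returns a Series rather than a wider frame. A minimal sketch, reusing the sample data rebuilt from the question (the names `counts` and `result` are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "movie":     [0, 0, 1, 1, 2],
    "actor":     [1, 2, 1, 3, 2],
    "clusterid": [2, 2, 2, 2, 1],
})

# 'count' counts non-null values per movie in each column;
# the actual actor IDs are discarded.
counts = df.groupby("movie").agg("count")
print(counts)

# With a flat index, DataFrame.unstack produces a Series,
# so no actor columns are created.
result = counts.unstack(fill_value=0)
print(type(result).__name__)
```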

Thanks for any suggestions.

Answer 1

You can create an extra helper column that marks each (movie, actor) pair as present, and then use pivot_table:

(df.assign(actor="actor" + df.actor.astype(str), indicator=1)
   .pivot_table('indicator', ['clusterid', 'movie'], 'actor', fill_value=0))
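A self-contained run of the `pivot_table` version, with the sample data rebuilt from the question (the variable name `matrix` is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "movie":     [0, 0, 1, 1, 2],
    "actor":     [1, 2, 1, 3, 2],
    "clusterid": [2, 2, 2, 2, 1],
})

# Prefix actor IDs so they become readable column labels, add a constant
# indicator, then pivot (clusterid, movie) x actor; missing pairs become 0.
matrix = (
    df.assign(actor="actor" + df.actor.astype(str), indicator=1)
      .pivot_table("indicator", ["clusterid", "movie"], "actor", fill_value=0)
)
print(matrix)
```

Because `pivot_table` aggregates with its default `aggfunc` (the mean), a duplicated (clusterid, movie, actor) row would still yield 1 rather than 2. Rows come out sorted by `clusterid`, so clusterid 1 is listed before clusterid 2, unlike the hand-drawn table in the question.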


Or use the set_index(...).unstack() pattern:

(df.assign(actor="actor" + df.actor.astype(str), indicator=1)
   .set_index(['clusterid', 'movie', 'actor'])
   .indicator
   .unstack('actor', fill_value=0))
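The same reshape as a runnable sketch. One caveat: `unstack` raises a `ValueError` if the same (clusterid, movie, actor) combination occurs more than once, whereas `pivot_table` aggregates duplicates. As an alternative that is not part of the original answer, `pd.crosstab` counts the pairs directly and `clip(upper=1)` turns the counts into 0/1 flags. The names `wide` and `flags` are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "movie":     [0, 0, 1, 1, 2],
    "actor":     [1, 2, 1, 3, 2],
    "clusterid": [2, 2, 2, 2, 1],
})

# set_index/unstack: move the 'actor' level of the index into columns.
# Fails with ValueError if an index triple is duplicated.
wide = (
    df.assign(actor="actor" + df.actor.astype(str), indicator=1)
      .set_index(["clusterid", "movie", "actor"])
      .indicator
      .unstack("actor", fill_value=0)
)

# Alternative: crosstab counts (clusterid, movie) x actor pairs;
# clip(upper=1) caps any count above 1 so the result stays binary.
flags = pd.crosstab(
    [df.clusterid, df.movie], "actor" + df.actor.astype(str)
).clip(upper=1)

print(wide)
print(flags)
```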
