Question

I have a DataFrame that resembles an affiliation matrix. Each row has a person, an event, and the year of the event.

d = {'person' : ['1', '2', '3', '1', '4', '3', '4', '1', '2'],
    'event' : ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'year' : [1995, 1995, 1995, 1996, 1996, 2000, 2000, 2001, 2001]}

df = pd.DataFrame(d)

I need to find the first meeting between each pair of people. That is, if '1' and '2' met at event 'A', I need to know when they first met (in this example, at 'A' in 1995).

I don't know whether this is feasible with NetworkX or whether I need to do it some other way with Pandas. How can I do this?

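I can get the projected network, but I don't know how to transfer the attribute 'year' to the edges of that projected network. It is important to note that the attribute (in this case 'year') belongs to the event, so it is constant across all edges that come from the same event.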

This is what I have done so far:

import networkx as nx
import pandas as pd

d = {'person' : ['1', '2', '3', '1', '4', '3', '4', '1', '2'],
     'event' : ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
     'year' : [1995, 1995, 1995, 1996, 1996, 2000, 2000, 2001, 2001]}

df = pd.DataFrame(d)

# note: from_pandas_dataframe was renamed from_pandas_edgelist in NetworkX 2.x
B = nx.from_pandas_edgelist(df, 'person', 'event', edge_attr='year')

G = nx.bipartite.projected_graph(B, df.person.unique(), multigraph=True)

Answer 1

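I'm not familiar enough with NetworkX to help with adding the edge attributes, but this approach does identify the first meeting between each pair of individuals.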

import pandas as pd
import itertools

# initial data
d = {'person' : ['1', '2', '3', '1', '4', '3', '4', '1', '2'],
     'event' : ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
     'year' : [1995, 1995, 1995, 1996, 1996, 2000, 2000, 2001, 2001]}

df = pd.DataFrame(d)

# create a unique list of individuals for each meeting. this should be
# unique anyway, but just in case. :)
# note that this approach is also robust to events in different years
# sharing the same name.

grpd = df.groupby(['year', 'event'])['person'].unique().apply(lambda x: sorted(x))

# sort from the most recent meetings to the oldest, so that the oldest
# meetings are processed last
grpd.sort_index(ascending=False, inplace=True)

# add meetings to a dictionary, overwriting each pair as we encounter
# older meetings, so the earliest meeting is the one that remains

meetings = {}

for (year, meeting), people in grpd.items():
    for combo in itertools.combinations(people, 2):
        meetings[combo] = (meeting, year)


import pprint

>>> pprint.pprint(meetings)
{('1', '2'): ('A', 1995),
 ('1', '3'): ('A', 1995),
 ('1', '4'): ('B', 1996),
 ('2', '3'): ('A', 1995),
 ('3', '4'): ('C', 2000)}
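
For the NetworkX part of the question, here is a minimal sketch rather than a definitive solution. It does not go through nx.bipartite.projected_graph. Instead it builds the person-to-person graph directly and stores the earliest shared event and its year on each edge. The names attendees, G and first_meetings are my own, and the sketch assumes that each (event, year) pair identifies exactly one meeting.

```python
import itertools
from collections import defaultdict

import networkx as nx

# same data as in the question
persons = ['1', '2', '3', '1', '4', '3', '4', '1', '2']
events = ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D', 'D']
years = [1995, 1995, 1995, 1996, 1996, 2000, 2000, 2001, 2001]

# collect the set of attendees for each (event, year) pair
attendees = defaultdict(set)
for person, event, year in zip(persons, events, years):
    attendees[(event, year)].add(person)

# build the person-person graph; an edge is only written (or overwritten)
# when the pair has not met yet or this meeting is earlier than the stored one
G = nx.Graph()
for (event, year), people in attendees.items():
    for u, v in itertools.combinations(sorted(people), 2):
        if not G.has_edge(u, v) or year < G[u][v]['year']:
            G.add_edge(u, v, event=event, year=year)

# read the edge attributes back as {(person, person): (event, year)}
first_meetings = {
    tuple(sorted((u, v))): (data['event'], data['year'])
    for u, v, data in G.edges(data=True)
}
```

With the sample data, G['1']['2'] ends up as {'event': 'A', 'year': 1995}, because the later meeting at 'D' in 2001 does not replace the earlier one.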
