Question

I have a dataframe called composite that looks like this:

| ID | Person.ID | V.F   | V.nF  |
|----|-----------|-------|-------|
| 1  | 111       | True  | True  |
| 2  | 222       | False | True  |
| 3  | 333       | True  | False |
| 4  | 444       | True  | False |
| 5  | 555       | True  | True  |
| 6  | 666       | False | True  |

For each Person.ID, I have a dictionary called nn_list that holds all the Person.IDs associated with it. It looks like this:

{ 111:[222,333,444],
222:[111,333],
333:[444],
444:[222,555],
555:[333,666],
666:[222],
}

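For a given ID, I want to look up all of its associated Person.IDs in the dictionary, sum the boolean values (per column) of those associated IDs, and assign the result to new column(s) in each row. The result would look like this: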

| ID | Person.ID | V.F   | V.nF  | n_V.F | n_V.nF |
|----|-----------|-------|-------|-------|--------|
| 1  | 111       | True  | True  | 2     | 1      |
| 2  | 222       | False | True  | 2     | 1      |
| 3  | 333       | True  | False | 1     | 0      |
| 4  | 444       | True  | False | 1     | 2      |
| 5  | 555       | True  | True  | 1     | 1      |
| 6  | 666       | False | True  | 0     | 1      |

I can currently do this in a very slow and inefficient way:

l=[composite.loc[composite['Person.ID'].isin(nn_list[x]),'V.F'].sum() for x in composite['Person.ID']]
composite['n_V.F']=l

l=[composite.loc[composite['Person.ID'].isin(nn_list[x]),'V.nF'].sum() for x in composite['Person.ID']]
composite['n_V.nF']=l

Is there a smarter way to do this so that it doesn't take so long to run? Thanks!
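For anyone who wants to reproduce this, here is a minimal sketch of the setup and the slow baseline. The data is copied from the tables above; the helper name slow_neighbor_sum and the variable baseline are only illustrative and not part of my original code.

```python
import pandas as pd

# Sample data copied from the question's table.
composite = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Person.ID": [111, 222, 333, 444, 555, 666],
    "V.F": [True, False, True, True, True, False],
    "V.nF": [True, True, False, False, True, True],
})

# Associated Person.IDs for each Person.ID.
nn_list = {
    111: [222, 333, 444],
    222: [111, 333],
    333: [444],
    444: [222, 555],
    555: [333, 666],
    666: [222],
}


def slow_neighbor_sum(frame, neighbors, column):
    """Hypothetical helper: one full isin() scan of the frame per row."""
    return [
        int(frame.loc[frame["Person.ID"].isin(neighbors[pid]), column].sum())
        for pid in frame["Person.ID"]
    ]


baseline = composite.copy()
baseline["n_V.F"] = slow_neighbor_sum(composite, nn_list, "V.F")
baseline["n_V.nF"] = slow_neighbor_sum(composite, nn_list, "V.nF")
print(baseline)
```

Each row triggers a scan over the whole frame, so the work grows roughly with the square of the number of rows, which is why it gets slow.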

Answer 1

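We can do explode first, then merge. Note that explode is only available from pandas 0.25 onward. Here df is your composite dataframe and d is your nn_list dictionary.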

s=pd.Series(d).explode().to_frame('Person.ID').reset_index()
s=s.merge(df).groupby('index')[['V.F','V.nF']].sum()
Newdf=pd.concat([df.set_index('Person.ID'),s.add_prefix('n_')],axis=1).reset_index()
Newdf
   index  ID    V.F   V.nF  n_V.F  n_V.nF
0    111   1   True   True    2.0     1.0
1    222   2  False   True    2.0     1.0
2    333   3   True  False    1.0     0.0
3    444   4   True  False    1.0     2.0
4    555   5   True   True    1.0     1.0
5    666   6  False   True    0.0     1.0
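The same explode-then-merge idea can be sketched as a self-contained script using the question's names (composite, nn_list). The column name owner and the integer casts are assumptions added here for readability; they are not part of the snippet above.

```python
import pandas as pd

composite = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Person.ID": [111, 222, 333, 444, 555, 666],
    "V.F": [True, False, True, True, True, False],
    "V.nF": [True, True, False, False, True, True],
})
nn_list = {111: [222, 333, 444], 222: [111, 333], 333: [444],
           444: [222, 555], 555: [333, 666], 666: [222]}

# One row per (owner, associated Person.ID) pair; requires pandas >= 0.25.
pairs = (
    pd.Series(nn_list, name="Person.ID")
    .explode()
    .astype("int64")          # explode() leaves an object dtype
    .rename_axis("owner")     # "owner" is an illustrative name
    .reset_index()
)

# Attach each associated person's flags, then total them per owner.
sums = (
    pairs.merge(composite[["Person.ID", "V.F", "V.nF"]], on="Person.ID")
    .groupby("owner")[["V.F", "V.nF"]]
    .sum()
    .astype("int64")
    .add_prefix("n_")
)

# Join the totals back onto the original rows; owners with no pairs get 0.
result = composite.join(sums, on="Person.ID")
count_cols = ["n_V.F", "n_V.nF"]
result[count_cols] = result[count_cols].fillna(0).astype("int64")
print(result)
```

Joining on Person.ID rather than concatenating keeps Person.ID as a regular column and avoids the unnamed index column seen in the output above.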

Answer 2

Another approach, using map:

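# d is the nn_list dictionary from the question
composite.set_index('Person.ID', inplace=True)

# one entry per (Person.ID, associated Person.ID) pair, indexed by the owning Person.ID
s = pd.concat(pd.Series(y, index=[x]*len(y)) for x,y in d.items())

# look up each associated ID's flag, then sum per owning Person.ID
composite['n_V.F'] = s.map(composite['V.F']).groupby(level=0).sum()
composite['n_V.nF'] = s.map(composite['V.nF']).groupby(level=0).sum()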

Output:

           ID    V.F   V.nF  n_V.F  n_V.nF
Person.ID                                 
111         1   True   True    2.0     1.0
222         2  False   True    2.0     1.0
333         3   True  False    1.0     0.0
444         4   True  False    1.0     2.0
555         5   True   True    1.0     1.0
666         6  False   True    0.0     1.0
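A self-contained sketch of the map approach, looping over the flag columns instead of repeating the line per column. The names by_person and neighbors are illustrative, and the final cast to integers is an addition, not something the snippet above does.

```python
import pandas as pd

composite = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Person.ID": [111, 222, 333, 444, 555, 666],
    "V.F": [True, False, True, True, True, False],
    "V.nF": [True, True, False, False, True, True],
})
nn_list = {111: [222, 333, 444], 222: [111, 333], 333: [444],
           444: [222, 555], 555: [333, 666], 666: [222]}

# Work on a copy indexed by Person.ID so map() can look flags up by ID.
by_person = composite.set_index("Person.ID")

# Values are associated IDs; the index repeats the owning Person.ID.
neighbors = pd.concat(
    [pd.Series(ids, index=[pid] * len(ids)) for pid, ids in nn_list.items()]
)

for col in ["V.F", "V.nF"]:
    # Translate each associated ID into its flag, then total per owner.
    by_person["n_" + col] = (
        neighbors.map(by_person[col]).groupby(level=0).sum().astype("int64")
    )

print(by_person)
```

This sketch assumes every Person.ID has at least one associated ID, as in the sample data; otherwise the assignment would leave NaN for the missing owners.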

Answer 3

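Assign your dictionary to d. You can use a dict comprehension that, for each list of IDs in d, selects those rows with .loc and calls sum directly on the values. After that, construct a dataframe from the resulting dictionary and join it back to df.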

df1 = df.set_index('Person.ID')
n = {k: df1.loc[v, ['V.F', 'V.nF']].values.sum(0) for k, v in d.items()}

Out[889]:
{111: array([2, 1]),
 222: array([2, 1]),
 333: array([1, 0]),
 444: array([1, 2]),
 555: array([1, 1]),
 666: array([0, 1])}

df2 = pd.DataFrame.from_dict(n, orient='index', columns=['n_V.F', 'n_V.nF'])
df1.join(df2).reset_index()

Out[898]:
   Person.ID  ID    V.F   V.nF  n_V.F  n_V.nF
0        111   1   True   True      2       1
1        222   2  False   True      2       1
2        333   3   True  False      1       0
3        444   4   True  False      1       2
4        555   5   True   True      1       1
5        666   6  False   True      0       1
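For completeness, a runnable sketch of the same steps with the question's names. The variables by_person, flag_cols, totals, and counts are illustrative, and to_numpy() is used here in place of .values; everything else follows the steps above.

```python
import pandas as pd

composite = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Person.ID": [111, 222, 333, 444, 555, 666],
    "V.F": [True, False, True, True, True, False],
    "V.nF": [True, True, False, False, True, True],
})
nn_list = {111: [222, 333, 444], 222: [111, 333], 333: [444],
           444: [222, 555], 555: [333, 666], 666: [222]}

by_person = composite.set_index("Person.ID")
flag_cols = ["V.F", "V.nF"]

# For each owner, select the associated rows by label and sum column-wise.
totals = {
    pid: by_person.loc[ids, flag_cols].to_numpy().sum(axis=0)
    for pid, ids in nn_list.items()
}

# Turn the {owner: [sum_VF, sum_VnF]} mapping into a frame and join it back.
counts = pd.DataFrame.from_dict(
    totals, orient="index", columns=["n_" + c for c in flag_cols]
)
result = by_person.join(counts).reset_index()
print(result)
```

Label-based selection with .loc raises a KeyError if an associated ID is absent from the frame, so this sketch assumes every ID in nn_list also appears in Person.ID.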

Sum the boolean values of each ID's associated IDs and assign the result to that ID
