我现在有以下格式的数据,此数据位于Pandas数据框中。
Row Uid Lender
1 1 HSBC
2 1 Lloyds
3 1 Barclays
4 2 Lloyds
5 2 Barclays
6 2 Santander
7 2 RBS
8 2 HSBC
我需要的是每个Uid的Lenders列的所有可能组合,因此输出将是这样
Row Uid LenderCombo
1 1 Barclays
2 1 Lloyds
3 1 HSBC
4 1 Barclays, HSBC
5 1 Barclays, Lloyds
6 1 HSBC, Lloyds
7 1 Barclays, HSBC, Lloyds
与Uid 2等相同,如果在我不确定如何解决此问题之前已经回答了此问题,则表示歉意。
谢谢
答案 0 :(得分:5)
将GroupBy.apply
与自定义功能一起使用,并通过join
连接元组:
from itertools import chain, combinations
#https://stackoverflow.com/a/5898031
def all_subsets(ss):
return chain(*map(lambda x: combinations(ss, x), range(1, len(ss)+1)))
df = (df.groupby('Uid')['Lender']
.apply(lambda x: pd.Series([', '.join(y) for y in all_subsets(x)]))
.reset_index()
.rename(columns={'level_1':'Row'}))
print (df)
Uid Row Lender
0 1 0 HSBC
1 1 1 Lloyds
2 1 2 Barclays
3 1 3 HSBC, Lloyds
4 1 4 HSBC, Barclays
5 1 5 Lloyds, Barclays
6 1 6 HSBC, Lloyds, Barclays
7 2 0 Lloyds
8 2 1 Barclays
9 2 2 Santander
10 2 3 RBS
11 2 4 HSBC
12 2 5 Lloyds, Barclays
13 2 6 Lloyds, Santander
14 2 7 Lloyds, RBS
15 2 8 Lloyds, HSBC
16 2 9 Barclays, Santander
17 2 10 Barclays, RBS
18 2 11 Barclays, HSBC
19 2 12 Santander, RBS
20 2 13 Santander, HSBC
21 2 14 RBS, HSBC
22 2 15 Lloyds, Barclays, Santander
23 2 16 Lloyds, Barclays, RBS
24 2 17 Lloyds, Barclays, HSBC
25 2 18 Lloyds, Santander, RBS
26 2 19 Lloyds, Santander, HSBC
27 2 20 Lloyds, RBS, HSBC
28 2 21 Barclays, Santander, RBS
29 2 22 Barclays, Santander, HSBC
30 2 23 Barclays, RBS, HSBC
31 2 24 Santander, RBS, HSBC
32 2 25 Lloyds, Barclays, Santander, RBS
33 2 26 Lloyds, Barclays, Santander, HSBC
34 2 27 Lloyds, Barclays, RBS, HSBC
35 2 28 Lloyds, Santander, RBS, HSBC
36 2 29 Barclays, Santander, RBS, HSBC
37 2 30 Lloyds, Barclays, Santander, RBS, HSBC