I have a DataFrame with 8 columns, like this:
Bank_Acct Firstname | Bank_Acct Lastname | Bank_AcctNumber | Firstname | Lastname | ID | Date1 | Date2
B1 | Last1 | 123 | ABC | EFG | 12 | Somedate | Somedate
B2 | Last2 | 245 | ABC | EFG | 12 | Somedate | Somedate
B1 | Last1 | 123 | DEF | EFG | 12 | Somedate | Somedate
B3 | Last3 | 356 | ABC | GHI | 13 | Somedate | Somedate
B4 | Last4 | 478 | XYZ | FHJ | 13 | Somedate | Somedate
B5 | Last5 | 599 | XYZ | DFI | 13 | Somedate | Somedate
I want to create a dictionary:
{ID1: (Count of distinct Bank_Acct Firstname, Count of distinct Bank_Acct Lastname,
{Bank_AcctNumber1 : ItsCount, Bank_AcctNumber2 : ItsCount},
Count of distinct Firstname, Count of distinct Lastname),
ID2: (...), }
For the example above:
{12: (2, 2, {123: 2, 245: 1}, 2, 1), 13 : (3, 3, {356: 1, 478: 1, 599: 1}, 2, 3)}
Here is my code:
cols = ['Bank_Acct Firstname', 'Bank_Acct Lastname', 'Bank_AcctNumber', 'Firstname', 'Lastname']
df1 = df.groupby('ID').apply(lambda x: tuple(x[c].nunique() for c in cols))
d = df1.to_dict()
But the code above only produces this output:
{12: (2, 2, 2, 2, 1), 13 : (3, 3, 3, 2, 3)}
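That is, it gives the number of distinct bank account numbers instead of the inner dictionary of per-account counts.
How can I get the desired dictionary? Thanks!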
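Answer 0 (score: 2)
You can define each column together with the function to apply to it in a list, using value_counts().to_dict() for the account-number column and nunique for the others: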
In [15]: cols = [
...: {'col': 'Bank_Acct Firstname', 'func': pd.Series.nunique},
...: {'col': 'Bank_Acct Lastname', 'func': pd.Series.nunique},
...: {'col': 'Bank_AcctNumber', 'func': lambda x: x.value_counts().to_dict()},
...: {'col': 'Firstname', 'func': pd.Series.nunique},
...: {'col': 'Lastname', 'func': pd.Series.nunique}
...: ]
In [16]: df.groupby('ID').apply(lambda x: tuple(c['func'](x[c['col']]) for c in cols))
Out[16]:
ID
12 (2, 2, {123: 2, 245: 1}, 2, 1)
13 (3, 3, {356: 1, 478: 1, 599: 1}, 2, 3)
dtype: object
In [17]: (df.groupby('ID')
    ...:    .apply(lambda x: tuple(c['func'](x[c['col']]) for c in cols))
    ...:    .to_dict())
Out[17]:
{12: (2, 2, {123: 2, 245: 1}, 2, 1),
13: (3, 3, {356: 1, 478: 1, 599: 1}, 2, 3)}
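
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. It rebuilds the sample table from the question (the date values are placeholders, since the real ones are not given) and uses the same list of column/function pairs. As a variation, it iterates over the groups directly rather than calling groupby.apply; the helper name `summarize` is only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
# The date columns are placeholder strings (the real values are not given).
df = pd.DataFrame({
    'Bank_Acct Firstname': ['B1', 'B2', 'B1', 'B3', 'B4', 'B5'],
    'Bank_Acct Lastname': ['Last1', 'Last2', 'Last1', 'Last3', 'Last4', 'Last5'],
    'Bank_AcctNumber': [123, 245, 123, 356, 478, 599],
    'Firstname': ['ABC', 'ABC', 'DEF', 'ABC', 'XYZ', 'XYZ'],
    'Lastname': ['EFG', 'EFG', 'EFG', 'GHI', 'FHJ', 'DFI'],
    'ID': [12, 12, 12, 13, 13, 13],
    'Date1': ['Somedate'] * 6,
    'Date2': ['Somedate'] * 6,
})

# One entry per output slot: which column to read and how to reduce it.
cols = [
    {'col': 'Bank_Acct Firstname', 'func': pd.Series.nunique},
    {'col': 'Bank_Acct Lastname', 'func': pd.Series.nunique},
    {'col': 'Bank_AcctNumber', 'func': lambda s: s.value_counts().to_dict()},
    {'col': 'Firstname', 'func': pd.Series.nunique},
    {'col': 'Lastname', 'func': pd.Series.nunique},
]


def summarize(group):
    """Apply each configured function to its column within one ID group."""
    return tuple(spec['func'](group[spec['col']]) for spec in cols)


# Iterate over the groups directly (a variation on groupby.apply).
result = {key: summarize(group) for key, group in df.groupby('ID')}
print(result)
# {12: (2, 2, {123: 2, 245: 1}, 2, 1), 13: (3, 3, {356: 1, 478: 1, 599: 1}, 2, 3)}
```

The difference from the code in the question is the third entry: nunique collapses the account numbers to a single count, whereas value_counts().to_dict() keeps one count per account number.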