Below is a small sample of my large DataFrame:
Txn_Key Send_Agent Send_Time Pay_Time Send_Amount \
0 NaN ANO080012 2012-05-31 02:25:00 2012-05-31 21:43:00 490.00
1 NaN AUK359401 2012-05-31 11:25:00 2012-05-31 11:57:00 616.16
2 NaN ACL000105 2012-05-31 13:07:00 2012-05-31 17:36:00 193.78
3 NaN AED420319 2012-05-31 10:50:00 2012-05-31 11:34:00 999.43
4 NaN ARA030210 2012-05-30 12:14:00 2012-05-31 04:16:00 433.29
5 NaN AJ5020114 2012-05-31 02:37:00 2012-05-31 04:31:00 378.00
6 NaN A11171047 2012-05-31 09:39:00 2012-05-31 10:08:00 865.34
Pay_Amount MTCN Send_Phone Refund_Flag time_diff
0 475.68 9323625903 97549829 NaN 0 days 19:18:00
1 600.87 3545067820 440000000000 NaN 0 days 00:32:00
2 185.21 1453132764 0511 NaN 0 days 04:29:00
3 963.04 4509062067 971566016900 NaN 0 days 00:44:00
4 423.75 6898279087 144 NaN 0 days 16:02:00
5 377.99 5170985243 963954932506 NaN 0 days 01:54:00
6 833.89 5352719100 0644798854 NaN 0 days 00:29:00
grouped = frame1.groupby('Send_Agent')
x=grouped.agg({'Send_Amount':np.mean,'Pay_Amount':np.mean,'time_diff':np.min,'MTCN':np.size,'Send_Phone':lambda x:x.nunique()})
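I would like to know how to get the count of non-null values of Refund_Flag using the groupby.agg call shown above.
I tried a lambda such as 'Refund_Flag': lambda x: pd.count(x.notnull())
but it raises an error: AttributeError: 'module' object has no attribute 'count'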
Answer 0 (score: 3)
You can use the Series method count (below, g is the grouped object, called grouped in the question):
In [11]: g.agg({'Send_Amount':np.mean,'Pay_Amount':np.mean,'time_diff':np.min,'MTCN':np.size,'Send_Phone':lambda x:x.nunique(), "Refund_Flag": lambda x: x.count()})
Out[11]:
Refund_Flag MTCN Send_Phone Pay_Amount time_diff Send_Amount
Send_Agent
A11171047 0.0 1 1 833.89 0 days 00:29:00 865.34
ACL000105 0.0 1 1 185.21 0 days 04:29:00 193.78
AED420319 0.0 1 1 963.04 0 days 00:44:00 999.43
AJ5020114 0.0 1 1 377.99 0 days 01:54:00 378.00
ANO080012 0.0 1 1 475.68 0 days 19:18:00 490.00
ARA030210 0.0 1 1 423.75 0 days 16:02:00 433.29
AUK359401 0.0 1 1 600.87 0 days 00:32:00 616.16
You can also pass the string 'count', which produces an integer column where possible:
In [12]: g.agg({'Send_Amount':np.mean,'Pay_Amount':np.mean,'time_diff':np.min,'MTCN':np.size,'Send_Phone':lambda x:x.nunique(), "Refund_Flag": 'count'})
Out[12]:
Refund_Flag MTCN Send_Phone Pay_Amount time_diff Send_Amount
Send_Agent
A11171047 0 1 1 833.89 0 days 00:29:00 865.34
ACL000105 0 1 1 185.21 0 days 04:29:00 193.78
AED420319 0 1 1 963.04 0 days 00:44:00 999.43
AJ5020114 0 1 1 377.99 0 days 01:54:00 378.00
ANO080012 0 1 1 475.68 0 days 19:18:00 490.00
ARA030210 0 1 1 423.75 0 days 16:02:00 433.29
AUK359401 0 1 1 600.87 0 days 00:32:00 616.16
Note: you can do the same for 'mean', 'min', and so on:
In [13]: g.agg({'Send_Amount': 'mean','Pay_Amount': 'mean' ,'time_diff': 'min','MTCN': 'size', 'Send_Phone': 'nunique', "Refund_Flag": 'count'})
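To make the difference visible, here is a minimal self-contained sketch of the string-based form. The small frame below is hypothetical (it is not the asker's data): it reuses two agent codes from the sample and assumes one row carries a non-null Refund_Flag value of "Y", so that the count is not zero everywhere.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of frame1: two agents, one non-null Refund_Flag.
frame1 = pd.DataFrame({
    "Send_Agent": ["ANO080012", "ANO080012", "AUK359401"],
    "Send_Amount": [490.00, 510.00, 616.16],
    "Refund_Flag": [np.nan, "Y", np.nan],  # "Y" is an assumed flag value
})

# String names select the built-in groupby aggregations;
# 'count' ignores NaN, so only the "Y" row is counted.
result = frame1.groupby("Send_Agent").agg(
    {"Send_Amount": "mean", "Refund_Flag": "count"}
)
print(result)
```

With this data, the first agent gets a Refund_Flag count of 1 and the second a count of 0.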
The count method counts the non-null entries in a Series:
In [21]: s = pd.Series([1, 2, np.nan, 4])
In [22]: s.count()
Out[22]: 3
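As a rough illustration of how count relates to the other options in the question, the sketch below compares it with size (which np.size on MTCN corresponds to) and with summing a notnull mask, which is presumably what the pd.count(x.notnull()) attempt was aiming for. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4])

non_null = s.count()          # counts non-null entries only
total = s.size                # counts every entry, NaN included
via_mask = s.notnull().sum()  # True counts as 1, so this matches count()

print(non_null, total, via_mask)
```

Inside agg, lambda x: x.notnull().sum() would therefore be a working alternative to the lambda from the question, although 'count' is shorter.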