Python Pandas: getting the count of non-null values in a column with groupby.aggregate

Time: 2016-05-24 02:16:14

Tags: python-2.7 pandas

Below is a small sample of my large DataFrame:

    Txn_Key Send_Agent           Send_Time            Pay_Time  Send_Amount  \
0         NaN  ANO080012 2012-05-31 02:25:00 2012-05-31 21:43:00       490.00
1         NaN  AUK359401 2012-05-31 11:25:00 2012-05-31 11:57:00       616.16
2         NaN  ACL000105 2012-05-31 13:07:00 2012-05-31 17:36:00       193.78
3         NaN  AED420319 2012-05-31 10:50:00 2012-05-31 11:34:00       999.43
4         NaN  ARA030210 2012-05-30 12:14:00 2012-05-31 04:16:00       433.29
5         NaN  AJ5020114 2012-05-31 02:37:00 2012-05-31 04:31:00       378.00
6         NaN  A11171047 2012-05-31 09:39:00 2012-05-31 10:08:00       865.34
      Pay_Amount        MTCN      Send_Phone  Refund_Flag       time_diff
0         475.68  9323625903        97549829          NaN 0 days 19:18:00
1         600.87  3545067820    440000000000          NaN 0 days 00:32:00
2         185.21  1453132764            0511          NaN 0 days 04:29:00
3         963.04  4509062067    971566016900          NaN 0 days 00:44:00
4         423.75  6898279087             144          NaN 0 days 16:02:00
5         377.99  5170985243    963954932506          NaN 0 days 01:54:00
6         833.89  5352719100      0644798854          NaN 0 days 00:29:00



import numpy as np
import pandas as pd

# frame1 is the DataFrame shown above
grouped = frame1.groupby('Send_Agent')

x = grouped.agg({'Send_Amount': np.mean, 'Pay_Amount': np.mean, 'time_diff': np.min, 'MTCN': np.size, 'Send_Phone': lambda x: x.nunique()})

I would like to know how to get the count of non-null values of Refund_Flag using the groupby.agg call above.

I tried a lambda like this:

   'Refund_Flag':lambda x:pd.count(x.notnull())

It raises an error:        AttributeError: 'module' object has no attribute 'count'

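1 Answer:

Answer 0 (score: 3)

You can use the Series count method (here g is presumably the grouped object from the question, frame1.groupby('Send_Agent')):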

In [11]: g.agg({'Send_Amount':np.mean,'Pay_Amount':np.mean,'time_diff':np.min,'MTCN':np.size,'Send_Phone':lambda x:x.nunique(), "Refund_Flag": lambda x: x.count()})
Out[11]:
            Refund_Flag  MTCN  Send_Phone  Pay_Amount        time_diff  Send_Amount
Send_Agent
A11171047           0.0     1           1      833.89  0 days 00:29:00       865.34
ACL000105           0.0     1           1      185.21  0 days 04:29:00       193.78
AED420319           0.0     1           1      963.04  0 days 00:44:00       999.43
AJ5020114           0.0     1           1      377.99  0 days 01:54:00       378.00
ANO080012           0.0     1           1      475.68  0 days 19:18:00       490.00
ARA030210           0.0     1           1      423.75  0 days 16:02:00       433.29
AUK359401           0.0     1           1      600.87  0 days 00:32:00       616.16
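Every count above is 0 because Refund_Flag is NaN in all seven sample rows. As a minimal sketch of what the same lambda returns when some flags are present, the frame below uses made-up agents (A1, A2) and made-up flag values; none of it comes from the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: agent names and flag values are invented for illustration.
frame1 = pd.DataFrame({
    "Send_Agent": ["A1", "A1", "A2", "A2", "A2"],
    "Send_Amount": [100.0, 200.0, 50.0, 70.0, 90.0],
    "Refund_Flag": [1.0, np.nan, np.nan, 1.0, 1.0],
})

g = frame1.groupby("Send_Agent")

# Series.count() skips NaN, so each group reports only its non-null flags.
result = g.agg({"Send_Amount": "mean", "Refund_Flag": lambda x: x.count()})
print(result)
```

Here A1 has one non-null flag out of two rows and A2 has two out of three.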

You can also pass the string 'count' (which produces an integer column where possible).

In [12]: g.agg({'Send_Amount':np.mean,'Pay_Amount':np.mean,'time_diff':np.min,'MTCN':np.size,'Send_Phone':lambda x:x.nunique(), "Refund_Flag": 'count'})
Out[12]:
            Refund_Flag  MTCN  Send_Phone  Pay_Amount        time_diff  Send_Amount
Send_Agent
A11171047             0     1           1      833.89  0 days 00:29:00       865.34
ACL000105             0     1           1      185.21  0 days 04:29:00       193.78
AED420319             0     1           1      963.04  0 days 00:44:00       999.43
AJ5020114             0     1           1      377.99  0 days 01:54:00       378.00
ANO080012             0     1           1      475.68  0 days 19:18:00       490.00
ARA030210             0     1           1      423.75  0 days 16:02:00       433.29
AUK359401             0     1           1      600.87  0 days 00:32:00       616.16

Note: you can do the same with 'mean', 'min', and so on.

In [13]: g.agg({'Send_Amount': 'mean','Pay_Amount': 'mean' ,'time_diff': 'min','MTCN': 'size', 'Send_Phone': 'nunique', "Refund_Flag": 'count'})
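The all-string form can be tried end to end on a small frame. The sketch below reuses the question's column names, but the rows themselves are invented, so treat the numbers as illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only the column names follow the question.
frame1 = pd.DataFrame({
    "Send_Agent": ["A1", "A1", "A2"],
    "Send_Amount": [100.0, 200.0, 50.0],
    "Pay_Amount": [95.0, 190.0, 48.0],
    "time_diff": pd.to_timedelta(["00:30:00", "02:00:00", "01:15:00"]),
    "MTCN": [111, 222, 333],
    "Send_Phone": ["0511", "0511", "144"],
    "Refund_Flag": [np.nan, 1.0, np.nan],
})

# Each value is the name of a groupby method, applied to that column only.
summary = frame1.groupby("Send_Agent").agg({
    "Send_Amount": "mean",
    "Pay_Amount": "mean",
    "time_diff": "min",
    "MTCN": "size",           # rows per group, NaN included
    "Send_Phone": "nunique",  # distinct phone numbers per group
    "Refund_Flag": "count",   # non-null flags per group
})
print(summary)
```

Using the string names avoids the NumPy callables and the lambdas entirely, which keeps the dictionary easier to read.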

The count method counts the non-null entries in a Series:

In [21]: s = pd.Series([1, 2, np.nan, 4])

In [22]: s.count()
Out[22]: 3
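The attempt in the question fails because pandas has no module-level count function; counting is a method on the Series itself. As a small sketch on the same Series, summing the notnull() mask gives the same number as count(), while size also includes the NaN entry (the variable names here are just for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4])

non_null = s.count()          # non-null entries only
via_mask = s.notnull().sum()  # same number, computed from a boolean mask
total = s.size                # every entry, NaN included

print(non_null, via_mask, total)
```

This is also why 'MTCN': np.size and 'Refund_Flag': 'count' can differ within one group: the first counts rows, the second counts non-null values.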