Python pandas: case statement in agg function

Time: 2016-07-07 14:14:02

Tags: python numpy pandas dataframe

I have an SQL statement like this:

select id
        , avg(case when rate = 1 then rate end) as "P_Rate"
        , stddev(case when rate = 1 then rate end) as "std P_Rate"
        , avg(case when f_rate = 1 then f_rate else 0 end) as "A_Rate"
        , stddev(case when f_rate = 1 then f_rate else 0 end) as "std A_Rate"
from (
  select id, connected_date, payment_type, acc_type,
    max(case when s_rate > 1 then 1 else 0 end) / count(open) as rate,
    sum(case when hire_days <= 5 and paid > 1000 then 1 else 0 end) / count(open) as f_rate
  from analysis_table where alloc_date <= '2016-01-01' group by 1,2
) a group by id

I am trying to rewrite it using pandas. First, I create a DataFrame for the "inner" table:

filtered_data = data.where(data['alloc_date'] <= analysis_date)

Then I group this data:

grouped = filtered_data.groupby(['id','connected_date'])

But what should I use to filter each column and apply max / sum to it?

I tried something along these lines, and similarly for the rate columns.

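1 answer:

Answer 0 (score: 1)

You should add a sample DataFrame to your question to make it easier to answer.

Depending on what you need, you probably want the agg method of a grouped DataFrame. Suppose you have the following DataFrame: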

    connected_date  id      number_of_clicks    time_spent
0   Mon             matt    15                  124
1   Tue             john    13                  986
2   Mon             matt    48                  451
3   Thu             jack    68                  234
4   Sun             john    52                  976
5   Sat             sabrina 13                  156

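and you want the total time each user spent per day, together with the maximum number of clicks in a single session. Then use groupby like this:

df.groupby(['id','connected_date'], as_index=False).agg({'time_spent': 'sum', 'number_of_clicks': 'max'})

Output: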

    id      connected_date  time_spent  number_of_clicks
0   jack    Thu             234         68
1   john    Sun             976         52
2   john    Tue             986         13
3   matt    Mon             575         48
4   sabrina Sat             156         13
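
For a version that can be pasted and run as is, the example above could be reconstructed roughly as follows. This is only a sketch: the DataFrame is rebuilt by hand from the table shown earlier, and it uses named aggregation (available in pandas 0.25 and later) instead of the dict form, which is an alternative spelling rather than what the answer originally used.

```python
import pandas as pd

# Sample data copied by hand from the table above.
df = pd.DataFrame({
    "connected_date": ["Mon", "Tue", "Mon", "Thu", "Sun", "Sat"],
    "id": ["matt", "john", "matt", "jack", "john", "sabrina"],
    "number_of_clicks": [15, 13, 48, 68, 52, 13],
    "time_spent": [124, 986, 451, 234, 976, 156],
})

# Named aggregation: output_column=(input_column, aggregation_name).
# Passing the aggregation as a string ("sum", "max") avoids handing
# Python builtins to pandas.
result = (
    df.groupby(["id", "connected_date"], as_index=False)
      .agg(
          time_spent=("time_spent", "sum"),
          number_of_clicks=("number_of_clicks", "max"),
      )
)

print(result)
```

With named aggregation the output columns appear in the order the keyword arguments are written, so the result has the same column order as the table above.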

Note that I passed as_index=False only to keep the output readable.
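
The CASE expressions from the question can be folded into this pattern by turning each condition into a column before grouping. The following is a minimal sketch under several assumptions: the sample rows are invented purely for illustration, the intermediate names (s_flag, f_flag, s_max, f_sum, n_open, p, a) are hypothetical, and the column names are taken from the SQL in the question. A `case when ... then x end` without an else maps to `Series.where` (non-matching rows become NaN, which mean and std skip much as SQL skips NULL), while `case when ... then x else 0 end` maps to `numpy.where`.

```python
import numpy as np
import pandas as pd

# Invented sample rows; only the column names come from the question's SQL.
data = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "connected_date": ["d1", "d2", "d2", "d1", "d1", "d3"],
    "alloc_date": pd.to_datetime([
        "2015-11-01", "2015-11-05", "2015-11-06",
        "2015-12-01", "2015-12-02", "2016-02-01",
    ]),
    "s_rate": [2, 0, 0, 3, 1, 5],
    "hire_days": [3, 3, 9, 2, 7, 1],
    "paid": [1500, 2000, 2000, 500, 3000, 5000],
    "open": [1, 1, 1, 1, 1, 1],
})

# WHERE alloc_date <= '2016-01-01': boolean indexing drops the other rows
# (DataFrame.where would keep them as all-NaN rows instead).
filtered = data[data["alloc_date"] <= pd.Timestamp("2016-01-01")]

# The inner CASE expressions become 0/1 columns.
flags = filtered.assign(
    s_flag=(filtered["s_rate"] > 1).astype(int),
    f_flag=((filtered["hire_days"] <= 5) & (filtered["paid"] > 1000)).astype(int),
)

# Inner query: group by id and connected_date.
inner = flags.groupby(["id", "connected_date"], as_index=False).agg(
    s_max=("s_flag", "max"),
    f_sum=("f_flag", "sum"),
    n_open=("open", "count"),
)
inner["rate"] = inner["s_max"] / inner["n_open"]
inner["f_rate"] = inner["f_sum"] / inner["n_open"]

# Outer CASE expressions.
inner["p"] = inner["rate"].where(inner["rate"] == 1)               # no ELSE -> NaN
inner["a"] = np.where(inner["f_rate"] == 1, inner["f_rate"], 0.0)  # ELSE 0

# Outer query: group by id.
result = inner.groupby("id").agg(
    P_Rate=("p", "mean"),
    std_P_Rate=("p", "std"),
    A_Rate=("a", "mean"),
    std_A_Rate=("a", "std"),
)

print(result)
```

Two differences from the SQL are worth checking against the real data. pandas divides with true division, whereas some databases truncate integer division. pandas computes the sample standard deviation (ddof=1) by default, and whether that matches stddev depends on the database.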