Python partition by

Date: 2017-11-16 08:19:03

Tags: python group-by timestamp partition

I have the following dataframe df:

| Staff_ID | Join_Date | Time_Stamp |
|----------|-----------|------------|
| 1        | 3/29/2016 | 4/23/2016  |
| 1        | 3/29/2016 | 3/29/2016  |
| 1        | 3/29/2016 | 6/21/2016  |
| 2        | 5/15/2016 | 4/1/2016   |
| 2        | 5/15/2016 | 5/25/2016  |
| 3        | 7/24/2016 | 6/21/2016  |
| 3        | 7/24/2016 | 6/10/2016  |
| 3        | 7/24/2016 | 4/21/2016  |

I want to get the min and max "Time_Stamp" date partitioned by "Staff_ID", so that the resulting dataframe looks like this:

| Staff_ID | Join_Date | Time_Stamp | Min_Time_Stamp | Max_Time_Stamp |
|----------|-----------|------------|----------------|----------------|
| 1        | 3/29/2016 | 4/23/2016  | 3/29/2016      | 6/21/2016      |
| 1        | 3/29/2016 | 3/29/2016  | 3/29/2016      | 6/21/2016      |
| 1        | 3/29/2016 | 6/21/2016  | 3/29/2016      | 6/21/2016      |
| 2        | 5/15/2016 | 4/1/2016   | 4/1/2016       | 5/25/2016      |
| 2        | 5/15/2016 | 5/25/2016  | 4/1/2016       | 5/25/2016      |
| 3        | 7/24/2016 | 6/21/2016  | 4/21/2016      | 6/21/2016      |
| 3        | 7/24/2016 | 6/10/2016  | 4/21/2016      | 6/21/2016      |
| 3        | 7/24/2016 | 4/21/2016  | 4/21/2016      | 6/21/2016      |

How can I do this in Python?

2 answers:

Answer 0: (score: 0)

You can use groupby and then merge the per-group results back onto the original dataframe:

import pandas as pd

group = df.groupby("Staff_ID", as_index=False)["Time_Stamp"]
df = pd.merge(df, group.min(), on=["Staff_ID"])
df = df.rename(columns={"Time_Stamp_x": "Time_Stamp", "Time_Stamp_y": "Min_Time_Stamp"})
df = pd.merge(df, group.max(), on=["Staff_ID"])
df = df.rename(columns={"Time_Stamp_x": "Time_Stamp", "Time_Stamp_y": "Max_Time_Stamp"})

Result:

   Join_Date  Staff_ID Time_Stamp Min_Time_Stamp Max_Time_Stamp
0  3/29/2016         1  4/23/2016      3/29/2016      6/21/2016
1  3/29/2016         1  3/29/2016      3/29/2016      6/21/2016
2  3/29/2016         1  6/21/2016      3/29/2016      6/21/2016
3  5/15/2016         2   4/1/2016       4/1/2016      5/25/2016
4  5/15/2016         2  5/25/2016       4/1/2016      5/25/2016
5  7/24/2016         3  6/21/2016      4/21/2016      6/21/2016
6  7/24/2016         3  6/10/2016      4/21/2016      6/21/2016
7  7/24/2016         3  4/21/2016      4/21/2016      6/21/2016
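
As a variation on the merge approach above, both aggregates can be computed in one pass and merged once, which avoids the two rename steps. This is a minimal sketch, assuming pandas 0.25 or later (for keyword-style named aggregation); the names stats and result are introduced here for illustration and do not come from the answer.

```python
import pandas as pd

# Sample data copied from the question; dates are kept as strings here.
df = pd.DataFrame({
    "Staff_ID": [1, 1, 1, 2, 2, 3, 3, 3],
    "Join_Date": ["3/29/2016"] * 3 + ["5/15/2016"] * 2 + ["7/24/2016"] * 3,
    "Time_Stamp": ["4/23/2016", "3/29/2016", "6/21/2016",
                   "4/1/2016", "5/25/2016",
                   "6/21/2016", "6/10/2016", "4/21/2016"],
})

# Compute both aggregates at once; the keyword names become column names.
# Note: these are string comparisons, which happen to order this sample correctly.
stats = (
    df.groupby("Staff_ID")["Time_Stamp"]
      .agg(Min_Time_Stamp="min", Max_Time_Stamp="max")
      .reset_index()
)

# A single left merge broadcasts the per-group values back onto every row.
result = df.merge(stats, on="Staff_ID", how="left")
print(result)
```

Because the aggregated columns already carry their final names, the merge produces no Time_Stamp_x / Time_Stamp_y suffixes to clean up.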

Answer 1: (score: 0)

We can use groupby together with transform and assign:

g = df.groupby('Staff_ID')['Time_Stamp']
df.assign(Min_Time_Stamp=g.transform('min'), Max_Time_Stamp=g.transform('max'))

Output:

     Staff_ID    Join_Date    Time_Stamp Max_Time_Stamp Min_Time_Stamp
1   1           3/29/2016    4/23/2016      6/21/2016      3/29/2016  
2   1           3/29/2016    3/29/2016      6/21/2016      3/29/2016  
3   1           3/29/2016    6/21/2016      6/21/2016      3/29/2016  
4   2           5/15/2016    4/1/2016       5/25/2016      4/1/2016   
5   2           5/15/2016    5/25/2016      5/25/2016      4/1/2016   
6   3           7/24/2016    6/21/2016      6/21/2016      4/21/2016  
7   3           7/24/2016    6/10/2016      6/21/2016      4/21/2016  
8   3           7/24/2016    4/21/2016      6/21/2016      4/21/2016  
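
One caveat worth illustrating: if Time_Stamp holds strings, min and max compare them as text, so a month such as 10 sorts before 4. The sketch below assumes the dates are month/day/year strings as in the question and parses them with pd.to_datetime first; the sample rows are hypothetical and chosen only to show the difference.

```python
import pandas as pd

# Hypothetical sample: as text, "10/1/2016" sorts before "4/1/2016".
df = pd.DataFrame({
    "Staff_ID": [1, 1, 1],
    "Time_Stamp": ["10/1/2016", "4/1/2016", "9/15/2016"],
})

# Parse the month/day/year strings into real datetimes before comparing.
df["Time_Stamp"] = pd.to_datetime(df["Time_Stamp"], format="%m/%d/%Y")

# Same groupby + transform + assign pattern, now on datetime values.
g = df.groupby("Staff_ID")["Time_Stamp"]
df = df.assign(Min_Time_Stamp=g.transform("min"),
               Max_Time_Stamp=g.transform("max"))
print(df)
```

With the column parsed, the minimum for this sample is 1 April 2016 and the maximum is 1 October 2016, whereas a plain string comparison would pick "10/1/2016" as the minimum.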

Timings:

@CarlesMitjans's method:

10 loops, best of 3: 33.3 ms per loop

@ScottBoston's method:

100 loops, best of 3: 5.52 ms per loop
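
The statements that were timed are not shown above, so the following is only a rough sketch of how such a comparison could be reproduced with the standard timeit module. The function names merge_approach and transform_approach are made up for this example, the data is the question's sample, and the numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Staff_ID": [1, 1, 1, 2, 2, 3, 3, 3],
    "Join_Date": ["3/29/2016"] * 3 + ["5/15/2016"] * 2 + ["7/24/2016"] * 3,
    "Time_Stamp": ["4/23/2016", "3/29/2016", "6/21/2016",
                   "4/1/2016", "5/25/2016",
                   "6/21/2016", "6/10/2016", "4/21/2016"],
})


def merge_approach(frame):
    # groupby, then merge min and max back with a rename after each merge.
    group = frame.groupby("Staff_ID", as_index=False)["Time_Stamp"]
    out = pd.merge(frame, group.min(), on=["Staff_ID"])
    out = out.rename(columns={"Time_Stamp_x": "Time_Stamp",
                              "Time_Stamp_y": "Min_Time_Stamp"})
    out = pd.merge(out, group.max(), on=["Staff_ID"])
    return out.rename(columns={"Time_Stamp_x": "Time_Stamp",
                               "Time_Stamp_y": "Max_Time_Stamp"})


def transform_approach(frame):
    # groupby + transform keeps the original row alignment, so no merge is needed.
    g = frame.groupby("Staff_ID")["Time_Stamp"]
    return frame.assign(Min_Time_Stamp=g.transform("min"),
                        Max_Time_Stamp=g.transform("max"))


runs = 100
for fn in (merge_approach, transform_approach):
    seconds = timeit.timeit(lambda: fn(df), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.2f} ms per loop")
```

Both functions leave the input dataframe unchanged and return a new one, so they can be timed repeatedly on the same df.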