I have the following DataFrame `df`:

| Staff_ID | Join_Date | Time_Stamp |
|----------|-----------|------------|
| 1 | 3/29/2016 | 4/23/2016 |
| 1 | 3/29/2016 | 3/29/2016 |
| 1 | 3/29/2016 | 6/21/2016 |
| 2 | 5/15/2016 | 4/1/2016 |
| 2 | 5/15/2016 | 5/25/2016 |
| 3 | 7/24/2016 | 6/21/2016 |
| 3 | 7/24/2016 | 6/10/2016 |
| 3 | 7/24/2016 | 4/21/2016 |
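
I want the minimum and maximum "Time_Stamp" date within each "Staff_ID" group, repeated on every row of that group, so that the resulting DataFrame looks like this:
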
| Staff_ID | Join_Date | Time_Stamp | Min_Time_Stamp | Max_Time_Stamp |
|----------|-----------|------------|----------------|----------------|
| 1 | 3/29/2016 | 4/23/2016 | 3/29/2016 | 6/21/2016 |
| 1 | 3/29/2016 | 3/29/2016 | 3/29/2016 | 6/21/2016 |
| 1 | 3/29/2016 | 6/21/2016 | 3/29/2016 | 6/21/2016 |
| 2 | 5/15/2016 | 4/1/2016 | 4/1/2016 | 5/25/2016 |
| 2 | 5/15/2016 | 5/25/2016 | 4/1/2016 | 5/25/2016 |
| 3 | 7/24/2016 | 6/21/2016 | 4/21/2016 | 6/21/2016 |
| 3 | 7/24/2016 | 6/10/2016 | 4/21/2016 | 6/21/2016 |
| 3 | 7/24/2016 | 4/21/2016 | 4/21/2016 | 6/21/2016 |

How can I do this in Python?

Answer 0 (score: 0)

You can use `groupby` and then merge the results back into the original frame:

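    import pandas as pd

    # With as_index=False, min()/max() return one row per Staff_ID with its Time_Stamp.
    group = df.groupby("Staff_ID", as_index=False)["Time_Stamp"]
    # Both sides have a Time_Stamp column, so merge suffixes them _x (left) and _y (right).
    df = pd.merge(df, group.min(), on=["Staff_ID"])
    df = df.rename(columns={"Time_Stamp_x": "Time_Stamp", "Time_Stamp_y": "Min_Time_Stamp"})
    df = pd.merge(df, group.max(), on=["Staff_ID"])
    df = df.rename(columns={"Time_Stamp_x": "Time_Stamp", "Time_Stamp_y": "Max_Time_Stamp"})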

Result:

           Join_Date  Staff_ID  Time_Stamp  Min_Time_Stamp  Max_Time_Stamp
    0  3/29/2016         1   4/23/2016       3/29/2016       6/21/2016
    1  3/29/2016         1   3/29/2016       3/29/2016       6/21/2016
    2  3/29/2016         1   6/21/2016       3/29/2016       6/21/2016
    3  5/15/2016         2    4/1/2016        4/1/2016       5/25/2016
    4  5/15/2016         2   5/25/2016        4/1/2016       5/25/2016
    5  7/24/2016         3   6/21/2016       4/21/2016       6/21/2016
    6  7/24/2016         3   6/10/2016       4/21/2016       6/21/2016
    7  7/24/2016         3   4/21/2016       4/21/2016       6/21/2016
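
As a variation on the merge approach, both bounds can be computed in one aggregation and joined back with a single merge, which avoids the `_x`/`_y` renaming. This is only a sketch: the frame is rebuilt from the question's table, it assumes a pandas version with named aggregation (0.25 or later), and it parses `Time_Stamp` as month/day/year dates, which is an assumption about the data format.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Staff_ID": [1, 1, 1, 2, 2, 3, 3, 3],
    "Join_Date": ["3/29/2016"] * 3 + ["5/15/2016"] * 2 + ["7/24/2016"] * 3,
    "Time_Stamp": ["4/23/2016", "3/29/2016", "6/21/2016", "4/1/2016",
                   "5/25/2016", "6/21/2016", "6/10/2016", "4/21/2016"],
})

# Assumption: the strings are month/day/year dates.
df["Time_Stamp"] = pd.to_datetime(df["Time_Stamp"], format="%m/%d/%Y")

# Named aggregation yields one row per Staff_ID with both bounds already named.
bounds = (
    df.groupby("Staff_ID")["Time_Stamp"]
      .agg(Min_Time_Stamp="min", Max_Time_Stamp="max")
      .reset_index()
)

# The new column names do not clash with existing ones, so no renaming is needed.
result = df.merge(bounds, on="Staff_ID", how="left")
print(result)
```

The left merge keeps every original row, so `result` has the same eight rows as `df` plus the two new columns.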

Answer 1 (score: 0)

We can use `groupby` with `transform` and `assign`:

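    import pandas as pd
    g = df.groupby('Staff_ID')['Time_Stamp']
    # transform broadcasts each group's min/max to every row; assign returns a new DataFrame.
    df.assign(Min_Time_Stamp=g.transform('min'), Max_Time_Stamp=g.transform('max'))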

Output:

           Staff_ID  Join_Date  Time_Stamp  Max_Time_Stamp  Min_Time_Stamp
    1         1  3/29/2016   4/23/2016       6/21/2016       3/29/2016
    2         1  3/29/2016   3/29/2016       6/21/2016       3/29/2016
    3         1  3/29/2016   6/21/2016       6/21/2016       3/29/2016
    4         2  5/15/2016    4/1/2016       5/25/2016        4/1/2016
    5         2  5/15/2016   5/25/2016       5/25/2016        4/1/2016
    6         3  7/24/2016   6/21/2016       6/21/2016       4/21/2016
    7         3  7/24/2016   6/10/2016       6/21/2016       4/21/2016
    8         3  7/24/2016   4/21/2016       6/21/2016       4/21/2016
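
One caveat worth checking before relying on either approach: if `Time_Stamp` holds strings rather than datetimes, `min` and `max` compare text, not dates. The question's sample happens to sort the same way either way, but that is not guaranteed in general. The sketch below uses hypothetical rows (not from the question) and assumes a month/day/year format to show the difference.

```python
import pandas as pd

# Hypothetical rows, chosen so that string order and date order disagree.
demo = pd.DataFrame({
    "Staff_ID": [1, 1, 1],
    "Time_Stamp": ["4/1/2016", "10/5/2016", "12/31/2015"],
})

# On strings, min/max are lexicographic: "10/..." sorts before "4/...".
g_str = demo.groupby("Staff_ID")["Time_Stamp"]

# Assumption: month/day/year. Parsing makes min/max chronological.
parsed = pd.to_datetime(demo["Time_Stamp"], format="%m/%d/%Y")
g_date = parsed.groupby(demo["Staff_ID"])

result = demo.assign(
    Min_As_String=g_str.transform("min"),
    Min_As_Date=g_date.transform("min"),
    Max_As_Date=g_date.transform("max"),
)
print(result)
```

Here the string minimum is `10/5/2016`, while the chronological minimum is 2015-12-31, so converting the column with `pd.to_datetime` first is the safer default.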

@CarlesMitjans's method:

    10 loops, best of 3: 33.3 ms per loop

@ScottBoston's method:

    100 loops, best of 3: 5.52 ms per loop
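
To rerun a comparison like this yourself, something along these lines should work. It is a rough sketch using the standard-library `timeit` module on the question's eight-row sample; the function names `via_merge` and `via_transform` are made up for illustration, and the numbers will vary with machine, pandas version, and data size.

```python
import timeit

import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Staff_ID": [1, 1, 1, 2, 2, 3, 3, 3],
    "Join_Date": ["3/29/2016"] * 3 + ["5/15/2016"] * 2 + ["7/24/2016"] * 3,
    "Time_Stamp": ["4/23/2016", "3/29/2016", "6/21/2016", "4/1/2016",
                   "5/25/2016", "6/21/2016", "6/10/2016", "4/21/2016"],
})


def via_merge(frame):
    """groupby, then merge the per-group min and max back in."""
    group = frame.groupby("Staff_ID", as_index=False)["Time_Stamp"]
    out = pd.merge(frame, group.min(), on="Staff_ID")
    out = out.rename(columns={"Time_Stamp_x": "Time_Stamp",
                              "Time_Stamp_y": "Min_Time_Stamp"})
    out = pd.merge(out, group.max(), on="Staff_ID")
    return out.rename(columns={"Time_Stamp_x": "Time_Stamp",
                               "Time_Stamp_y": "Max_Time_Stamp"})


def via_transform(frame):
    """groupby, then broadcast min and max to every row with transform."""
    g = frame.groupby("Staff_ID")["Time_Stamp"]
    return frame.assign(Min_Time_Stamp=g.transform("min"),
                        Max_Time_Stamp=g.transform("max"))


# Best of 3 repeats, 100 calls each, reported per call.
for fn in (via_merge, via_transform):
    seconds = min(timeit.repeat(lambda: fn(df), number=100, repeat=3))
    print(f"{fn.__name__}: {seconds / 100 * 1e3:.2f} ms per loop")
```

Both functions return the same columns and values on this sample, so the timing compares like with like.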