I have the sum of the values for each group.
rf = condition1.groupby(by=['Well Name','Phase'])['Sum of Activity Time (Hr)'].sum()
Well Name | Phase | Value |
TIGER 55-2-12 LOV 8H | INT | 56.25
| MNH | 58.25
| SRF | 34.25
UNIVERSITY 20 PW 2502H | INT | 52.75
| MNH | 72.50
| SRF | 28.5
UNIVERSITY 20 PW UNIT | INT | 64.50
| MNH | 132.50
| SRF | 30.00
UNIVERSITY 20 TG UNIT | INT | 57.00
| MNH | 129.50
| SRF | 25.50
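I need something like this. As you can see, the value kept for each of the three phases is the smallest of the per-well sums for that phase. Any ideas?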
Well Name | Phase | Value |
UNIVERSITY 20 PW 2502H | INT | 52.75
TIGER 55-2-12 LOV 8H | MNH | 58.25
UNIVERSITY 20 TG UNIT | SRF | 25.50
As you can see, it is just the minimum of those group sums.
Answer 0 (score: 0)
You can do it like this:
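import pandas as pd
col = 'Sum of Activity Time (Hr)'
# Sum the activity time per (Well Name, Phase) and keep the result as a DataFrame
rf = pd.DataFrame(condition1.groupby(by=['Well Name', 'Phase'])[col].sum())
# Keep the rows whose sum equals the minimum of a phase (grouped by Phase, as in the desired output).
# Note: this matches by value, so a sum that happens to equal another phase's minimum is kept too.
rf = rf[rf.isin(rf.groupby(level='Phase')[col].min().tolist())].dropna()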
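A variant of the same filtering idea, as a minimal sketch: instead of matching against a flat list of values, it compares each row with the minimum of its own phase. Since condition1 is not shown in the question, the sums from the question's table are typed in directly; the names sums, phase_min and lowest are only illustrative.

```python
import pandas as pd

col = 'Sum of Activity Time (Hr)'

# Per-(well, phase) sums copied from the question's table. In the real
# workflow these would come from condition1.groupby([...])[col].sum().
index = pd.MultiIndex.from_product(
    [['TIGER 55-2-12 LOV 8H', 'UNIVERSITY 20 PW 2502H',
      'UNIVERSITY 20 PW UNIT', 'UNIVERSITY 20 TG UNIT'],
     ['INT', 'MNH', 'SRF']],
    names=['Well Name', 'Phase'])
sums = pd.Series(
    [56.25, 58.25, 34.25, 52.75, 72.50, 28.50,
     64.50, 132.50, 30.00, 57.00, 129.50, 25.50],
    index=index, name=col)

# Broadcast each phase's minimum back onto every row of that phase,
# then keep only the rows that equal their own phase's minimum.
phase_min = sums.groupby(level='Phase').transform('min')
lowest = sums[sums == phase_min]
print(lowest)
```

If two wells tie for the minimum of a phase, this keeps both rows.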
You can also try:
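import pandas as pd
col = 'Sum of Activity Time (Hr)'
rf = pd.DataFrame(condition1.groupby(by=['Well Name', 'Phase'])[col].sum())
i = 0
while i < len(rf.index):
    well, phase = rf.index[i]
    # Smallest sum among the rows still present for this phase
    phase_min = rf.xs(phase, level='Phase')[col].min()
    if rf.loc[(well, phase), col] != phase_min:
        # Not the minimum: drop the row and stay at the same position
        rf = rf.drop(index=[(well, phase)])
    else:
        # Minimum for its phase: keep it and move on
        i += 1
rf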
Output:
Well Name | Phase | Value |
UNIVERSITY 20 PW 2502H | INT | 52.75
TIGER 55-2-12 LOV 8H | MNH | 58.25
UNIVERSITY 20 TG UNIT | SRF | 25.50
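For comparison, a short sketch that should reproduce the output above without a loop, using idxmin per phase. It assumes the grouped sums look like the table in the question; because condition1 is not shown, the frame is built by hand here, and the names wells, phases, labels and result are only illustrative.

```python
import pandas as pd

col = 'Sum of Activity Time (Hr)'
wells = ['TIGER 55-2-12 LOV 8H', 'UNIVERSITY 20 PW 2502H',
         'UNIVERSITY 20 PW UNIT', 'UNIVERSITY 20 TG UNIT']
phases = ['INT', 'MNH', 'SRF']

# Stand-in for the grouped sums, using the values from the question's table.
rf = pd.DataFrame(
    {col: [56.25, 58.25, 34.25, 52.75, 72.50, 28.50,
           64.50, 132.50, 30.00, 57.00, 129.50, 25.50]},
    index=pd.MultiIndex.from_product([wells, phases],
                                     names=['Well Name', 'Phase']))

# For each phase, idxmin gives the (Well Name, Phase) label of the smallest sum.
labels = rf.groupby(level='Phase')[col].idxmin()

# Select those rows; they come back in phase order (INT, MNH, SRF).
result = rf.loc[labels.tolist()]
print(result)
```

Unlike a value-based filter, idxmin returns exactly one row per phase; on a tie it picks the first occurrence.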