Iterating over groups to perform an operation

Date: 2019-03-22 20:35:56

Tags: python-3.x pandas

I have a DataFrame similar to the following:

  WELL RESV TYPE   X1   Y1    X2   Y2   TD2
0   W1    A   OP  100  250   500   -5   495
1   W2    B  INJ  120  255   700   -7   695
2   W3    B  OBS  140  260   900   -9   895
3   W4    B   OP  160  265  1100  -11  1095
4   W5    A  OBS  180  270  1300  -13  1295
5   W6    B  INJ  200  275  1500  -15  1495
6   W7    A  OBS  220  280  1700  -17  1695
7   W8    B  INJ  240  285  1900  -19  1895
8   W9    A   OP  260  290  2100  -21  2095
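For anyone who wants to reproduce the problem, the table above could be rebuilt roughly as follows. This is only a sketch: the values are copied from the table, and the integer dtypes are an assumption, since the question does not show them.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "WELL": ["W1", "W2", "W3", "W4", "W5", "W6", "W7", "W8", "W9"],
    "RESV": ["A", "B", "B", "B", "A", "B", "A", "B", "A"],
    "TYPE": ["OP", "INJ", "OBS", "OP", "OBS", "INJ", "OBS", "INJ", "OP"],
    "X1": [100, 120, 140, 160, 180, 200, 220, 240, 260],
    "Y1": [250, 255, 260, 265, 270, 275, 280, 285, 290],
    "X2": [500, 700, 900, 1100, 1300, 1500, 1700, 1900, 2100],
    "Y2": [-5, -7, -9, -11, -13, -15, -17, -19, -21],
    "TD2": [495, 695, 895, 1095, 1295, 1495, 1695, 1895, 2095],
})
print(df)
```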

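I then split this DataFrame by the unique values of the "TYPE" and "RESV" columns, starting with TYPE == 'OP' and RESV == 'A'. From that subset I rearrange the rows into a particular layout and write the result out with to_csv, as shown below.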

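df = df[(df.TYPE == 'OP') & (df.RESV == 'A')]
df1 = df[['WELL', 'RESV', 'TYPE', 'X1', 'Y1', 'TD2']]
# Relabel the second point (X2, Y2) so it stacks under X1, Y1
df2 = df[['WELL', 'X2', 'Y2']].rename(columns={'X2': 'X1', 'Y2': 'Y1'})
# df2 rows have NaN in TD2, so they sort after the df1 row of the same well and are forward-filled from it
df = pd.concat([df1, df2], sort=True).sort_values(['WELL', 'TD2']).ffill()
df = df.reset_index(drop=True)[['WELL', 'RESV', 'TYPE', 'X1', 'Y1', 'TD2']]
# Write one CSV file per well, named after the well
for i, x in df.groupby('WELL'):
    x.to_csv('{}.csv'.format(i))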

The result looks like this:

  WELL RESV TYPE    X1   Y1     TD2
0   W1    A   OP   100  250   495.0
1   W1    A   OP   500   -5   495.0
2   W9    A   OP   260  290  2095.0
3   W9    A   OP  2100  -21  2095.0

Rather than running this code several times, changing TYPE and RESV to a different unique value each time in

df = df[(df.TYPE == 'OP') & (df.RESV == 'A')]

what I really want is to do a groupby(), i.e.

df_gb = df.groupby(['TYPE','RESV'])

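and then loop over each group, performing the same operation as above.

How can I combine groupby with the following operations so that every group is processed in a single pass?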

df1 = df[['WELL', 'RESV', 'TYPE', 'X1', 'Y1', 'TD2']]
# Relabel the second point (X2, Y2) so it stacks under X1, Y1
df2 = df[['WELL', 'X2', 'Y2']].rename(columns={'X2': 'X1', 'Y2': 'Y1'})
df = pd.concat([df1, df2], sort=True).sort_values(['WELL', 'TD2']).ffill()
df = df.reset_index(drop=True)[['WELL', 'RESV', 'TYPE', 'X1', 'Y1', 'TD2']]
# Write one CSV file per well, named after the well
for i, x in df.groupby('WELL'):
    x.to_csv('{}.csv'.format(i))

2 Answers:

Answer 0 (score: 1)

Try this:

for name_grp, df_grp in df.groupby(["TYPE", "RESV"]):
    df1 = df_grp[['WELL', 'RESV', 'TYPE', 'X1', 'Y1', 'TD2']]
    # rename() returns a new frame, so the group itself is not modified
    df2 = df_grp[['WELL', 'X2', 'Y2']].rename(columns={'X2': 'X1', 'Y2': 'Y1'})
    df3 = pd.concat([df1, df2], sort=True).sort_values(['WELL', 'TD2']).ffill()
    df3 = df3.reset_index(drop=True)[['WELL', 'RESV', 'TYPE', 'X1', 'Y1', 'TD2']]
    # One CSV file per well, named after the well (e.g. W1.csv)
    for i, x in df3.groupby('WELL'):
        x.to_csv(str(i) + '.csv')
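To make the loop easier to inspect, the same idea could be sketched with the reshaping pulled into a function and the per-group results collected in a dict rather than written to disk. The helper name `stack_points` and the `results` dict are hypothetical, and the data is a four-row subset of the question's table. Unlike the code above, this variant carries RESV, TYPE and TD2 in both halves instead of forward-filling them, so TD2 stays an integer.

```python
import pandas as pd

# Four rows taken from the question's table, enough to form three groups.
df = pd.DataFrame({
    "WELL": ["W1", "W2", "W4", "W9"],
    "RESV": ["A", "B", "B", "A"],
    "TYPE": ["OP", "INJ", "OP", "OP"],
    "X1": [100, 120, 160, 260],
    "Y1": [250, 255, 265, 290],
    "X2": [500, 700, 1100, 2100],
    "Y2": [-5, -7, -11, -21],
    "TD2": [495, 695, 1095, 2095],
})


def stack_points(group):
    """Hypothetical helper: stack each well's (X2, Y2) point under its (X1, Y1) point."""
    keep = ["WELL", "RESV", "TYPE", "TD2"]
    first = group[keep + ["X1", "Y1"]]
    second = group[keep + ["X2", "Y2"]].rename(columns={"X2": "X1", "Y2": "Y1"})
    stacked = pd.concat([first, second])
    # mergesort is stable, so the (X1, Y1) row stays ahead of the (X2, Y2) row.
    stacked = stacked.sort_values("WELL", kind="mergesort").reset_index(drop=True)
    return stacked[["WELL", "RESV", "TYPE", "X1", "Y1", "TD2"]]


# Iterating a GroupBy yields (key, sub-frame) pairs; with two grouping
# columns the key is a (TYPE, RESV) tuple.
results = {}
for (well_type, resv), group in df.groupby(["TYPE", "RESV"]):
    results[(well_type, resv)] = stack_points(group)

print(results[("OP", "A")])
```

Each frame in `results` can then be split by WELL and written out with to_csv, exactly as in the inner loop of the answer.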

Answer 1 (score: 0)

After renaming some columns, you can use pd.concat() together with apply():

def reformat(x):
    # Stack the (X2, Y2) point under the (X1, Y1) point of every well in the group
    first = x[['WELL', 'X1', 'Y1', 'TD2']]
    second = x[['WELL', 'X2', 'Y2', 'TD2']].rename(columns={'X2': 'X1', 'Y2': 'Y1'})
    return pd.concat([first, second], axis=0).sort_values('WELL')

out = df.groupby(['TYPE','RESV']).apply(reformat).reset_index().drop('level_2', axis=1)

This yields:

   TYPE RESV WELL    X1   Y1   TD2
0   INJ    B   W2   120  255   695
1   INJ    B   W2   700   -7   695
2   INJ    B   W6   200  275  1495
3   INJ    B   W6  1500  -15  1495
4   INJ    B   W8   240  285  1895
5   INJ    B   W8  1900  -19  1895
6   OBS    A   W5   180  270  1295
7   OBS    A   W5  1300  -13  1295
8   OBS    A   W7   220  280  1695
9   OBS    A   W7  1700  -17  1695
10  OBS    B   W3   140  260   895
11  OBS    B   W3   900   -9   895
12   OP    A   W1   100  250   495
13   OP    A   W1   500   -5   495
14   OP    A   W9   260  290  2095
15   OP    A   W9  2100  -21  2095
16   OP    B   W4   160  265  1095
17   OP    B   W4  1100  -11  1095
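Because the reshaping works row by row, a combined table like the one above can probably also be built without groupby at all. The following is a sketch of that alternative, not the answerer's method: it stacks the two point sets for the whole frame and sorts once. The helper column POINT is a hypothetical name, added only to keep the first point ahead of the second within each well.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "WELL": ["W1", "W2", "W3", "W4", "W5", "W6", "W7", "W8", "W9"],
    "RESV": ["A", "B", "B", "B", "A", "B", "A", "B", "A"],
    "TYPE": ["OP", "INJ", "OBS", "OP", "OBS", "INJ", "OBS", "INJ", "OP"],
    "X1": [100, 120, 140, 160, 180, 200, 220, 240, 260],
    "Y1": [250, 255, 260, 265, 270, 275, 280, 285, 290],
    "X2": [500, 700, 900, 1100, 1300, 1500, 1700, 1900, 2100],
    "Y2": [-5, -7, -9, -11, -13, -15, -17, -19, -21],
    "TD2": [495, 695, 895, 1095, 1295, 1495, 1695, 1895, 2095],
})

keys = ["TYPE", "RESV", "WELL"]

# POINT (hypothetical helper column) records which point a row came from.
first = df[keys + ["X1", "Y1", "TD2"]].assign(POINT=1)
second = (
    df[keys + ["X2", "Y2", "TD2"]]
    .rename(columns={"X2": "X1", "Y2": "Y1"})
    .assign(POINT=2)
)

out = (
    pd.concat([first, second])
    .sort_values(keys + ["POINT"])  # group order first, then point order
    .drop(columns="POINT")
    .reset_index(drop=True)
)
print(out)
```

If separate files are still wanted, `out.groupby('WELL')` can be iterated afterwards to write one CSV per well.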