I have a DataFrame like the following:
  WELL RESV TYPE   X1   Y1    X2   Y2   TD2
0   W1    A   OP  100  250   500   -5   495
1   W2    B  INJ  120  255   700   -7   695
2   W3    B  OBS  140  260   900   -9   895
3   W4    B   OP  160  265  1100  -11  1095
4   W5    A  OBS  180  270  1300  -13  1295
5   W6    B  INJ  200  275  1500  -15  1495
6   W7    A  OBS  220  280  1700  -17  1695
7   W8    B  INJ  240  285  1900  -19  1895
8   W9    A   OP  260  290  2100  -21  2095
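For anyone who wants to reproduce this, the table above can be built roughly as follows. This is only a sketch: the values are copied by hand from the table, and the integer dtypes are an assumption based on how the numbers print.

```python
import pandas as pd

# Values copied from the table above; the frame is named df as in the rest of the post.
df = pd.DataFrame({
    "WELL": ["W1", "W2", "W3", "W4", "W5", "W6", "W7", "W8", "W9"],
    "RESV": ["A", "B", "B", "B", "A", "B", "A", "B", "A"],
    "TYPE": ["OP", "INJ", "OBS", "OP", "OBS", "INJ", "OBS", "INJ", "OP"],
    "X1": [100, 120, 140, 160, 180, 200, 220, 240, 260],
    "Y1": [250, 255, 260, 265, 270, 275, 280, 285, 290],
    "X2": [500, 700, 900, 1100, 1300, 1500, 1700, 1900, 2100],
    "Y2": [-5, -7, -9, -11, -13, -15, -17, -19, -21],
    "TD2": [495, 695, 895, 1095, 1295, 1495, 1695, 1895, 2095],
})
```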
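I then split this DataFrame by the unique values of the 'TYPE' and 'RESV' columns. First I start with TYPE == 'OP' and RESV == 'A'. With that sub-DataFrame, I rearrange the rows into a particular layout and write the result out with to_csv, as shown below.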
df= df[(df.TYPE == 'OP') & (df.RESV == 'A')]
df1 = df[['WELL', 'RESV', 'TYPE', 'X1', 'Y1', 'TD2']]
df2 = df[['WELL', 'X2', 'Y2']]
df2.columns = ['WELL', 'X1', 'Y1']
df = pd.concat([df1, df2], sort=True).sort_values(['WELL', 'TD2']).fillna(method='ffill').reset_index(drop = True)[['WELL', 'RESV', 'TYPE', 'X1', 'Y1', 'TD2']]
for i, x in df.groupby('WELL'):
    x.to_csv('{}.csv'.format(i))
The result looks like this:
  WELL RESV TYPE    X1   Y1     TD2
0   W1    A   OP   100  250   495.0
1   W1    A   OP   500   -5   495.0
2   W9    A   OP   260  290  2095.0
3   W9    A   OP  2100  -21  2095.0
I would rather not run this code many times, changing TYPE and RESV to different unique values each time in this line:
df= df[(df.TYPE == 'OP') & (df.RESV == 'A')]
What I really want is to do a groupby(), i.e.:
df_gb = df.groupby(['TYPE','RESV'])
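Then I would loop over each group and apply the same steps as above.
How can I combine the following operations with groupby so that every group is processed in a single pass?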
df1 = df[['WELL', 'RESV', 'TYPE', 'X1', 'Y1', 'TD2']]
df2 = df[['WELL', 'X2', 'Y2']]
df2.columns = ['WELL', 'X1', 'Y1']
df = pd.concat([df1, df2], sort=True).sort_values(['WELL', 'TD2']).fillna(method='ffill').reset_index(drop = True)[['WELL', 'RESV', 'TYPE', 'X1', 'Y1', 'TD2']]
for i, x in df.groupby('WELL'):
    x.to_csv('{}.csv'.format(i))
Answer 0 (score: 1)
Try this:
for name_grp, df_grp in df.groupby(["TYPE", "RESV"]):
    # Start points, together with the per-well attributes
    df1 = df_grp[['WELL', 'RESV', 'TYPE', 'X1', 'Y1', 'TD2']]
    # End points, renamed so they stack under X1/Y1
    df2 = df_grp[['WELL', 'X2', 'Y2']]
    df2.columns = ['WELL', 'X1', 'Y1']
    df3 = pd.concat([df1, df2], sort=True).sort_values(['WELL', 'TD2']).ffill()
    df3 = df3.reset_index(drop=True)[['WELL', 'RESV', 'TYPE', 'X1', 'Y1', 'TD2']]
    # One CSV per well within the current group
    for i, x in df3.groupby('WELL'):
        x.to_csv(str(i) + '.csv')
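The same loop can also be wrapped in a function so it is easier to reuse and test. The version below is only a sketch: the name export_wells, the out_dir parameter, and the TYPE_RESV_WELL.csv file naming are assumptions added for illustration and are not part of the question. It relies on sort_values placing NaN last, so each well's start row precedes the end row that ffill() then fills.

```python
from pathlib import Path

import pandas as pd


def export_wells(df, out_dir="."):
    """Write one CSV per WELL, handling each (TYPE, RESV) group in turn.

    Hypothetical helper: the file naming scheme is an illustrative choice.
    Returns the list of paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cols = ["WELL", "RESV", "TYPE", "X1", "Y1", "TD2"]
    written = []
    for (type_, resv), grp in df.groupby(["TYPE", "RESV"]):
        start = grp[cols]
        end = grp[["WELL", "X2", "Y2"]].rename(columns={"X2": "X1", "Y2": "Y1"})
        # End rows have NaN in RESV/TYPE/TD2; NaN sorts last within each WELL,
        # so ffill() copies those values down from the matching start row.
        stacked = pd.concat([start, end], sort=True).sort_values(["WELL", "TD2"])
        stacked = stacked.ffill().reset_index(drop=True)[cols]
        for well, rows in stacked.groupby("WELL"):
            path = out_dir / f"{type_}_{resv}_{well}.csv"
            rows.to_csv(path, index=False)
            written.append(path)
    return written
```

Calling export_wells(df, "out") would then write the files into an out directory instead of the current working directory.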
Answer 1 (score: 0)
After renaming some columns, you can use pd.concat() and apply():
def reformat(x):
    # Rename the end-point columns so they stack under the start-point columns
    start = x[['WELL', 'X1', 'Y1', 'TD2']]
    end = x[['WELL', 'X2', 'Y2', 'TD2']].rename(columns={'X2': 'X1', 'Y2': 'Y1'})
    return pd.concat([start, end], axis=0).sort_values('WELL')
out = df.groupby(['TYPE', 'RESV']).apply(reformat).reset_index().drop('level_2', axis=1)
This yields:
   TYPE RESV WELL    X1   Y1   TD2
 0  INJ    B   W2   120  255   695
 1  INJ    B   W2   700   -7   695
 2  INJ    B   W6   200  275  1495
 3  INJ    B   W6  1500  -15  1495
 4  INJ    B   W8   240  285  1895
 5  INJ    B   W8  1900  -19  1895
 6  OBS    A   W5   180  270  1295
 7  OBS    A   W5  1300  -13  1295
 8  OBS    A   W7   220  280  1695
 9  OBS    A   W7  1700  -17  1695
10  OBS    B   W3   140  260   895
11  OBS    B   W3   900   -9   895
12   OP    A   W1   100  250   495
13   OP    A   W1   500   -5   495
14   OP    A   W9   260  290  2095
15   OP    A   W9  2100  -21  2095
16   OP    B   W4   160  265  1095
17   OP    B   W4  1100  -11  1095
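Since the reshaping itself does not depend on which group a row belongs to, a similar table can probably be produced without apply() at all. The following is a sketch of that alternative, not the method from the answer above: stack_endpoints and the temporary POINT column are hypothetical names, with POINT used only to keep each well's start row ahead of its end row.

```python
import pandas as pd


def stack_endpoints(df):
    """Return two rows per well (start point, then end point),
    ordered by TYPE, RESV and WELL. Hypothetical helper for illustration."""
    keys = ["TYPE", "RESV", "WELL"]
    # POINT marks the row order inside a well: 0 = start, 1 = end.
    start = df[keys + ["X1", "Y1", "TD2"]].assign(POINT=0)
    end = (
        df[keys + ["X2", "Y2", "TD2"]]
        .rename(columns={"X2": "X1", "Y2": "Y1"})
        .assign(POINT=1)
    )
    stacked = pd.concat([start, end], ignore_index=True)
    stacked = stacked.sort_values(keys + ["POINT"]).reset_index(drop=True)
    return stacked.drop(columns="POINT")
```

The result can still be split afterwards with stack_endpoints(df).groupby(['TYPE', 'RESV']) if each group needs to be handled or written separately.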