I am new to Python and pandas. I have a sample dataset like this:
PRODUCT REGION COUNTRY MEASURE Month_ID QTY
P1 West UK M1 Mon_1 200
P1 West UK M2 Mon_1 150
P1 East JAPAN M1 Mon_1 100
P1 East JAPAN M2 Mon_1 100
P1 West UK M1 Mon_2 300
P1 West UK M2 Mon_2 450
P1 East JAPAN M1 Mon_2 500
P1 East JAPAN M2 Mon_2 600
I want the following output:
PRODUCT REGION COUNTRY MEASURE Month_ID QTY
P1 West UK M1 Mon_1 200
P1 West UK M2 Mon_1 150
P1 West UK NEW_M Mon_1 350
P1 East JAPAN M1 Mon_1 100
P1 East JAPAN M2 Mon_1 100
P1 East JAPAN NEW_M Mon_1 200
P1 West UK M1 Mon_2 300
P1 West UK M2 Mon_2 450
P1 West UK NEW_M Mon_2 750
P1 East JAPAN M1 Mon_2 500
P1 East JAPAN M2 Mon_2 600
P1 East JAPAN NEW_M Mon_2 1100
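I want to group by the columns (PRODUCT, REGION, COUNTRY, Month_ID) and compute SUM(QTY).
After each group, a new row should be added in which the MEASURE column is NEW_M and QTY holds the group total.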
Answer 0 (score: 3)
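You can build a new DataFrame of group totals by aggregating with sum. To get the ordering right, use DataFrame.set_index to give each total row the index label of the last row of its group (which duplicates an existing label), so that after concat a stable DataFrame.sort_index places each new row directly after its group. Note that df.duplicated(cols) flags every row after the first in a group, so this relies on each group having exactly two rows, as in the sample data: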
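import pandas as pd

cols = ['PRODUCT', 'REGION', 'COUNTRY', 'Month_ID']
# index labels of the second (last) row of each two-row group
idx = df.index[df.duplicated(cols)]
# one total row per group, labelled with that group's last index
df1 = (df.groupby(cols, as_index=False, sort=False)['QTY']
         .sum()
         .assign(MEASURE='NEW_M')
         .set_index(idx))
# a stable sort keeps each original row ahead of the total sharing its label
df = pd.concat([df, df1], sort=False).sort_index(kind='mergesort').reset_index(drop=True)
print(df)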
PRODUCT REGION COUNTRY MEASURE Month_ID QTY
0 P1 West UK M1 Mon_1 200
1 P1 West UK M2 Mon_1 150
2 P1 West UK NEW_M Mon_1 350
3 P1 East JAPAN M1 Mon_1 100
4 P1 East JAPAN M2 Mon_1 100
5 P1 East JAPAN NEW_M Mon_1 200
6 P1 West UK M1 Mon_2 300
7 P1 West UK M2 Mon_2 450
8 P1 West UK NEW_M Mon_2 750
9 P1 East JAPAN M1 Mon_2 500
10 P1 East JAPAN M2 Mon_2 600
11 P1 East JAPAN NEW_M Mon_2 1100
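If groups can contain more than two rows, the same idea can be sketched without duplicated. This is only a minimal, self-contained variant under stated assumptions, not the answer's exact method: the helper column `_pos` and the names `base`, `totals` and `out` are hypothetical, and the named aggregation syntax assumes pandas 0.25 or later. It records each row's position, takes the largest position in every group as the label for that group's total row, and then reuses the concat plus stable sort step.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'PRODUCT': ['P1'] * 8,
    'REGION': ['West', 'West', 'East', 'East'] * 2,
    'COUNTRY': ['UK', 'UK', 'JAPAN', 'JAPAN'] * 2,
    'MEASURE': ['M1', 'M2'] * 4,
    'Month_ID': ['Mon_1'] * 4 + ['Mon_2'] * 4,
    'QTY': [200, 150, 100, 100, 300, 450, 500, 600],
})

cols = ['PRODUCT', 'REGION', 'COUNTRY', 'Month_ID']

# Work on a positional 0..n-1 index so positions and labels coincide
base = df.reset_index(drop=True)

# `_pos` is a hypothetical helper column holding each row's position;
# its maximum per group is the position of the group's last row
totals = (base.assign(_pos=range(len(base)))
              .groupby(cols, as_index=False, sort=False)
              .agg(QTY=('QTY', 'sum'), _pos=('_pos', 'max'))
              .assign(MEASURE='NEW_M')
              .set_index('_pos'))

# Stable mergesort keeps each original row before the total with the same label
out = (pd.concat([base, totals], sort=False)
         .sort_index(kind='mergesort')
         .reset_index(drop=True))
print(out)
```

On the sample data this should reproduce the table above. Each total row lands after the last row of its group wherever that row sits, so the sketch reads most naturally when the rows of a group are contiguous.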
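EDIT: A small trick for subtraction: multiply the QTY values of the rows where MEASURE is M2 by -1, so that the aggregated sum gives the difference M1 - M2: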
# if only the `M1` and `M2` rows are needed
df = df[df['MEASURE'].isin(['M1', 'M2'])]
cols = ['PRODUCT', 'REGION', 'COUNTRY', 'Month_ID']
idx = df.index[df.duplicated(cols)]
df1 = (df.assign(QTY=df['QTY'].mask(df['MEASURE'].eq('M2'), df['QTY'] * -1))
         .groupby(cols, as_index=False, sort=False)['QTY']
         .sum()
         .assign(MEASURE='NEW_M')
         .set_index(idx)
      )
df2 = pd.concat([df, df1], sort=False).sort_index(kind='mergesort').reset_index(drop=True)
print(df2)
PRODUCT REGION COUNTRY MEASURE Month_ID QTY
0 P1 West UK M1 Mon_1 200
1 P1 West UK M2 Mon_1 150
2 P1 West UK NEW_M Mon_1 50
3 P1 East JAPAN M1 Mon_1 100
4 P1 East JAPAN M2 Mon_1 100
5 P1 East JAPAN NEW_M Mon_1 0
6 P1 West UK M1 Mon_2 300
7 P1 West UK M2 Mon_2 450
8 P1 West UK NEW_M Mon_2 -150
9 P1 East JAPAN M1 Mon_2 500
10 P1 East JAPAN M2 Mon_2 600
11 P1 East JAPAN NEW_M Mon_2 -100
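The sign flip can also be written with Series.map instead of mask, which may read more clearly when more than two measures carry different signs. The following is a hedged, self-contained sketch of only the difference step, not the answer's code: the names `sign` and `diff` are illustrative, and it assumes every MEASURE value is either M1 or M2.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'PRODUCT': ['P1'] * 8,
    'REGION': ['West', 'West', 'East', 'East'] * 2,
    'COUNTRY': ['UK', 'UK', 'JAPAN', 'JAPAN'] * 2,
    'MEASURE': ['M1', 'M2'] * 4,
    'Month_ID': ['Mon_1'] * 4 + ['Mon_2'] * 4,
    'QTY': [200, 150, 100, 100, 300, 450, 500, 600],
})

cols = ['PRODUCT', 'REGION', 'COUNTRY', 'Month_ID']

# +1 for M1 and -1 for M2, so the group sum becomes M1 - M2
sign = df['MEASURE'].map({'M1': 1, 'M2': -1})

diff = (df.assign(QTY=df['QTY'] * sign)
          .groupby(cols, as_index=False, sort=False)['QTY']
          .sum()
          .assign(MEASURE='NEW_M'))
print(diff)
```

The resulting `diff` frame holds one NEW_M row per group and can be labelled and concatenated with the original rows in the same way as `df1` above.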