我有两个熊猫数据框
import pandas as pd
import numpy as np
import datetime
data = {'group' :["A","A","B","B"],
'val': ["AA","AB","B1","B2"],
'cal1' :[4,5,7,6],
'cal2' :[10,100,100,10]
}
df1 = pd.DataFrame(data)
df1
group val cal1 cal2
0 A AA 4 10
1 A AB 5 100
2 B B1 7 100
3 B B2 6 10
data = {'group' :["A","A","A","B","B","B","B", "B", "B", "B"],
'flag' : [1,0,0,1,0,0,0, 1, 0, 0],
'var1': [1,2,3,7,8,9,10, 15, 20, 30]
}
# Create DataFrame
df2 = pd.DataFrame(data)
df2
group flag var1
0 A 1 1
1 A 0 2
2 A 0 3
3 B 1 7
4 B 0 8
5 B 0 9
6 B 0 10
7 B 1 15
8 B 0 20
9 B 0 30
Step 1: CReate columns in df2(with suffix "_new") based on unique "val" in df1 like below:
unique_val = df1['val'].unique().tolist()
new_cols = [t + '_new' for t in unique_val]
for i in new_cols:
df2[i] = 0
df2
group flag var1 AA_new AB_new B1_new B2_new
0 A 1 1 0 0 0 0
1 A 0 2 0 0 0 0
2 A 0 3 0 0 0 0
3 B 1 7 0 0 0 0
4 B 0 8 0 0 0 0
5 B 0 9 0 0 0 0
6 B 0 10 0 0 0 0
7 B 1 15 0 0 0 0
8 B 0 20 0 0 0 0
9 B 0 30 0 0 0 0
第2步:对于标志= 1的行,AA_new将计算为var1(来自df2)*来自df1组“ A”的val1的“ cal1”的值和val“ AA” *来自df1的cal2组“ A”和值“ AA”,类似地将AB_new计算为var1(来自df2)*组“ A”的df1中的“ cal1”值和val“ AB” *值“组”的df1中的“ cal2”值A”和值“ AB”
我的预期输出应如下所示:
group flag var1 AA_new AB_new B1_new B2_new
0 A 1 1 40.0 500.0 0.0 0.0
1 A 0 2 0.0 0.0 0.0 0.0
2 A 0 3 0.0 0.0 0.0 0.0
3 B 1 7 0.0 0.0 4900.0 420.0
4 B 0 8 0.0 0.0 0.0 0.0
5 B 0 9 0.0 0.0 0.0 0.0
6 B 0 10 0.0 0.0 0.0 0.0
7 B 1 15 0.0 0.0 10500.0 900.0
8 B 0 20 0.0 0.0 0.0 0.0
9 B 0 30 0.0 0.0 0.0 0.0
以下基于其他堆栈流问题的解决方案部分起作用:
df2.assign(**df1.assign(mul_cal = df1['cal1'].mul(df1['cal2']))
.pivot_table(columns='val',
values='mul_cal',
index = ['group', df2.index])
.add_suffix('_new')
.groupby(level=0)
.apply(lambda x: x.bfill().ffill())
.reset_index(level='group',drop='group')
.fillna(0)
.mul(df2['var1'], axis=0)
.where(df2['flag'].eq(1), 0)
)
答案 0 :(得分:1)
灵活列
如果您希望当我们在df1中再添加几行时这样做,您可以执行此操作。
combinations = df1.groupby(['group','val'])['cal3'].sum().reset_index()
for index_, row_ in combinations.iterrows():
for index, row in df2.iterrows():
if row['flag'] == 1:
if row['group'] == row_['group']:
df2.loc[index, row_['val'] + '_new'] = row['var1'] * df1[(df1['group'] == row_['group']) & (df1['val'] == row_['val'])]['cal3'].values[0]
硬编码
您可以使用迭代来对数据框进行设置并在每次迭代中更改其特定的列,您可以执行类似的操作(但您需要先在df1
中添加新列)。
df1['cal3'] = df1['cal1'] * df1['cal2']
for index, row in df2.iterrows():
if row['flag'] == 1:
if row['group'] == 'A':
df2.loc[index, 'AA_new'] = row['var1'] * df1[(df1['group'] == 'A') & (df1['val'] == 'AA')]['cal3'].values[0]
df2.loc[index, 'AB_new'] = row['var1'] * df1[(df1['group'] == 'A') & (df1['val'] == 'AB')]['cal3'].values[0]
elif row['group'] == 'B':
df2.loc[index, 'B1_new'] = row['var1'] * df1[(df1['group'] == 'B') & (df1['val'] == 'B1')]['cal3'].values[0]
df2.loc[index, 'B2_new'] = row['var1'] * df1[(df1['group'] == 'B') & (df1['val'] == 'B2')]['cal3'].values[0]
这是我得到的结果。