Mapping column values between two pandas DataFrames

Posted: 2020-03-21 03:39:05

Tags: python-3.x pandas

I have two DataFrames, shown below:

df1 

    group   flag    var1    AA_new  AB_new  B1_new  B2_new
0   A       1       1       0       0        0       0
1   A       0       2       0       0        0       0
2   A       0       3       0       0        0       0
3   B       1       7       0       0        0       0
4   B       0       8       0       0        0       0
5   B       0       9       0       0        0       0
6   B       0       10      0       0        0       0
7   B       1       15      0       0        0       0
8   B       0       20      0       0        0       0
9   B       0       30      0       0        0       0

df2

val group   AA_new  AB_new  B1_new  B2_new
0     A     40      500     0        0
2     B     0       0       700      60
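To make the example reproducible, the two frames can be built like this. The values are copied from the tables above. Reading the leading "val" of df2 as an ordinary column (not the index) is an assumption based on its header row.

```python
import pandas as pd

# df1: the four *_new columns all start at 0 (scalars are broadcast to every row).
df1 = pd.DataFrame({
    'group': ['A'] * 3 + ['B'] * 7,
    'flag': [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    'var1': [1, 2, 3, 7, 8, 9, 10, 15, 20, 30],
    'AA_new': 0, 'AB_new': 0, 'B1_new': 0, 'B2_new': 0,
})

# df2: one row of replacement values per group; 'val' is treated as a normal column (assumption).
df2 = pd.DataFrame({
    'val': [0, 2],
    'group': ['A', 'B'],
    'AA_new': [40, 0], 'AB_new': [500, 0],
    'B1_new': [0, 700], 'B2_new': [0, 60],
})

print(df1)
print(df2)
```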

I want to map the values from df2 into df1 by the "group" column, but only on the rows of df1 where "flag" is 1.

The final DataFrame I expect:

    group   flag    var1    AA_new  AB_new  B1_new  B2_new
0   A       1       1       40      500      0       0
1   A       0       2       0       0        0       0
2   A       0       3       0       0        0       0
3   B       1       7       0       0        700     60
4   B       0       8       0       0        0       0
5   B       0       9       0       0        0       0
6   B       0       10      0       0        0       0
7   B       1       15      0       0        700     60
8   B       0       20      0       0        0       0
9   B       0       30      0       0        0       0

3 answers:

Answer 0 (score: 0)

Here is one way using merge and concat:

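import pandas as pd

c = df1['flag'].astype(bool)  # condition: rows where flag is 1
m = df1.reset_index()         # keep the original index as a column so it can be restored later
# merge the flagged rows with df2 (df2's *_new values win via suffixes), keep only df1's columns,
# append the unflagged rows unchanged, then restore the original row order and index
out = (pd.concat((m[c].merge(df2, on='group', suffixes=('_x', ''))[m.columns],
                  m[~c]))
         .set_index('index')
         .sort_index()
         .rename_axis(None))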

print(out)

  group  flag  var1  AA_new  AB_new  B1_new  B2_new
0     A     1     1      40     500       0       0
1     A     0     2       0       0       0       0
2     A     0     3       0       0       0       0
3     B     1     7       0       0     700      60
4     B     0     8       0       0       0       0
5     B     0     9       0       0       0       0
6     B     0    10       0       0       0       0
7     B     1    15       0       0     700      60
8     B     0    20       0       0       0       0
9     B     0    30       0       0       0       0
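The reset_index() step in this answer matters because merge on columns does not keep the left frame's index; the result gets a fresh 0..n-1 index. A small check (the frames left and right are illustrative, not from the question):

```python
import pandas as pd

# Left frame with a non-default index, like the flagged rows m[c] in the answer.
left = pd.DataFrame({'group': ['A', 'B']}, index=[3, 7])
right = pd.DataFrame({'group': ['A', 'B'], 'x': [40, 700]})

merged = left.merge(right, on='group')
print(merged.index.tolist())  # the original labels 3 and 7 are gone

# Carrying the index along as a column and setting it back afterwards preserves it.
kept = left.reset_index().merge(right, on='group').set_index('index')
print(kept.index.tolist())
```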

Answer 1 (score: 0)

Please see my attempt below. The conditions:

import numpy as np

a = (df1['flag'] == 1) & (df1['group'] == 'A')  # flagged rows in group A
b = (df1['flag'] == 1) & (df1['group'] == 'B')  # flagged rows in group B

Apply the conditions with np.where:

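# df2 row 0 holds group A's values and row 1 holds group B's (default 0..n-1 index assumed);
# np.where already returns an array of the right length, so no pd.DataFrame wrapper is needed
df1['AA_new'] = np.where(a, df2.loc[0, 'AA_new'], 0)
df1['AB_new'] = np.where(a, df2.loc[0, 'AB_new'], 0)
df1['B1_new'] = np.where(b, df2.loc[1, 'B1_new'], 0)
df1['B2_new'] = np.where(b, df2.loc[1, 'B2_new'], 0)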
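The hard-coded row numbers and column names above only work for this exact df2. A sketch that generalizes the same np.where idea by looking up each row's group in df2 with Series.map (this is not the answerer's code; the names new_cols, lookup and is_flagged are illustrative, and every group in df1 is assumed to have a row in df2):

```python
import numpy as np
import pandas as pd

# Sample data from the question (df2's 'val' column omitted; it is not needed here).
df1 = pd.DataFrame({
    'group': list('AAABBBBBBB'),
    'flag': [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    'var1': [1, 2, 3, 7, 8, 9, 10, 15, 20, 30],
    'AA_new': 0, 'AB_new': 0, 'B1_new': 0, 'B2_new': 0,
})
df2 = pd.DataFrame({
    'group': ['A', 'B'],
    'AA_new': [40, 0], 'AB_new': [500, 0],
    'B1_new': [0, 700], 'B2_new': [0, 60],
})

new_cols = ['AA_new', 'AB_new', 'B1_new', 'B2_new']
lookup = df2.set_index('group')      # one row per group, addressed by label instead of position
is_flagged = df1['flag'].eq(1)

for c in new_cols:
    # map each row's group label to that group's value in column c of df2;
    # flagged rows take the mapped value, all other rows keep their current value
    df1[c] = np.where(is_flagged, df1['group'].map(lookup[c]), df1[c])

print(df1)
```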


Answer 2 (score: 0)

Another solution, using fillna:

import numpy as np
import pandas as pd

# the columns whose names end with 'new'
col = df1.columns[df1.columns.str.endswith('new')]

# blank out those columns (set them to NaN) on the rows where flag is 1;
# converting to float first lets the integer columns hold NaN
df1[col] = df1[col].astype(float)
df1.loc[df1.flag.eq(1), col] = np.nan

# index both frames by 'group' so fillna aligns df2's values with df1's rows by group,
# then fill the NaNs in df1 with the matching values from df2
out = df1.set_index('group').fillna(df2.set_index('group'))
print(out)

        flag    var1    AA_new  AB_new  B1_new  B2_new
group                       
A        1       1      40.0    500.0    0.0    0.0
A        0       2       0.0    0.0      0.0    0.0
A        0       3       0.0    0.0      0.0    0.0
B        1       7       0.0    0.0      700.0  60.0
B        0       8       0.0    0.0      0.0    0.0
B        0       9       0.0    0.0      0.0    0.0
B        0       10      0.0    0.0      0.0    0.0
B        1       15      0.0    0.0      700.0  60.0
B        0       20      0.0    0.0      0.0    0.0
B        0       30      0.0    0.0      0.0    0.0
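fillna leaves the *_new columns as floats (note the 40.0 above) and moves "group" into the index. If the original layout matters, a sketch that also restores both, assuming every flagged group has a row in df2 (otherwise NaNs would remain and the integer cast would fail):

```python
import numpy as np
import pandas as pd

# Sample data from the question (df2's 'val' column omitted).
df1 = pd.DataFrame({
    'group': list('AAABBBBBBB'),
    'flag': [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    'var1': [1, 2, 3, 7, 8, 9, 10, 15, 20, 30],
    'AA_new': 0, 'AB_new': 0, 'B1_new': 0, 'B2_new': 0,
})
df2 = pd.DataFrame({
    'group': ['A', 'B'],
    'AA_new': [40, 0], 'AB_new': [500, 0],
    'B1_new': [0, 700], 'B2_new': [0, 60],
})

col = df1.columns[df1.columns.str.endswith('new')]

# Convert to float first so NaN can be stored, then blank out the flagged rows.
df1[col] = df1[col].astype(float)
df1.loc[df1['flag'].eq(1), col] = np.nan

out = (df1.set_index('group')
          .fillna(df2.set_index('group'))        # fill NaNs from df2's row for the same group
          .reset_index()                         # move 'group' back to a regular column
          .astype({c: 'int64' for c in col}))    # undo the float upcast
print(out)
```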