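Create multiple new columns based on multiple conditions in pandas

Time: 2019-12-23 05:58:05

Tags: python-3.x pandas dataframe

I am trying to derive new columns a and b from the following DataFrame: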

      a_x  b_x    a_y  b_y
0   13.67  0.0  13.67  0.0
1   13.42  0.0  13.42  0.0
2   13.52  1.0  13.17  1.0
3   13.61  1.0  13.11  1.0
4   12.68  1.0  13.06  1.0
5   12.70  1.0  12.93  1.0
6   13.60  1.0    NaN  NaN
7   12.89  1.0    NaN  NaN
8   11.68  1.0    NaN  NaN
9     NaN  NaN   8.87  0.0
10    NaN  NaN   8.77  0.0
11    NaN  NaN   7.97  0.0
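
For reproducibility, the frame above can be rebuilt roughly as follows. This is a minimal sketch: the values are copied from the table, and the variable name `df` is an assumption chosen to match the answer below.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above; np.nan marks the missing cells.
nan = np.nan
df = pd.DataFrame({
    "a_x": [13.67, 13.42, 13.52, 13.61, 12.68, 12.70, 13.60, 12.89, 11.68, nan, nan, nan],
    "b_x": [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, nan, nan, nan],
    "a_y": [13.67, 13.42, 13.17, 13.11, 13.06, 12.93, nan, nan, nan, 8.87, 8.77, 7.97],
    "b_y": [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, nan, nan, nan, 0.0, 0.0, 0.0],
})
print(df)
```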

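If b_x or b_y is 0.0 (when both are present they hold the same value), then a_x and a_y also share the same value, so I take either one of each pair as the new columns a and b. If b_x or b_y is 1.0, a_x and a_y differ, so I take the mean of a_x and a_y as a and either b_x or b_y as b.

If only one of the pairs (a_x, b_x) or (a_y, b_y) is non-null, I use that pair's values as a and b.

My expected result looks like this: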

      a_x  b_x    a_y  b_y       a  b
0   13.67  0.0  13.67  0.0  13.670  0
1   13.42  0.0  13.42  0.0  13.420  0
2   13.52  1.0  13.17  1.0  13.345  1
3   13.61  1.0  13.11  1.0  13.360  1
4   12.68  1.0  13.06  1.0  12.870  1
5   12.70  1.0  12.93  1.0  12.815  1
6   13.60  1.0    NaN  NaN  13.600  1
7   12.89  1.0    NaN  NaN  12.890  1
8   11.68  1.0    NaN  NaN  11.680  1
9     NaN  NaN   8.87  0.0   8.870  0
10    NaN  NaN   8.77  0.0   8.770  0
11    NaN  NaN   7.97  0.0   7.970  0

How can I get the result above? Thanks.

1 answer:

Answer 0: (score: 1)

Use:

import numpy as np
import pandas as pd

# select all b columns and all a columns
b = df.filter(like='b')
a = df.filter(like='a')
# test whether each row contains at least one 0 or one 1 in the b columns
m1 = b.eq(0).any(axis=1)
m2 = b.eq(1).any(axis=1)

# row-wise mean of the a columns (NaN is skipped)
a1 = a.mean(axis=1)
# forward fill missing values along each row and select the last column
b1 = b.ffill(axis=1).iloc[:, -1]
a2 = a.ffill(axis=1).iloc[:, -1]

# new DataFrame built from the 2 conditions
df1 = pd.DataFrame(np.select([m1, m2], [[a2, b1], [a1, b1]]), index=['a','b']).T
# join to the original
df = df.join(df1)
print(df)
      a_x  b_x    a_y  b_y       a    b
0   13.67  0.0  13.67  0.0  13.670  0.0
1   13.42  0.0  13.42  0.0  13.420  0.0
2   13.52  1.0  13.17  1.0  13.345  1.0
3   13.61  1.0  13.11  1.0  13.360  1.0
4   12.68  1.0  13.06  1.0  12.870  1.0
5   12.70  1.0  12.93  1.0  12.815  1.0
6   13.60  1.0    NaN  NaN  13.600  1.0
7   12.89  1.0    NaN  NaN  12.890  1.0
8   11.68  1.0    NaN  NaN  11.680  1.0
9     NaN  NaN   8.87  0.0   8.870  0.0
10    NaN  NaN   8.77  0.0   8.770  0.0
11    NaN  NaN   7.97  0.0   7.970  0.0
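
To see what `np.select` is doing in the snippet above, here is a small sketch on toy arrays. All names and values in it are hypothetical and not taken from the question. It also shows why the answer transposes the result: passing each choice as a pair of arrays yields one output row per array.

```python
import numpy as np

# Hypothetical toy inputs, for illustration only.
cond_zero = np.array([True, False, False])
cond_one = np.array([False, True, False])
first = np.array([10.0, 20.0, 30.0])
second = np.array([1.0, 2.0, 3.0])

# Element i comes from the choice paired with the first condition that is True;
# where no condition holds, the default is used.
picked = np.select([cond_zero, cond_one], [first, second], default=np.nan)

# Each choice may also be a pair of arrays; the 1-D conditions broadcast over
# both rows, so the result has shape (2, 3), one row per output column.
stacked = np.select([cond_zero, cond_one],
                    [[first, second], [second, first]],
                    default=np.nan)
print(picked)
print(stacked.shape)
```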

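But I think the solution can be simplified, because the mean can be used for both conditions (the mean of identical values equals the first value):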

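# select all b columns and all a columns
b = df.filter(like='b')
a = df.filter(like='a')

# the row-wise mean covers both conditions, so no masks are needed
a1 = a.mean(axis=1)
# forward fill along each row and take the last column
b1 = b.ffill(axis=1).iloc[:, -1]

df['a'] = a1
df['b'] = b1
print(df)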
      a_x  b_x    a_y  b_y       a    b
0   13.67  0.0  13.67  0.0  13.670  0.0
1   13.42  0.0  13.42  0.0  13.420  0.0
2   13.52  1.0  13.17  1.0  13.345  1.0
3   13.61  1.0  13.11  1.0  13.360  1.0
4   12.68  1.0  13.06  1.0  12.870  1.0
5   12.70  1.0  12.93  1.0  12.815  1.0
6   13.60  1.0    NaN  NaN  13.600  1.0
7   12.89  1.0    NaN  NaN  12.890  1.0
8   11.68  1.0    NaN  NaN  11.680  1.0
9     NaN  NaN   8.87  0.0   8.870  0.0
10    NaN  NaN   8.77  0.0   8.770  0.0
11    NaN  NaN   7.97  0.0   7.970  0.0
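
The expected output in the question shows b as integers, while the snippets above leave it as floats. A possible variation is sketched below on four of the question's rows, one per case. The helper name `combine_ab`, the anchored regexes, and the integer cast are assumptions added here rather than part of the original answer, and the cast only works when every row has at least one non-missing b value.

```python
import numpy as np
import pandas as pd

def combine_ab(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of frame with combined columns a and b appended."""
    out = frame.copy()
    # Anchored regexes match only a_x/a_y and b_x/b_y, so an existing
    # column named "a" or "b" would not be picked up by accident.
    a_cols = frame.filter(regex=r"^a_")
    b_cols = frame.filter(regex=r"^b_")
    # The row-wise mean skips NaN by default, so a lone value is kept as-is.
    out["a"] = a_cols.mean(axis=1)
    # Back fill along each row, take the first column, then cast to int
    # (assumes no row has both b values missing).
    out["b"] = b_cols.bfill(axis=1).iloc[:, 0].astype(int)
    return out

# Four rows from the question: equal pair, differing pair, x only, y only.
sample = pd.DataFrame({
    "a_x": [13.67, 13.52, 13.60, np.nan],
    "b_x": [0.0, 1.0, 1.0, np.nan],
    "a_y": [13.67, 13.17, np.nan, 8.87],
    "b_y": [0.0, 1.0, np.nan, 0.0],
})
result = combine_ab(sample)
print(result)
```

Back filling and taking the first column is equivalent here to the forward fill and last column used above, since both b values agree whenever both are present.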