Apply a function to dataframe columns with multiple conditional statements

Asked: 2017-12-22 11:36:08

Tags: python-3.x function pandas multiple-columns apply

Suppose I have a dataframe like this

import pandas as pd
import numpy as np

#
a = ['a','b']*6
b = ['c','c','d','d']*3
c = np.linspace(1,12,12)
d = np.linspace(2,13,12)
e = np.linspace(3,14,12)
f = np.linspace(4,15,12)

df1 = pd.DataFrame({'A': a, 'B': b, 'C': c, 'D': d, 'E': e, 'F': f})
df2 = df1.drop(columns=['A','B'])

which gives

In [2]: df1
Out[2]:
    A  B     C     D     E     F
0   a  c   1.0   2.0   3.0   4.0
1   b  c   2.0   3.0   4.0   5.0
2   a  d   3.0   4.0   5.0   6.0
3   b  d   4.0   5.0   6.0   7.0
4   a  c   5.0   6.0   7.0   8.0
5   b  c   6.0   7.0   8.0   9.0
6   a  d   7.0   8.0   9.0  10.0
7   b  d   8.0   9.0  10.0  11.0
8   a  c   9.0  10.0  11.0  12.0
9   b  c  10.0  11.0  12.0  13.0
10  a  d  11.0  12.0  13.0  14.0
11  b  d  12.0  13.0  14.0  15.0

What I want to do is apply a function to columns C through E, depending on
  • the values in A and B
  • how the values in C through E compare with D

I got it working with a for loop, but it is too slow (the real dataframe is large).

Here is what I tried in order to speed it up

f1 = lambda x: x - df1['D'] if x > df1['D'] else df1['D'] - x
f2 = lambda x: x + df1['D'] if x > df1['D'] else df1['D'] + x + 10
f3 = lambda x: x - df1['D'] if x > df1['D'] else df1['D'] - x
f4 = lambda x: x + df1['D'] if x > df1['D'] else df1['D'] + x + 5

df1.loc[(df1['A'] == 'a') & (df1['B'] == 'c'), 'C':'E'] = df2.apply(f1)
df1.loc[(df1['A'] == 'a') & (df1['B'] == 'd'), 'C':'E'] = df2.apply(f2)
df1.loc[(df1['A'] == 'b') & (df1['B'] == 'c'), 'C':'E'] = df2.apply(f3)
df1.loc[(df1['A'] == 'b') & (df1['B'] == 'd'), 'C':'E'] = df2.apply(f4) 

With this I get a ValueError ('The truth value of a Series is ambiguous'); the problem is the `if` in the lambda definitions.
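The error arises because Python's `if` needs a single True/False, while `x > df1['D']` produces a whole Series of booleans. A minimal sketch of this, using a small hypothetical pair of Series rather than the data above:

```python
import pandas as pd

# Hypothetical values, chosen only to illustrate the error
s = pd.Series([1.0, 5.0, 3.0])
d = pd.Series([2.0, 2.0, 3.0])

mask = s > d                      # elementwise comparison: a boolean Series
print(mask.tolist())              # [False, True, False]

raised = False
try:
    if mask:                      # `if` wants one bool, not three
        pass
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Elementwise equivalent of `x - d if x > d else d - x`:
# keep s - d where the mask is True, otherwise take d - s
result = (s - d).where(mask, d - s)
print(result.tolist())            # [1.0, 3.0, 0.0]
```

`Series.where` is only one option here; `np.where` or `abs` would express the same choice elementwise.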

Then I tried the following

f1 = lambda x: x - df1['D']
f2 = lambda x: x + df1['D']
f3 = lambda x: x - df1['D']
f4 = lambda x: x + df1['D'] 

np.where(df1.loc[(df1['A'] == 'a') & (df1['B'] == 'c'), 'C':'E'] > df1.loc[(df1['A'] == 'a') & (df1['B'] == 'c'), 'D'], df2.apply(f1), df2.apply(f2))

to somehow pass the `if` to np.where, but I got a ValueError ('operands could not be broadcast together with shapes (3,6) (12,4) (12,4)').
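The broadcast error suggests that the condition was built from only the selected rows while both value arguments still covered all 12 rows. One way around it, sketched here under the assumption that f1 is wanted for the (A == 'a', B == 'c') group, is to evaluate np.where on arrays of one common shape and select the rows afterwards. The names `vals`, `d`, and `f1_all` are illustrative only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['a', 'b'] * 6,
    'B': ['c', 'c', 'd', 'd'] * 3,
    'C': np.linspace(1, 12, 12),
    'D': np.linspace(2, 13, 12),
    'E': np.linspace(3, 14, 12),
})

cols = ['C', 'D', 'E']
vals = df[cols].to_numpy()           # shape (12, 3)
d = df['D'].to_numpy()[:, None]      # shape (12, 1), broadcasts across columns

# Condition and both value arguments all broadcast to (12, 3)
f1_all = np.where(vals > d, vals - d, d - vals)

# Only now restrict to the rows of one (A, B) group
mask = ((df['A'] == 'a') & (df['B'] == 'c')).to_numpy()
df.loc[mask, cols] = f1_all[mask]
print(df.loc[mask, cols])
```

Computing on the full frame does some work that is then thrown away, but it keeps every shape consistent and stays vectorized.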

Any help is much appreciated, since I have run out of ideas!

Thanks

2 answers:

Answer 0 (score: 0)

How about this? It is inspired by np.select (which cannot be used directly here!)

conditions = [
    (df1['A'] == 'a') & (df1['B'] == 'c'), 
    (df1['A'] == 'a') & (df1['B'] == 'd'), 
    (df1['A'] == 'b') & (df1['B'] == 'c'), 
    (df1['A'] == 'b') & (df1['B'] == 'd')
]

choices = [
    np.abs(df2.subtract(df2['D'], axis=0)),
    (df2.add(df2['D'], axis=0) + df2.gt(df2['D'], axis=0) * 10),
    np.abs(df2.subtract(df2['D'], axis=0)),
    (df2.add(df2['D'], axis=0) + df2.gt(df2['D'], axis=0) * 5)
]

new_dfs = []
for i in range(len(conditions)):
    c = choices[i][conditions[i]]
    new_dfs.append(c)

res = pd.concat(new_dfs).sort_index()
print(res)

which gives

       C     D     E     F
0    1.0   0.0   1.0   2.0
1    1.0   0.0   1.0   2.0
2    7.0   8.0  19.0  20.0
3    9.0  10.0  16.0  17.0
4    1.0   0.0   1.0   2.0
5    1.0   0.0   1.0   2.0
6   15.0  16.0  27.0  28.0
7   17.0  18.0  24.0  25.0
8    1.0   0.0   1.0   2.0
9    1.0   0.0   1.0   2.0
10  23.0  24.0  35.0  36.0
11  25.0  26.0  32.0  33.0

If this is not correct, could you show what output dataframe you expect for your example?
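For comparison, if the lambdas in the question are read literally, the extra 10 or 5 belongs to the `else` branch, that is, where the value is less than or equal to D, and only columns C to E are affected. The code above adds the constant where the value is greater than D and also transforms F. A sketch of that literal reading, which does go through np.select once the row masks are expanded to two dimensions. The helper `rows` and the names `vals`, `d`, and `le` are made up for this illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['a', 'b'] * 6,
    'B': ['c', 'c', 'd', 'd'] * 3,
    'C': np.linspace(1, 12, 12),
    'D': np.linspace(2, 13, 12),
    'E': np.linspace(3, 14, 12),
    'F': np.linspace(4, 15, 12),
})

cols = ['C', 'D', 'E']
vals = df[cols].to_numpy()           # original values, shape (12, 3)
d = df['D'].to_numpy()[:, None]      # original D as a column, shape (12, 1)

def rows(a, b):
    """Row mask for one (A, B) group, expanded to the shape of vals."""
    m = ((df['A'] == a) & (df['B'] == b)).to_numpy()
    return np.broadcast_to(m[:, None], vals.shape)

le = vals <= d                       # where the question's `else` branch applies
conditions = [rows('a', 'c'), rows('a', 'd'), rows('b', 'c'), rows('b', 'd')]
choices = [
    np.abs(vals - d),                # f1
    vals + d + 10 * le,              # f2: +10 only when x <= D
    np.abs(vals - d),                # f3
    vals + d + 5 * le,               # f4: +5 only when x <= D
]

res = df.copy()
res[cols] = np.select(conditions, choices, default=vals)
print(res)
```

Because D lies inside the C to E range, it is transformed as well; every comparison still uses the original D, since `vals` and `d` are captured before anything is assigned.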

Answer 1 (score: 0)

Simplified, what you want to do for f1 is a two-liner (NB: abs instead of if-else):

inds_Aa_Bc=df1[(df1["A"]=="a")& (df1["B"]=="c")].index
df1.loc[inds_Aa_Bc,"C"]=abs(df1.loc[inds_Aa_Bc,"C"]-df1.loc[inds_Aa_Bc,"D"])

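Line 1 gets the index of the rows that satisfy your condition.

Line 2 applies the f1 operation to column "C".

I'm not sure what you mean by "C" to "E", but if you mean several columns, just repeat line 2 for each of them.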
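If "C" to "E" does mean the label slice 'C':'E', the repetition can probably be avoided by handling the whole block in one step. A sketch under that assumption, where `block` is just an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['a', 'b'] * 6,
    'B': ['c', 'c', 'd', 'd'] * 3,
    'C': np.linspace(1, 12, 12),
    'D': np.linspace(2, 13, 12),
    'E': np.linspace(3, 14, 12),
})

mask = (df['A'] == 'a') & (df['B'] == 'c')

# Take the C..E block for the selected rows, subtract D row-wise, take abs
block = df.loc[mask, 'C':'E']
df.loc[mask, 'C':'E'] = block.sub(block['D'], axis=0).abs()
print(df.loc[mask])
```

One thing to watch when repeating line 2 column by column instead: D is itself inside the range, so overwriting it before E is processed would change the result for E. Working from a block extracted up front avoids that.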

For f2 you also need two lines, but it is a bit more cumbersome:

inds_Aa_Bd=df1[(df1["A"]=="a")& (df1["B"]=="d")].index

df1.loc[inds_Aa_Bd,"C"]=np.where(df1.loc[inds_Aa_Bd,"C"]<=df1.loc[inds_Aa_Bd,"D"],df1.loc[inds_Aa_Bd,"C"]+df1.loc[inds_Aa_Bd,"D"]+10,df1.loc[inds_Aa_Bd,"C"]+df1.loc[inds_Aa_Bd,"D"])

Line 1 again gets the index of the rows you want.

Line 2 uses np.where(condition, value if condition is True, value if condition is False).
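As a quick illustration of that np.where pattern on its own, here is the f2 rule applied to three hypothetical value pairs (not rows from the dataframe):

```python
import numpy as np

c = np.array([3.0, 7.0, 11.0])   # hypothetical "C" values
d = np.array([4.0, 6.0, 11.0])   # hypothetical "D" values

# Where c <= d take c + d + 10, elsewhere take c + d
out = np.where(c <= d, c + d + 10, c + d)
print(out)  # [17. 13. 32.]
```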

f3 and f4 are very similar to f1 and f2:

#f3
inds_Ab_Bc=df1[(df1["A"]=="b")& (df1["B"]=="c")].index
df1.loc[inds_Ab_Bc,"C"]=abs(df1.loc[inds_Ab_Bc,"C"]-df1.loc[inds_Ab_Bc,"D"])

#f4
inds_Ab_Bd=df1[(df1["A"]=="b")& (df1["B"]=="d")].index

df1.loc[inds_Ab_Bd,"C"]=np.where(df1.loc[inds_Ab_Bd,"C"]<=df1.loc[inds_Ab_Bd,"D"],df1.loc[inds_Ab_Bd,"C"]+df1.loc[inds_Ab_Bd,"D"]+5,df1.loc[inds_Ab_Bd,"C"]+df1.loc[inds_Ab_Bd,"D"])

Running all of this returns:

df1
    A  B     C     D     E     F
0   a  c   1.0   2.0   3.0   4.0
1   b  c   1.0   3.0   4.0   5.0
2   a  d  17.0   4.0   5.0   6.0
3   b  d  14.0   5.0   6.0   7.0
4   a  c   1.0   6.0   7.0   8.0
5   b  c   1.0   7.0   8.0   9.0
6   a  d  25.0   8.0   9.0  10.0
7   b  d  22.0   9.0  10.0  11.0
8   a  c   1.0  10.0  11.0  12.0
9   b  c   1.0  11.0  12.0  13.0
10  a  d  33.0  12.0  13.0  14.0
11  b  d  30.0  13.0  14.0  15.0
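Since f1 and f3 are identical in the question, and f2 and f4 differ only in the constant, the four blocks can probably be folded into one vectorized expression. A sketch under that reading; `apply_rules` is a hypothetical helper, not part of pandas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['a', 'b'] * 6,
    'B': ['c', 'c', 'd', 'd'] * 3,
    'C': np.linspace(1, 12, 12),
    'D': np.linspace(2, 13, 12),
    'E': np.linspace(3, 14, 12),
    'F': np.linspace(4, 15, 12),
})

def apply_rules(df, col, ref='D'):
    """Return transformed values of `col` (hypothetical helper).

    B == 'c' rows get |x - D| (f1, f3); B == 'd' rows get x + D, plus
    10 (A == 'a', f2) or 5 (A == 'b', f4) when x <= D.
    """
    x, d = df[col], df[ref]
    offset = np.where(df['A'] == 'a', 10, 5)
    sum_rule = x + d + np.where(x <= d, offset, 0)
    return np.where(df['B'] == 'c', (x - d).abs(), sum_rule)

df['C'] = apply_rules(df, 'C')
print(df['C'].tolist())
```

On the example data this gives the same "C" column as the output above. It relies on A and B taking only the values shown in the question.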