Suppose I have a DataFrame like this:
import pandas as pd
import numpy as np
#
a = ['a','b']*6
b = ['c','c','d','d']*3
c = np.linspace(1,12,12)
d = np.linspace(2,13,12)
e = np.linspace(3,14,12)
f = np.linspace(4,15,12)
df1 = pd.DataFrame({'A': a, 'B': b, 'C': c, 'D': d, 'E': e, 'F': f})
df2 = df1.drop(columns=['A','B'])
which gives
In [2]: df1
Out[2]:
A B C D E F
0 a c 1.0 2.0 3.0 4.0
1 b c 2.0 3.0 4.0 5.0
2 a d 3.0 4.0 5.0 6.0
3 b d 4.0 5.0 6.0 7.0
4 a c 5.0 6.0 7.0 8.0
5 b c 6.0 7.0 8.0 9.0
6 a d 7.0 8.0 9.0 10.0
7 b d 8.0 9.0 10.0 11.0
8 a c 9.0 10.0 11.0 12.0
9 b c 10.0 11.0 12.0 13.0
10 a d 11.0 12.0 13.0 14.0
11 b d 12.0 13.0 14.0 15.0
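What I want to do is apply a function to columns C to E depending on the row (in the code below, on the values in columns A and B). I got it working with a for loop, but it is too slow (the real DataFrame is huge).
This is what I tried in order to speed it up: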
f1 = lambda x: x - df1['D'] if x > df1['D'] else df1['D'] - x
f2 = lambda x: x + df1['D'] if x > df1['D'] else df1['D'] + x + 10
f3 = lambda x: x - df1['D'] if x > df1['D'] else df1['D'] - x
f4 = lambda x: x + df1['D'] if x > df1['D'] else df1['D'] + x + 5
df1.loc[(df1['A'] == 'a') & (df1['B'] == 'c'), 'C':'E'] = df2.apply(f1)
df1.loc[(df1['A'] == 'a') & (df1['B'] == 'd'), 'C':'E'] = df2.apply(f2)
df1.loc[(df1['A'] == 'b') & (df1['B'] == 'c'), 'C':'E'] = df2.apply(f3)
df1.loc[(df1['A'] == 'b') & (df1['B'] == 'd'), 'C':'E'] = df2.apply(f4)
With this I get ValueError: "The truth value of a Series is ambiguous"; the problem is the "if" in the lambda definitions.
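A minimal sketch of why the "if" fails, using small made-up Series rather than the frame above: inside the lambda, x is a whole column, so x > df1['D'] is a boolean Series with one value per row, and a Python "if" cannot reduce that to a single True/False. np.where is one way to make the choice element by element instead.

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration (not the frame from the question)
s = pd.Series([1.0, 5.0, 3.0])
d = pd.Series([2.0, 2.0, 3.0])

mask = s > d  # a boolean Series, one value per row

try:
    # `if` needs a single True/False, but `mask` holds three values
    result = s - d if mask else d - s
except ValueError as err:
    message = str(err)

# np.where picks per element: s - d where mask is True, d - s elsewhere
elementwise = np.where(mask, s - d, d - s)
print(message)
print(elementwise)
```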
Then I tried the following
f1 = lambda x: x - df1['D']
f2 = lambda x: x + df1['D']
f3 = lambda x: x - df1['D']
f4 = lambda x: x + df1['D']
np.where(df1.loc[(df1['A'] == 'a') & (df1['B'] == 'c'), 'C':'E'] > df1.loc[(df1['A'] == 'a') & (df1['B'] == 'c'), 'D'], df2.apply(f1), df2.apply(f2))
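to somehow pass the "if" to np.where, but I get ValueError: "operands could not be broadcast together with shapes (3,6) (12,4) (12,4)".
Any help is greatly appreciated, as I've run out of ideas!
Thanks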
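On the broadcasting error: np.where needs its condition and both value arguments to have shapes that broadcast together. In the attempt above the condition presumably comes from a 3-row subset, while both df2.apply(...) results cover all 12 rows. A sketch on a small made-up frame (the data and the names vals, d, rows are assumptions for illustration) that keeps every operand at full height and selects rows with a column-vector mask:

```python
import numpy as np
import pandas as pd

# Small made-up frame with a similar column layout to df1
df = pd.DataFrame({
    'A': ['a', 'b', 'a', 'b'],
    'C': [1.0, 2.0, 9.0, 4.0],
    'D': [2.0, 3.0, 4.0, 5.0],
    'E': [3.0, 4.0, 5.0, 6.0],
})

vals = df.loc[:, 'C':'E'].to_numpy()          # shape (4, 3)
d = df['D'].to_numpy()[:, None]               # shape (4, 1), broadcasts across columns
rows = (df['A'] == 'a').to_numpy()[:, None]   # shape (4, 1), row selector

# All three arguments broadcast to (4, 3), so np.where accepts them
diff = np.where(vals > d, vals - d, d - vals)

# Keep the original values on rows that are not selected
out = np.where(rows, diff, vals)
print(out)
```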
Answer 0 (score: 0)
How about this? It is inspired by np.select (which can't be used directly here, though!)
conditions = [
(df1['A'] == 'a') & (df1['B'] == 'c'),
(df1['A'] == 'a') & (df1['B'] == 'd'),
(df1['A'] == 'b') & (df1['B'] == 'c'),
(df1['A'] == 'b') & (df1['B'] == 'd')
]
choices = [
np.abs(df2.subtract(df2['D'], axis=0)),
(df2.add(df2['D'], axis=0) + df2.gt(df2['D'], axis=0) * 10),
np.abs(df2.subtract(df2['D'], axis=0)),
(df2.add(df2['D'], axis=0) + df2.gt(df2['D'], axis=0) * 5)
]
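new_dfs = []
for i in range(len(conditions)):
    # keep only the rows of choice i that satisfy condition i
    c = choices[i][conditions[i]]
    new_dfs.append(c)
# stitch the row subsets back together in the original row order
res = pd.concat(new_dfs).sort_index()
print(res)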
which gives
C D E F
0 1.0 0.0 1.0 2.0
1 1.0 0.0 1.0 2.0
2 7.0 8.0 19.0 20.0
3 9.0 10.0 16.0 17.0
4 1.0 0.0 1.0 2.0
5 1.0 0.0 1.0 2.0
6 15.0 16.0 27.0 28.0
7 17.0 18.0 24.0 25.0
8 1.0 0.0 1.0 2.0
9 1.0 0.0 1.0 2.0
10 23.0 24.0 35.0 36.0
11 25.0 26.0 32.0 33.0
If this isn't right, could you show what you expect the output DataFrame to look like for your example?
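As a side note, np.select does seem usable here once each per-row mask is expanded to the same 2-D shape as the choices. The following is only a sketch under that assumption; the helper name expand is made up, and the choices keep this answer's convention of adding the constant where the value is greater than D.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
df1 = pd.DataFrame({
    'A': ['a', 'b'] * 6,
    'B': ['c', 'c', 'd', 'd'] * 3,
    'C': np.linspace(1, 12, 12),
    'D': np.linspace(2, 13, 12),
    'E': np.linspace(3, 14, 12),
    'F': np.linspace(4, 15, 12),
})
df2 = df1.drop(columns=['A', 'B'])

vals = df2.to_numpy()               # shape (12, 4)
d = df2['D'].to_numpy()[:, None]    # shape (12, 1)

def expand(mask):
    """Hypothetical helper: repeat a per-row boolean mask across all columns."""
    return np.broadcast_to(mask.to_numpy()[:, None], vals.shape)

conditions = [
    expand((df1['A'] == 'a') & (df1['B'] == 'c')),
    expand((df1['A'] == 'a') & (df1['B'] == 'd')),
    expand((df1['A'] == 'b') & (df1['B'] == 'c')),
    expand((df1['A'] == 'b') & (df1['B'] == 'd')),
]
choices = [
    np.abs(vals - d),
    vals + d + (vals > d) * 10,
    np.abs(vals - d),
    vals + d + (vals > d) * 5,
]

# Every row matches exactly one condition here, so the NaN default is never used
res = pd.DataFrame(np.select(conditions, choices, default=np.nan),
                   index=df2.index, columns=df2.columns)
print(res)
```

Note that the lambdas in the question add the constant in the else branch, i.e. when the value is not greater than D. If that reading is the intended one, replacing (vals > d) with (vals <= d) in the two offset terms would follow it.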
Answer 1 (score: 0)
Simplified, what you want for f1 is a two-liner (NB: abs instead of if-else):
inds_Aa_Bc = df1[(df1["A"] == "a") & (df1["B"] == "c")].index
df1.loc[inds_Aa_Bc, "C"] = abs(df1.loc[inds_Aa_Bc, "C"] - df1.loc[inds_Aa_Bc, "D"])
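Line 1 gets the index of the rows that satisfy your condition.
Line 2 performs the f1 operation on column "C".
I'm not sure what you mean by "C" to "E", but in that case just repeat line 2 for those columns.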
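If "C" to "E" means the column slice 'C':'E', one caveat when repeating line 2 column by column is that D is itself in that slice: once D has been overwritten (|D - D| is 0), later columns would be compared against the new D. A sketch that handles the whole slice in one assignment, assuming the comparison should use the original D (the names rows, d and block are made up for illustration):

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
df1 = pd.DataFrame({
    'A': ['a', 'b'] * 6,
    'B': ['c', 'c', 'd', 'd'] * 3,
    'C': np.linspace(1, 12, 12),
    'D': np.linspace(2, 13, 12),
    'E': np.linspace(3, 14, 12),
    'F': np.linspace(4, 15, 12),
})

rows = (df1['A'] == 'a') & (df1['B'] == 'c')

# Take D before assigning, because D itself lies inside the 'C':'E' slice
d = df1.loc[rows, 'D']
block = df1.loc[rows, 'C':'E']

# f1 for every column in the block: absolute difference to D, row by row
df1.loc[rows, 'C':'E'] = block.sub(d, axis=0).abs()
print(df1)
```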
For f2 you also need two lines, but it's a bit more cumbersome:
inds_Aa_Bd = df1[(df1["A"] == "a") & (df1["B"] == "d")].index
df1.loc[inds_Aa_Bd, "C"] = np.where(df1.loc[inds_Aa_Bd, "C"] <= df1.loc[inds_Aa_Bd, "D"], df1.loc[inds_Aa_Bd, "C"] + df1.loc[inds_Aa_Bd, "D"] + 10, df1.loc[inds_Aa_Bd, "C"] + df1.loc[inds_Aa_Bd, "D"])
Line 1 again gets the index of the desired rows.
Line 2 uses np.where(condition, value if condition is True, value if condition is False).
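To see the np.where call in isolation, here is a small sketch on plain arrays. The first three C/D pairs are taken from the "a"/"d" rows of the example; the last pair is made up so that the other branch is exercised too.

```python
import numpy as np

c = np.array([3.0, 7.0, 11.0, 20.0])
d = np.array([4.0, 8.0, 12.0, 13.0])

# f2 from the question: c + d where c > d, otherwise c + d + 10
f2 = np.where(c <= d, c + d + 10, c + d)
print(f2)
```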
f3 and f4 are very similar to f1 and f2:
# f3
inds_Ab_Bc = df1[(df1["A"] == "b") & (df1["B"] == "c")].index
df1.loc[inds_Ab_Bc, "C"] = abs(df1.loc[inds_Ab_Bc, "C"] - df1.loc[inds_Ab_Bc, "D"])
# f4
inds_Ab_Bd = df1[(df1["A"] == "b") & (df1["B"] == "d")].index
df1.loc[inds_Ab_Bd, "C"] = np.where(df1.loc[inds_Ab_Bd, "C"] <= df1.loc[inds_Ab_Bd, "D"], df1.loc[inds_Ab_Bd, "C"] + df1.loc[inds_Ab_Bd, "D"] + 5, df1.loc[inds_Ab_Bd, "C"] + df1.loc[inds_Ab_Bd, "D"])
Running all of this returns:
df1
A B C D E F
0 a c 1.0 2.0 3.0 4.0
1 b c 1.0 3.0 4.0 5.0
2 a d 17.0 4.0 5.0 6.0
3 b d 14.0 5.0 6.0 7.0
4 a c 1.0 6.0 7.0 8.0
5 b c 1.0 7.0 8.0 9.0
6 a d 25.0 8.0 9.0 10.0
7 b d 22.0 9.0 10.0 11.0
8 a c 1.0 10.0 11.0 12.0
9 b c 1.0 11.0 12.0 13.0
10 a d 33.0 12.0 13.0 14.0
11 b d 30.0 13.0 14.0 15.0
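Since the four blocks differ only in the row filter and the constant, they could be folded into one small function. This is just a sketch; update_c and its offset parameter are hypothetical names, not part of pandas.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
df1 = pd.DataFrame({
    'A': ['a', 'b'] * 6,
    'B': ['c', 'c', 'd', 'd'] * 3,
    'C': np.linspace(1, 12, 12),
    'D': np.linspace(2, 13, 12),
    'E': np.linspace(3, 14, 12),
    'F': np.linspace(4, 15, 12),
})

def update_c(df, a, b, offset=None):
    """Hypothetical helper: apply the rule for one (A, B) group to column C in place.

    offset=None  -> |C - D| (f1 / f3)
    offset=value -> C + D, plus value where C <= D (f2 / f4)
    """
    idx = df[(df['A'] == a) & (df['B'] == b)].index
    c, d = df.loc[idx, 'C'], df.loc[idx, 'D']
    if offset is None:
        df.loc[idx, 'C'] = (c - d).abs()
    else:
        df.loc[idx, 'C'] = np.where(c <= d, c + d + offset, c + d)

update_c(df1, 'a', 'c')             # f1
update_c(df1, 'a', 'd', offset=10)  # f2
update_c(df1, 'b', 'c')             # f3
update_c(df1, 'b', 'd', offset=5)   # f4
print(df1)
```

On the example frame this should reproduce the C column shown above; the other columns are left untouched.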