I'm trying to create 2 calculated fields in a Pandas DataFrame. The structure is as follows:
Index  aa  aw     ba   bw    wv     a_total  b_total
1      0   0      141  0     0
2      0   45.12  0    0     90.50
3      0   0      0    2857  893
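I'm trying to create two calculated columns (a_total and b_total) that are computed for each row of the data frame. The output needs to be determined by the column values and the if logic listed below.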
def calc_b():
    if wv == 0:
        return ba
    if wv>0 and (aw+bw)<wv:
        return ba
    if wv>0 and (aw+bw)>wv and (bw>wv):
        return ba+bw-wv
    if wv>0 and (aw+bw)>wv and (bw<wv):
        return ba

def calc_a():
    if wv == 0:
        return aa
    if wv>0 and (aw+bw)<wv:
        return aa
    if wv>0 and (aw+bw)>wv and (bw>wv):
        return aa+aw
    if wv>0 and (aw+bw)>wv and (bw<wv):
        return aa+aw-abs(bw-wv)
For the sample data provided above, the output columns should be:
Index  aa  aw     ba   bw    wv     a_total  b_total
1      0   0      141  0     0      0        141
2      0   45.12  0    0     90.50  0        0
3      0   0      0    2857  893    0        1964
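I have also tried using if/elif statements and defining each outcome as a boolean. The problem I run into is that once one of the rows is determined, that calculation gets applied to the entire data frame.
Just want to see what I might be missing here.
Thanks!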
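The following is a likely explanation for that symptom. It is a sketch that assumes the if/elif version tested whole columns such as `df.wv == 0`.

- Comparing a column with a scalar produces one boolean per row, and a plain Python `if` cannot branch on that. pandas raises `ValueError` instead.
- If the condition is collapsed with `.any()` or `.all()`, a single branch wins for the whole frame. Any assignment inside that branch (e.g. `df['b_total'] = df.ba`) then fills the entire column.

The snippet rebuilds the question's sample data. The names `mask` and `raised` are illustrative only.

```python
import pandas as pd

# The question's sample data (a_total / b_total not computed yet)
df = pd.DataFrame(
    {
        "aa": [0, 0, 0],
        "aw": [0, 45.12, 0],
        "ba": [141, 0, 0],
        "bw": [0, 0, 2857],
        "wv": [0, 90.50, 893],
    },
    index=[1, 2, 3],
)

# Comparing a column with a scalar yields one boolean per row
mask = df.wv == 0
print(mask.tolist())  # [True, False, False]

# A plain `if` needs a single True/False, so pandas raises instead of guessing
raised = False
try:
    if df.wv == 0:
        pass
except ValueError as exc:
    raised = True
    print(exc)
```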
Answer 0 (score: 0)
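It's not very easy to understand what the functions are supposed to do, so I assumed most of it and fixed the issues I found. First, be careful with indentation; it really matters in Python.
Second, the variables wv, ba, bw, aa and aw are not declared inside the functions (at least not in the scope you've shown). So I assign each of them the value from its column for the current row, iterate over the data frame's index, and set each cell of the last two columns separately.
If I've understood everything correctly, this little guy should do it: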
import pandas as pd
import numpy as np

def calc_b(df, each):
    # Read this row's values; `each` is an index label of df
    wv = df.loc[each, 'wv']
    ba = df.loc[each, 'ba']
    bw = df.loc[each, 'bw']
    aa = df.loc[each, 'aa']
    aw = df.loc[each, 'aw']
    if wv == 0:
        return ba
    if wv>0 and (aw+bw)<wv:
        return ba
    if wv>0 and (aw+bw)>wv and (bw>wv):
        return ba+bw-wv
    if wv>0 and (aw+bw)>wv and (bw<wv):
        return ba
    # Note: if aw+bw == wv or bw == wv, no branch matches and None is returned

def calc_a(df, each):
    # Read this row's values; `each` is an index label of df
    wv = df.loc[each, 'wv']
    ba = df.loc[each, 'ba']
    bw = df.loc[each, 'bw']
    aa = df.loc[each, 'aa']
    aw = df.loc[each, 'aw']
    if wv == 0:
        return aa
    if wv>0 and (aw+bw)<wv:
        return aa
    if wv>0 and (aw+bw)>wv and (bw>wv):
        return aa+aw
    if wv>0 and (aw+bw)>wv and (bw<wv):
        return aa+aw-abs(bw-wv)
    # Note: if aw+bw == wv or bw == wv, no branch matches and None is returned

# just a provisional quick df declaration with random values; replace with your own data
df = pd.DataFrame(np.random.randint(0,100,size=(3, 5)),columns=['aa','aw','ba','bw', 'wv'])

# Compute both totals cell by cell, one row label at a time
for each in df.index.tolist():
    df.loc[each, 'a_total'] = calc_a(df, each)
    df.loc[each, 'b_total'] = calc_b(df, each)
print(df)
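The same per-row logic can also be written with `DataFrame.apply(..., axis=1)`. That call hands each row to the function as a Series, so the values no longer have to be fetched with `df.loc[each, ...]`. This is a swapped-in alternative to the loop above, not what the answer uses, and it is still a Python-level loop under the hood. The sketch assumes the question's sample data. The trailing `return None` makes the uncovered equality cases explicit.

```python
import pandas as pd

# The question's sample data
df = pd.DataFrame(
    {
        "aa": [0, 0, 0],
        "aw": [0, 45.12, 0],
        "ba": [141, 0, 0],
        "bw": [0, 0, 2857],
        "wv": [0, 90.50, 893],
    },
    index=[1, 2, 3],
)

def calc_b(row):
    # `row` is one row of df as a Series, so these are scalars
    wv, aw, ba, bw = row.wv, row.aw, row.ba, row.bw
    if wv == 0:
        return ba
    if wv > 0 and (aw + bw) < wv:
        return ba
    if wv > 0 and (aw + bw) > wv and bw > wv:
        return ba + bw - wv
    if wv > 0 and (aw + bw) > wv and bw < wv:
        return ba
    return None  # equality cases are not covered by the original rules

def calc_a(row):
    wv, aa, aw, bw = row.wv, row.aa, row.aw, row.bw
    if wv == 0:
        return aa
    if wv > 0 and (aw + bw) < wv:
        return aa
    if wv > 0 and (aw + bw) > wv and bw > wv:
        return aa + aw
    if wv > 0 and (aw + bw) > wv and bw < wv:
        return aa + aw - abs(bw - wv)
    return None  # equality cases are not covered by the original rules

# axis=1 calls the function once per row
df["a_total"] = df.apply(calc_a, axis=1)
df["b_total"] = df.apply(calc_b, axis=1)
print(df)
```

On the sample data this reproduces the expected table: a_total is 0 in every row, and b_total is 141, 0 and 1964.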
Answer 1 (score: 0)
Use np.select. Avoid loops at all costs.
b_conditions = [df.wv == 0,
                (df.wv>0) & ((df.aw+df.bw) < df.wv),
                (df.wv>0) & ((df.aw+df.bw)>df.wv) & (df.bw>df.wv),
                (df.wv>0) & ((df.aw+df.bw)>df.wv) & (df.bw<df.wv)]
b_choices = [df.ba, df.ba, df.ba + df.bw - df.wv, df.ba]
Then
df['b_total'] = np.select(condlist=b_conditions,
                          choicelist=b_choices)
Similarly,
# Each comparison needs its own parentheses: & binds tighter than < and >
a_conditions = [df.wv == 0,
                (df.wv>0) & ((df.aw+df.bw) < df.wv),
                (df.wv>0) & ((df.aw+df.bw)>df.wv) & (df.bw>df.wv),
                (df.wv>0) & ((df.aw+df.bw)>df.wv) & (df.bw<df.wv)]
a_choices = [df.aa, df.aa, df.aa + df.aw, df.aa+df.aw-abs(df.bw-df.wv)]
Then
df['a_total'] = np.select(condlist=a_conditions,
                          choicelist=a_choices)
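Here is a self-contained version of the `np.select` approach on the question's sample data. It is a sketch under two assumptions:

- The four conditions are shared between both columns, since they are identical in the answer.
- Rows that match no condition (e.g. `aw + bw == wv`) should become `NaN` via `default=np.nan`, rather than `np.select`'s default of `0`.

The helper names `positive`, `over` and `under` are illustrative only. Note the parentheses around each comparison: `&` binds more tightly than `<` and `>` in Python, so they are required.

```python
import numpy as np
import pandas as pd

# The question's sample data
df = pd.DataFrame(
    {
        "aa": [0, 0, 0],
        "aw": [0, 45.12, 0],
        "ba": [141, 0, 0],
        "bw": [0, 0, 2857],
        "wv": [0, 90.50, 893],
    },
    index=[1, 2, 3],
)

# Boolean Series reused by several conditions (one value per row)
positive = df.wv > 0
over = (df.aw + df.bw) > df.wv
under = (df.aw + df.bw) < df.wv

# The same four conditions drive both columns; the first match wins per row
conditions = [
    df.wv == 0,
    positive & under,
    positive & over & (df.bw > df.wv),
    positive & over & (df.bw < df.wv),
]

b_choices = [df.ba, df.ba, df.ba + df.bw - df.wv, df.ba]
a_choices = [df.aa, df.aa, df.aa + df.aw, df.aa + df.aw - (df.bw - df.wv).abs()]

# Rows matching no condition get NaN instead of np.select's default of 0
df["b_total"] = np.select(conditions, b_choices, default=np.nan)
df["a_total"] = np.select(conditions, a_choices, default=np.nan)
print(df)
```

This gives the same a_total and b_total values as the expected output in the question. Everything is computed column-wise, with no Python loop over rows.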