Creating calculated fields in a DataFrame

Posted: 2019-01-29 18:07:20

Tags: python pandas

I'm trying to create 2 calculated fields in a Pandas DataFrame. The structure is as follows:

Index    aa    aw       ba     bw      wv       a_total    b_total
1        0     0        141    0       0
2        0     45.12    0      0       90.50
3        0     0        0      2857    893

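I'm trying to create two calculated columns (a_total and b_total) that are computed from the other columns of each row of the DataFrame. The output needs to be determined by the column values and the if logic listed below.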

def calc_b():
if wv == 0:
    return ba

if wv>0 and (aw+bw)<wv:
    return ba

if wv>0 and (aw+bw)>wv and (bw>wv):
    return ba+bw-wv

if wv>0 and (aw+bw)>wv and (bw<wv):
    return ba

def calc_a():
if wv == 0:
    return aa

if wv>0 and (aw+bw)<wv:
    return aa

if wv>0 and (aw+bw)>wv and (bw>wv):
    return aa+aw

if wv>0 and (aw+bw)>wv and (bw<wv):
    return aa+aw-abs(bw-wv)     

For the sample data provided above, the output columns would be:

Index    aa    aw       ba     bw      wv       a_total    b_total
1        0     0        141    0       0        0          141
2        0     45.12    0      0       90.50    0          0
3        0     0        0      2857    893      0          1964

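I have also tried using if / elif statements and defining each outcome as a boolean. The problem I run into is that as soon as one of the rows is determined, that calculation gets applied to the entire DataFrame.

Just want to see what I might be missing here.

Thanks!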

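2 Answers:

Answer 0: (score: 0)

It wasn't easy to tell what your functions are supposed to do, so I assumed most of it and fixed the problems I found. First, be careful with indentation; it really matters in Python.

Second, the variables wv, ba, bw, aa and aw are not declared inside the functions (at least not in what you showed), so I assign each of them from its column. By iterating over the DataFrame index, I then set the value of each cell in the last two columns, one row at a time.

If I understood everything correctly, this little guy should do it: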

import pandas as pd
import numpy as np
def calc_b(df, each):
    wv = df.loc[each, 'wv']
    ba = df.loc[each, 'ba']
    bw = df.loc[each, 'bw']
    aa = df.loc[each, 'aa']
    aw = df.loc[each, 'aw']
    if wv == 0:
        return ba

    if wv>0 and (aw+bw)<wv:
        return ba

    if wv>0 and (aw+bw)>wv and (bw>wv):
        return ba+bw-wv

    if wv>0 and (aw+bw)>wv and (bw<wv):
        return ba

def calc_a(df, each):
    wv = df.loc[each, 'wv']
    ba = df.loc[each, 'ba']
    bw = df.loc[each, 'bw']
    aa = df.loc[each, 'aa']
    aw = df.loc[each, 'aw']
    if wv == 0:
        return aa

    if wv>0 and (aw+bw)<wv:
        return aa

    if wv>0 and (aw+bw)>wv and (bw>wv):
        return aa+aw

    if wv>0 and (aw+bw)>wv and (bw<wv):
        return aa+aw-abs(bw-wv)  

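# Quick provisional DataFrame for testing with random data:
# df = pd.DataFrame(np.random.randint(0,100,size=(3, 5)),columns=['aa','aw','ba','bw', 'wv'])
# Or the sample data from the question, so the loop below has a df to work on:
df = pd.DataFrame({'aa': [0, 0, 0], 'aw': [0, 45.12, 0], 'ba': [141, 0, 0],
                   'bw': [0, 0, 2857], 'wv': [0, 90.50, 893]}, index=[1, 2, 3])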

for each in df.index.tolist():
    df.loc[each, 'a_total'] = calc_a(df, each)
    df.loc[each, 'b_total'] = calc_b(df, each)

print(df)
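
As a cross-check of the row-by-row idea, here is a minimal sketch that runs the same branch logic against the sample data from the question. The helper name `row_totals` and the use of `DataFrame.apply` with `axis=1` are my own choices for illustration, not something the answer above uses. Note that the functions above return None when no branch matches (for example when aw + bw equals wv exactly); the sketch falls back to aa and ba in that case, which is an assumption about the intended behaviour.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "aa": [0.0, 0.0, 0.0],
        "aw": [0.0, 45.12, 0.0],
        "ba": [141.0, 0.0, 0.0],
        "bw": [0.0, 0.0, 2857.0],
        "wv": [0.0, 90.50, 893.0],
    },
    index=[1, 2, 3],
)


def row_totals(row):
    """Return a_total and b_total for a single row (hypothetical helper)."""
    aa, aw, ba, bw, wv = row["aa"], row["aw"], row["ba"], row["bw"], row["wv"]
    # Assumption: rows that hit none of the special branches keep aa and ba.
    a_total, b_total = aa, ba
    if wv > 0 and (aw + bw) > wv:
        if bw > wv:
            a_total = aa + aw
            b_total = ba + bw - wv
        elif bw < wv:
            a_total = aa + aw - abs(bw - wv)
    return pd.Series({"a_total": a_total, "b_total": b_total})


# apply(axis=1) calls row_totals once per row and collects the results.
totals = df.apply(row_totals, axis=1)
df["a_total"] = totals["a_total"]
df["b_total"] = totals["b_total"]
print(df)
```

On the three sample rows this should reproduce the a_total and b_total values the question lists.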

Answer 1: (score: 0)

Use np.select. Avoid loops at all costs.

import numpy as np

b_conditions = [df.wv == 0,
               (df.wv>0) & ((df.aw+df.bw) < df.wv),
               (df.wv>0) & ((df.aw+df.bw)>df.wv) & (df.bw>df.wv),
               (df.wv>0) & ((df.aw+df.bw)>df.wv) & (df.bw<df.wv)]

b_choices = [df.ba, df.ba, df.ba + df.bw - df.wv, df.ba]

Then

df['b_total'] = np.select(condlist=b_conditions,
                          choicelist=b_choices)
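
To make the mechanics concrete, here is a small self-contained sketch that runs the b_total logic on the sample data from the question. The DataFrame construction is my own reconstruction of that sample; the conditions and choices are the ones listed above. Each condition is a boolean Series with one entry per row, and for every row `np.select` takes the choice paired with the first condition that is True.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "aa": [0.0, 0.0, 0.0],
        "aw": [0.0, 45.12, 0.0],
        "ba": [141.0, 0.0, 0.0],
        "bw": [0.0, 0.0, 2857.0],
        "wv": [0.0, 90.50, 893.0],
    },
    index=[1, 2, 3],
)

# One boolean Series per branch of the original if logic.
b_conditions = [
    df.wv == 0,
    (df.wv > 0) & ((df.aw + df.bw) < df.wv),
    (df.wv > 0) & ((df.aw + df.bw) > df.wv) & (df.bw > df.wv),
    (df.wv > 0) & ((df.aw + df.bw) > df.wv) & (df.bw < df.wv),
]
b_choices = [df.ba, df.ba, df.ba + df.bw - df.wv, df.ba]

# Row by row, pick the choice that belongs to the first True condition.
df["b_total"] = np.select(b_conditions, b_choices)
print(df[["ba", "bw", "wv", "b_total"]])
```

Because the comparisons operate on whole columns, no Python-level loop over rows is needed.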

Similarly,

a_conditions = [df.wv == 0, 
               (df.wv>0) & ((df.aw+df.bw) < df.wv),
               (df.wv>0) & ((df.aw+df.bw)>df.wv) & (df.bw>df.wv),
               (df.wv>0) & ((df.aw+df.bw)>df.wv) & (df.bw<df.wv)]

a_choices = [df.aa, df.aa, df.aa + df.aw, df.aa+df.aw-abs(df.bw-df.wv)]

Then

df['a_total'] = np.select(condlist=a_conditions,
                          choicelist=a_choices)
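
One detail worth keeping in mind: rows that satisfy none of the conditions get the `default` value of `np.select`, which is 0 unless you pass something else. With the strict `<` and `>` comparisons used here, that happens when aw + bw equals wv exactly (or bw equals wv). The sketch below uses a made-up tie row (the first row is hypothetical, not from the question) next to row 3 of the sample data, and passes `default=np.nan` so unmatched rows are easy to tell apart from a genuine 0.

```python
import numpy as np
import pandas as pd

# Row "tie" is hypothetical: aw + bw == wv, so no condition matches.
# Row 3 is taken from the question's sample data.
df = pd.DataFrame(
    {
        "aa": [5.0, 0.0],
        "aw": [10.0, 0.0],
        "ba": [7.0, 0.0],
        "bw": [20.0, 2857.0],
        "wv": [30.0, 893.0],
    },
    index=["tie", 3],
)

a_conditions = [
    df.wv == 0,
    (df.wv > 0) & ((df.aw + df.bw) < df.wv),
    (df.wv > 0) & ((df.aw + df.bw) > df.wv) & (df.bw > df.wv),
    (df.wv > 0) & ((df.aw + df.bw) > df.wv) & (df.bw < df.wv),
]
a_choices = [df.aa, df.aa, df.aa + df.aw, df.aa + df.aw - abs(df.bw - df.wv)]

# Default of 0: the unmatched row looks the same as a legitimate 0.
silent = np.select(a_conditions, a_choices)

# Explicit NaN default: the unmatched row stands out.
flagged = np.select(a_conditions, a_choices, default=np.nan)

print(silent)
print(flagged)
```

Whether a tie should fall back to aa, be treated like one of the existing branches, or be flagged is a decision the question does not settle, so the NaN default here is only one possible choice.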