Question

I am currently using the .apply() function for pandas operations.

fund_table[fund_table.fund_class == 'EQ']['fund_weight'].apply(lambda x: ((x*overall_wts[1])/100))

fund_table[fund_table.fund_class == 'DB']['fund_weight'].apply(lambda x: ((x*overall_wts[0])/100))

fund_table[fund_table.fund_class == 'LQ']['fund_weight'].apply(lambda x: ((x*overall_wts[2])/100))

Each statement modifies a subset of the rows. How do I now update the main dataframe?

I tried something like this:

fund_table['fund_weight'] = fund_table[fund_table.fund_class == 'EQ']['fund_weight'].apply(lambda x: ((x*overall_wts[1])/100))
fund_table['fund_weight'] = fund_table[fund_table.fund_class == 'DB']['fund_weight'].apply(lambda x: ((x*overall_wts[0])/100))
fund_table['fund_weight'] = fund_table[fund_table.fund_class == 'LQ']['fund_weight'].apply(lambda x: ((x*overall_wts[2])/100))

But it fails: all values in the fund_weight column are changed to NaN.

What is the correct way to do this?

Answer 1

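When you assign to fund_weight, you overwrite the column's previous contents, so the next line operates on the wrong data.

In addition, filtering on fund_class creates a smaller dataframe. fund_table[fund_table.fund_class == 'EQ']['fund_weight'] has fewer rows than fund_table, so the Series that apply returns is smaller too. When you assign that Series to the whole column, pandas aligns it on the index and fills the missing rows with NaN.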

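So your first line turns every row of fund_weight into NaN except the rows where fund_class equals 'EQ'. Your next line filters out all rows where fund_class equals 'EQ', so it only sees NaN values, and now every fund_weight is NaN.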
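To make the alignment behaviour concrete, here is a minimal sketch of the first two assignments from the question. The sample fund_table and the overall_wts values are made up for illustration (the question does not show them), and the work is done on a copy named broken so the original stays intact.

```python
import pandas as pd

# Hypothetical sample data; the real fund_table and overall_wts are not shown in the question.
fund_table = pd.DataFrame({
    "fund_class": ["EQ", "DB", "LQ", "EQ"],
    "fund_weight": [50.0, 100.0, 100.0, 50.0],
})
overall_wts = [30, 60, 10]  # assumed order: DB, EQ, LQ

broken = fund_table.copy()

# First assignment: the filtered Series only carries index labels 0 and 3.
eq_only = broken[broken.fund_class == "EQ"]["fund_weight"].apply(
    lambda x: x * overall_wts[1] / 100
)
broken["fund_weight"] = eq_only  # aligned on index; labels 1 and 2 become NaN
nan_after_first = broken["fund_weight"].isna().tolist()

# Second assignment: the DB rows are already NaN, so the result is NaN everywhere.
db_only = broken[broken.fund_class == "DB"]["fund_weight"].apply(
    lambda x: x * overall_wts[0] / 100
)
broken["fund_weight"] = db_only
all_nan = bool(broken["fund_weight"].isna().all())

print(nan_after_first, all_nan)
```

After the first assignment only the non-EQ rows are NaN; after the second, the whole column is.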

You want something more like:

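def calc_new_weight(row):
    # Pick the overall weight that matches this row's fund class.
    if row['fund_class'] == 'EQ':
        overall_wt = overall_wts[1]
    elif row['fund_class'] == 'DB':
        overall_wt = overall_wts[0]
    elif row['fund_class'] == 'LQ':
        overall_wt = overall_wts[2]
    else:
        # Without this branch an unknown class would fail with an unbound variable.
        raise ValueError(f"unexpected fund_class: {row['fund_class']!r}")
    return row['fund_weight'] * overall_wt / 100

# axis=1 passes each row to the function; the result is stored in a new column.
fund_table['fund_weight_calc'] = fund_table.apply(calc_new_weight, axis=1)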

Answer 2

You can use .loc:

fund_table.loc[fund_table.fund_class == 'EQ', 'fund_weight'] = fund_table.loc[fund_table.fund_class == 'EQ', 'fund_weight'].apply(lambda x: ((x*overall_wts[1])/100))
# ...
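As a rough sketch of how the .loc approach could cover all three classes, the loop below writes each scaled subset back through the same boolean mask. The sample data, the overall_wts values, and the class_to_pos mapping are assumptions for illustration; the positions follow the indices used in the question (DB is 0, EQ is 1, LQ is 2). Plain arithmetic on the Series stands in for apply with a lambda, since it gives the same result here.

```python
import pandas as pd

# Hypothetical sample data; the real fund_table and overall_wts are not shown in the question.
fund_table = pd.DataFrame({
    "fund_class": ["EQ", "DB", "LQ", "EQ"],
    "fund_weight": [50.0, 100.0, 100.0, 50.0],
})
overall_wts = [30, 60, 10]

# Position of each class in overall_wts, as used in the question.
class_to_pos = {"DB": 0, "EQ": 1, "LQ": 2}

for fund_class, pos in class_to_pos.items():
    mask = fund_table["fund_class"] == fund_class
    # Both sides use the same mask, so only the matching rows are read and written.
    fund_table.loc[mask, "fund_weight"] = (
        fund_table.loc[mask, "fund_weight"] * overall_wts[pos] / 100
    )

print(fund_table)
```

Because the masks do not overlap, every row is scaled exactly once and no row is left as NaN.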

However, this might be better rewritten as a groupby:

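wts = dict(zip(["DB", "EQ", "LQ"], overall_wts))
# Select the numeric column first; inside the lambda, x.name is the group's fund_class.
fund_table["fund_weight"] = fund_table.groupby("fund_class")["fund_weight"].transform(lambda x: x * wts[x.name] / 100)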
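Since the multiplier depends only on fund_class, another option is to look it up per row with Series.map and multiply the columns directly. This is a sketch under the same assumptions as the wts dictionary above (overall_wts ordered DB, EQ, LQ); the sample data and weight values are made up.

```python
import pandas as pd

# Hypothetical sample data; the real fund_table and overall_wts are not shown in the question.
fund_table = pd.DataFrame({
    "fund_class": ["EQ", "DB", "LQ", "EQ"],
    "fund_weight": [50.0, 100.0, 100.0, 50.0],
})
overall_wts = [30, 60, 10]

wts = dict(zip(["DB", "EQ", "LQ"], overall_wts))

# map() turns each fund_class into its overall weight; the product keeps the full index,
# so assigning it back does not introduce NaN for known classes.
fund_table["fund_weight"] = (
    fund_table["fund_weight"] * fund_table["fund_class"].map(wts) / 100
)

print(fund_table)
```

A class that is missing from wts would map to NaN, so it may be worth checking for unexpected classes before assigning.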

pandas: updating a dataframe after a lambda operation
