Question

I wrote some pandas code, equivalent to this toy example:

import pandas as pd

df_test = pd.DataFrame({'product': [0, 0, 1, 1], 'sold_for': [5000, 4500, 10000, 8000]})

def product0_makes_profit(row, product0_cost):
    return row['sold_for'] > product0_cost

def product1_makes_profit(row, product1_cost):
    return row['sold_for'] > product1_cost

df_test['made_profit'] = df_test[df_test['product']==0].apply(product0_makes_profit, args=[4000], axis=1, result_type="expand")
df_test['made_profit'] = df_test[df_test['product']==1].apply(product1_makes_profit, args=[9000], axis=1, result_type="expand")
df_test

I get the following result:

    product sold_for    made_profit
0   0       5000        NaN
1   0       4500        NaN
2   1       10000       True
3   1       8000        False

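I want the 'made_profit' column to be True for rows 0 and 1 instead of NaN, but apparently the second apply() overwrites the made_profit column created by the first apply().

How can I get the column I want? I don't want to create a 'product0_made_profit' column in the first apply() and a 'product1_made_profit' column in the second apply() and then merge the two into the single 'made_profit' column I am after, because in my real code the product column has many different values (meaning many different functions to apply).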

Edit

My toy example was too simple; I actually create two new columns:

def product0_makes_profit(row, product0_cost):
    return [row['sold_for'] > product0_cost, row['sold_for'] - product0_cost]

def product1_makes_profit(row, product1_cost):
    return [row['sold_for'] > product1_cost, row['sold_for'] - product1_cost]

Using the current answer, I did this:

is_prod0 = (df_test['product']==0)
df_test.loc[is_prod0, ['made_profit', 'profit_amount']] = df_test[is_prod0].apply(product0_makes_profit, args=[4000], axis=1, result_type="expand")
is_prod1 = (df_test['product']==1)
df_test.loc[is_prod1, ['made_profit', 'profit_amount']] = df_test[is_prod1].apply(product1_makes_profit, args=[9000], axis=1, result_type="expand")
print(df_test)

But this gives me the following error (at the first use of .loc):

KeyError: "None of [Index(['made_profit', 'profit_amount'], dtype='object')] are in the [columns]"

I can get it to work, but the working version involves concat() and join() and, as mentioned above, it gets cumbersome in the real code (though it could be done by looping over all product values). Maybe there is a more elegant solution for multiple columns.

Answer 1

You need to assign to the filtered rows using loc with the same condition, so that only the rows where the condition is True are processed:

m1 = df_test['product']==0
m2 = df_test['product']==1
df_test.loc[m1, 'made_profit'] = df_test[m1].apply(product0_makes_profit, args=[4000], axis=1, result_type="expand")
df_test.loc[m2, 'made_profit'] = df_test[m2].apply(product1_makes_profit, args=[9000], axis=1, result_type="expand")
print (df_test)
   product  sold_for  made_profit
0        0      5000         True
1        0      4500         True
2        1     10000         True
3        1      8000        False
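
With many product values, the same masked assignment can be driven from a lookup table instead of one hand-written line per product. This is only a sketch: the `profit_rules` dict is a hypothetical name that is not in the question, and pre-creating `made_profit` as an object column of NaN is an assumption made here so that the boolean results can be written into it without a dtype conflict.

```python
import numpy as np
import pandas as pd

df_test = pd.DataFrame({'product': [0, 0, 1, 1],
                        'sold_for': [5000, 4500, 10000, 8000]})

def product0_makes_profit(row, product0_cost):
    return row['sold_for'] > product0_cost

def product1_makes_profit(row, product1_cost):
    return row['sold_for'] > product1_cost

# Hypothetical lookup table: product value -> (function, cost argument).
profit_rules = {
    0: (product0_makes_profit, 4000),
    1: (product1_makes_profit, 9000),
}

# Assumption: start from an object column of NaN so booleans can be stored in it.
df_test['made_profit'] = pd.Series(np.nan, index=df_test.index, dtype=object)

for product, (func, cost) in profit_rules.items():
    mask = df_test['product'] == product
    # Write only into the rows selected by the mask, so earlier results survive.
    df_test.loc[mask, 'made_profit'] = df_test[mask].apply(func, args=(cost,), axis=1)

print(df_test)
```

Any product value missing from `profit_rules` keeps NaN in `made_profit`, which makes unhandled products easy to spot.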

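Edit:

If the functions return multiple values, they need to return a Series with the new column names as its index, and the new columns also have to be created, filled with some default value (e.g. NaN), before using loc.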
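
A minimal sketch of what that could look like, reusing the toy data from the question. The `new_cols` list and the dtypes chosen for the placeholder columns are assumptions of this sketch, not something taken from the original answer.

```python
import numpy as np
import pandas as pd

df_test = pd.DataFrame({'product': [0, 0, 1, 1],
                        'sold_for': [5000, 4500, 10000, 8000]})

# Assumed helper: the names of the two columns to create.
new_cols = ['made_profit', 'profit_amount']

def product0_makes_profit(row, product0_cost):
    # Return a Series indexed by the new column names so loc can align on them.
    return pd.Series([row['sold_for'] > product0_cost,
                      row['sold_for'] - product0_cost], index=new_cols)

def product1_makes_profit(row, product1_cost):
    return pd.Series([row['sold_for'] > product1_cost,
                      row['sold_for'] - product1_cost], index=new_cols)

# Create the target columns first; loc cannot add several new columns at once.
df_test['made_profit'] = pd.Series(np.nan, index=df_test.index, dtype=object)
df_test['profit_amount'] = np.nan

m1 = df_test['product'] == 0
m2 = df_test['product'] == 1
df_test.loc[m1, new_cols] = df_test[m1].apply(product0_makes_profit, args=(4000,), axis=1)
df_test.loc[m2, new_cols] = df_test[m2].apply(product1_makes_profit, args=(9000,), axis=1)

print(df_test)
```

Because each function returns a named Series, apply() produces a DataFrame whose columns match `new_cols`, and the loc assignment aligns on both the row index and those column names.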

Using apply() twice to create a new column results in overwriting the new column
