Conditionally change a DataFrame column based on the value of another column

Time: 2017-09-17 21:39:39

Tags: python pandas dataframe

I have the following pandas DataFrame d1

+----------+-------+---------+--------------+
| Item Num | Cost  | Revenue |  Rev / Cost  |
+----------+-------+---------+--------------+
|        1 | 45.76 |  345.67 | 7.5539772727 |
|        2 | 55.78 |  456.92 | 8.1914664754 |
|        3 | 34.68 |       0 |            0 |
|        4 | 79.85 |       0 |            0 |
+----------+-------+---------+--------------+

What I want is for the value in the 'Rev / Cost' column to equal that row's Cost multiplied by -1 whenever 'Rev / Cost' equals 0.

So the expected output is:

+----------+-------+---------+--------------+
| Item Num | Cost  | Revenue |  Rev / Cost  |
+----------+-------+---------+--------------+
|        1 | 45.76 |  345.67 | 7.5539772727 |
|        2 | 55.78 |  456.92 | 8.1914664754 |
|        3 | 34.68 |       0 |       -34.68 |
|        4 | 79.85 |       0 |       -79.85 |
+----------+-------+---------+--------------+

What I have so far is:

d1['Rev / Cost'] = d1['Rev / Cost'].apply(lambda x: x if x > 0 else d1['Cost'])

This overwrites the intended range with just a single value and raises the following warning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

2 answers:

Answer 0: (score: 2)

Create a boolean mask, then use loc to assign to that subset of rows.

mask = df['Rev / Cost'] == 0
df.loc[mask, 'Rev / Cost'] = df.loc[mask, 'Cost'].mul(-1)
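For a self-contained check, here is a minimal sketch that rebuilds the sample data from the question and applies the mask. The frame is named df as in this answer (the question calls it d1), and the literal values are copied from the question's table.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    'Item Num': [1, 2, 3, 4],
    'Cost': [45.76, 55.78, 34.68, 79.85],
    'Revenue': [345.67, 456.92, 0.0, 0.0],
    'Rev / Cost': [7.5539772727, 8.1914664754, 0.0, 0.0],
})

# Boolean Series: True where the ratio is exactly 0.
mask = df['Rev / Cost'] == 0

# Assign only to the masked rows; the right-hand side aligns on the index.
df.loc[mask, 'Rev / Cost'] = df.loc[mask, 'Cost'].mul(-1)

print(df)
```

Rows 1 and 2 keep their ratio, while rows 3 and 4 receive the negated Cost, which matches the expected output in the question.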

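Answer 1: (score: 0)

Since booleans evaluate to 0/1, you can simply multiply the condition by Cost and subtract the result from Rev / Cost. This may give a performance boost.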

df['Rev / Cost'] -=  df['Cost'] * (df['Rev / Cost'] == 0)
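To see why the arithmetic works, the one-liner can be unpacked into its intermediate steps. This is only an illustrative sketch; the names is_zero and penalty are introduced here and are not part of the answer.

```python
import pandas as pd

# Only the two columns involved, with values from the question's table.
df = pd.DataFrame({
    'Cost': [45.76, 55.78, 34.68, 79.85],
    'Rev / Cost': [7.5539772727, 8.1914664754, 0.0, 0.0],
})

# True/False per row; behaves as 1/0 in arithmetic.
is_zero = df['Rev / Cost'] == 0

# Cost where the ratio is 0, and 0.0 everywhere else.
penalty = df['Cost'] * is_zero

# Nonzero rows: x - 0.0 leaves x unchanged. Zero rows: 0 - Cost gives -Cost.
df['Rev / Cost'] -= penalty

print(df)
```

One caveat of this trick: a missing (NaN) Cost propagates into the result even for rows where the condition is False, because NaN * 0 is NaN. The mask-based and where-based approaches do not have that side effect.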

You can also use np.where

df['Rev / Cost'] = np.where(df['Rev / Cost'] == 0, -df['Cost'], df['Rev / Cost'])

Or Series.where, which keeps the values where the condition is True and replaces the rest:

df['Rev / Cost'] = df['Rev / Cost'].where(lambda x: x != 0, -df.Cost)
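The two where variants take their condition in opposite senses, which is easy to get wrong. The sketch below runs both on the question's sample values and compares the results; via_numpy and via_pandas are illustrative names, and the replacement value is the negated Cost as the question asks.

```python
import numpy as np
import pandas as pd

# Only the two columns involved, with values from the question's table.
df = pd.DataFrame({
    'Cost': [45.76, 55.78, 34.68, 79.85],
    'Rev / Cost': [7.5539772727, 8.1914664754, 0.0, 0.0],
})

ratio = df['Rev / Cost']

# np.where(cond, a, b): take a where cond is True, b elsewhere.
via_numpy = np.where(ratio == 0, -df['Cost'], ratio)

# Series.where(cond, other): keep the original where cond is True and
# use other elsewhere, so the condition is the inverse of the one above.
via_pandas = ratio.where(ratio != 0, -df['Cost'])

# Both approaches should agree element by element.
print(np.array_equal(via_numpy, via_pandas.to_numpy()))
```

Note that np.where returns a plain NumPy array, while Series.where returns a Series that keeps the original index.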