I have the following pandas DataFrame d1:
+----------+-------+---------+--------------+
| Item Num | Cost | Revenue | Rev / Cost |
+----------+-------+---------+--------------+
| 1 | 45.76 | 345.67 | 7.5539772727 |
| 2 | 55.78 | 456.92 | 8.1914664754 |
| 3 | 34.68 | 0 | 0 |
| 4 | 79.85 | 0 | 0 |
+----------+-------+---------+--------------+
I want the value in the Rev / Cost column to equal that row's Cost multiplied by -1 whenever 'Rev / Cost' equals 0.
So the expected output is:
+----------+-------+---------+--------------+
| Item Num | Cost | Revenue | Rev / Cost |
+----------+-------+---------+--------------+
| 1 | 45.76 | 345.67 | 7.5539772727 |
| 2 | 55.78 | 456.92 | 8.1914664754 |
| 3 | 34.68 | 0 | -34.68 |
| 4 | 79.85 | 0 | -79.85 |
+----------+-------+---------+--------------+
What I have so far is:
d1['Rev / Cost'] = d1['Rev / Cost'].apply(lambda x: x if x > 0 else d1['Cost'])
This only overwrites the intended range with a single value and throws the following warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
Answer 0 (score: 2)
Create a boolean mask, then use loc to assign to just the matching subset of rows.
mask = df['Rev / Cost'] == 0
df.loc[mask, 'Rev / Cost'] = df.loc[mask, 'Cost'].mul(-1)
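To make the mask approach easy to try, here is a minimal self-contained sketch. It rebuilds the sample data from the question and assumes the frame is called df, as in the answer, and that Rev / Cost is computed as Revenue divided by Cost.

```python
import pandas as pd

# Sample data copied from the question; the name `df` follows the answer.
df = pd.DataFrame({
    "Item Num": [1, 2, 3, 4],
    "Cost": [45.76, 55.78, 34.68, 79.85],
    "Revenue": [345.67, 456.92, 0.0, 0.0],
})
# Assumption: the ratio column is Revenue divided by Cost.
df["Rev / Cost"] = df["Revenue"] / df["Cost"]

# Boolean mask selecting the rows where the ratio is exactly 0.
mask = df["Rev / Cost"] == 0

# Assign only to the masked rows; other rows keep their ratio.
df.loc[mask, "Rev / Cost"] = -df.loc[mask, "Cost"]

print(df)
```

If d1 was itself produced by slicing another DataFrame, which the warning in the question suggests, calling .copy() on that slice before assigning is one common way to avoid the warning.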
Answer 1 (score: 0)
Since booleans evaluate to 0/1, you can simply multiply the condition by Cost and subtract the result from Rev / Cost. This may perform better.
df['Rev / Cost'] -= df['Cost'] * (df['Rev / Cost'] == 0)
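The one-liner above can be unpacked into its intermediate steps. This is only an illustrative sketch on the question's sample values; the names is_zero and penalty are hypothetical and do not come from the answer.

```python
import pandas as pd

# Sample values taken from the question's table.
df = pd.DataFrame({
    "Cost": [45.76, 55.78, 34.68, 79.85],
    "Rev / Cost": [7.5539772727, 8.1914664754, 0.0, 0.0],
})

# Boolean Series: True where the ratio is 0.
is_zero = df["Rev / Cost"] == 0

# True/False act as 1/0 in arithmetic, so this is Cost on the
# zero-ratio rows and 0.0 everywhere else.
penalty = df["Cost"] * is_zero

# Subtracting leaves non-zero ratios unchanged and turns 0 into -Cost.
df["Rev / Cost"] -= penalty

print(df)
```

The trick relies on the replaced value being exactly 0; for any other sentinel value the subtraction would not yield -Cost.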
You can also use np.where
df['Rev / Cost'] = np.where(df['Rev / Cost'] == 0, -df['Cost'], df['Rev / Cost'])
or Series.where
df['Rev / Cost'] = df['Rev / Cost'].where(lambda x: x != 0, -df['Cost'])
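As a quick check that these alternatives agree, the sketch below runs np.where, Series.where and Series.mask on the question's sample values. Series.mask is not mentioned in the answer; it is included here only because it is the inverse of Series.where. Note that Series.where keeps values where the condition is True, so the condition has to be "not zero" and the replacement has to be the negated Cost.

```python
import numpy as np
import pandas as pd

# Sample values taken from the question's table.
df = pd.DataFrame({
    "Cost": [45.76, 55.78, 34.68, 79.85],
    "Rev / Cost": [7.5539772727, 8.1914664754, 0.0, 0.0],
})
ratio = df["Rev / Cost"]

# np.where(condition, value_if_true, value_if_false) returns a NumPy array.
via_np = pd.Series(np.where(ratio == 0, -df["Cost"], ratio), index=df.index)

# Series.where keeps values where the condition holds, replaces the rest.
via_where = ratio.where(ratio != 0, -df["Cost"])

# Series.mask is the inverse: it replaces values where the condition holds.
via_mask = ratio.mask(ratio == 0, -df["Cost"])

print(via_np.tolist())
print(via_where.tolist())
print(via_mask.tolist())
```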