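I have a DataFrame named pricecomp_df, and I want to compare the "market price" column against each of the other price columns ("apple price", "mangoes price", "watermelon price"), taking the difference according to a priority order: watermelon price first, mangoes price second, apple price third. The input DataFrame looks like this: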
   code  apple price  mangoes price  watermelon price  market price
0   101          101            NaN               NaN           122
1   102          123            123               NaN           124
2   103          NaN            NaN               NaN           123
3   105          123            167               NaN           154
4   107          165            NaN               177           176
5   110          123            NaN               NaN           123
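In the first row only the apple price and the market price are present, so the difference is taken between those two. In the second row both the apple and mangoes prices are present, so only the difference between the market price and the mangoes price is needed. The other rows are handled the same way, following the priority order. Rows where all three prices are NaN should be skipped. Can anyone help?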
Answer 0 (score: 19)
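Hope I'm not too late. The idea is to calculate the difference to the lowest-priority price first and then overwrite it with each higher-priority difference wherever that price is available.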
import numpy as np
import pandas as pd
df = pd.DataFrame({'code': [101, 102, 103, 105, 107, 110],
'apple price': [101, 123, np.nan, 123, 165, 123],
'mangoes price': [np.nan, 123, np.nan, 167, np.nan, np.nan],
'watermelon price': [np.nan, np.nan, np.nan, np.nan, 177, np.nan],
'market price': [122, 124, 123, 154, 176, 123]})
# Calculate difference to apple price
df['diff'] = df['market price'] - df['apple price']
# Overwrite with difference to mangoes price
df['diff'] = df.apply(lambda x: x['market price'] - x['mangoes price'] if not np.isnan(x['mangoes price']) else x['diff'], axis=1)
# Overwrite with difference to watermelon price
df['diff'] = df.apply(lambda x: x['market price'] - x['watermelon price'] if not np.isnan(x['watermelon price']) else x['diff'], axis=1)
print(df)
   apple price  code  mangoes price  market price  watermelon price  diff
0          101   101            NaN           122               NaN    21
1          123   102            123           124               NaN     1
2          NaN   103            NaN           123               NaN   NaN
3          123   105            167           154               NaN   -13
4          165   107            NaN           176               177    -1
5          123   110            NaN           123               NaN     0
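
The same priority logic can probably also be written without row-wise apply. This is only a sketch of an alternative, not part of the answer above: it assumes the same column names, and the names priority, reference and result are introduced here for illustration. It back-fills across the price columns in priority order so that the first column ends up holding the highest-priority price that is present, then drops the rows where all three prices are NaN, as the question asks.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'code': [101, 102, 103, 105, 107, 110],
    'apple price': [101, 123, np.nan, 123, 165, 123],
    'mangoes price': [np.nan, 123, np.nan, 167, np.nan, np.nan],
    'watermelon price': [np.nan, np.nan, np.nan, np.nan, 177, np.nan],
    'market price': [122, 124, 123, 154, 176, 123],
})

# Highest priority first: watermelon, then mangoes, then apple.
priority = ['watermelon price', 'mangoes price', 'apple price']

# bfill(axis=1) fills each NaN with the next non-NaN value to its right,
# so the first column holds the highest-priority price available per row.
reference = df[priority].bfill(axis=1).iloc[:, 0]

df['diff'] = df['market price'] - reference

# Skip rows where all three prices are missing.
result = df.dropna(subset=priority, how='all')
print(result)
```

With the sample data, the row for code 103 is dropped, and the remaining diff values match the ones shown in the output above.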