Calculated columns in a pandas DataFrame

Time: 2019-09-24 13:51:55

Tags: pandas date aggregate

I have the following DataFrame:


Code for the DataFrame:

import pandas as pd

df = pd.DataFrame({'Car Type': ['Compact']*9 + ['Economy'],
                  'Supplier':['Alamo','Enterprise','Budget','Nation', 'Avis','Payless','Payless','Payless','E-ZRent-a-Car','E-ZRent-a-Car'],
                  'Total Price':[74]*3+[78,79,84,35,37,43,43],
                  'Location':['Altanta']*10,
                  'Pick-up Date':['Jun/12/2019']*6+['Jun/13/2019']*4,
                  'Date Accessed':['06-11-2019']*10})

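I need to create a DataFrame that lists the unique combinations of 'Supplier', 'Car Type', 'Pick-up Date', and 'Date Accessed', together with the number of competing offers and the best competitor price available 1-14 days before the 'Pick-up Date'.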

Any help would be greatly appreciated.

2 answers:

Answer 0: (score: 0)

To get the unique combinations of the columns you described:

df.drop_duplicates(subset=["Supplier", "Car Type", "Pick-up Date", "Date Accessed"])
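
Note that `drop_duplicates(subset=...)` returns complete rows, keeping the first row seen for each combination, so 'Total Price' and 'Location' still come along. If only the four key columns are wanted, one option is to select them before de-duplicating. A minimal sketch using the question's data; the names `keys` and `combos` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Car Type': ['Compact']*9 + ['Economy'],
                   'Supplier': ['Alamo', 'Enterprise', 'Budget', 'Nation', 'Avis',
                                'Payless', 'Payless', 'Payless',
                                'E-ZRent-a-Car', 'E-ZRent-a-Car'],
                   'Total Price': [74]*3 + [78, 79, 84, 35, 37, 43, 43],
                   'Location': ['Altanta']*10,
                   'Pick-up Date': ['Jun/12/2019']*6 + ['Jun/13/2019']*4,
                   'Date Accessed': ['06-11-2019']*10})

# Illustrative name: the columns that define a unique combination
keys = ['Supplier', 'Car Type', 'Pick-up Date', 'Date Accessed']

# Select only the key columns, then drop repeated combinations
combos = df[keys].drop_duplicates().reset_index(drop=True)
print(combos)
```

With the sample data this leaves nine combinations, because the two Payless quotes for Jun/13/2019 collapse into one.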

Answer 1: (score: 0)

IIUC,

You need to filter the DataFrame first, then sort by total price and drop the duplicates.

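# The date columns are strings in the question's frame, so convert them before subtracting
df['Pick-up Date'] = pd.to_datetime(df['Pick-up Date'], format='%b/%d/%Y')
df['Date Accessed'] = pd.to_datetime(df['Date Accessed'], format='%m-%d-%Y')
df[(df['Pick-up Date'] - df['Date Accessed']) < pd.Timedelta(days=14)]\
 .sort_values('Total Price', ascending=False).drop_duplicates(['Car Type', 'Supplier',
                                                               'Pick-up Date', 'Date Accessed'])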

Output:

  Car Type       Supplier  Total Price Location Pick-up Date Date Accessed
5  Compact        Payless           84  Altanta   2019-06-12    2019-06-11
4  Compact           Avis           79  Altanta   2019-06-12    2019-06-11
3  Compact         Nation           78  Altanta   2019-06-12    2019-06-11
0  Compact          Alamo           74  Altanta   2019-06-12    2019-06-11
1  Compact     Enterprise           74  Altanta   2019-06-12    2019-06-11
2  Compact         Budget           74  Altanta   2019-06-12    2019-06-11
8  Compact  E-ZRent-a-Car           43  Altanta   2019-06-13    2019-06-11
9  Economy  E-ZRent-a-Car           43  Altanta   2019-06-13    2019-06-11
7  Compact        Payless           37  Altanta   2019-06-13    2019-06-11
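
The result above does not yet carry the two computed columns the question asks for. Below is a minimal sketch of one way to add them, under assumptions that are not stated in the question: that a "competitor" is any other supplier quoting the same 'Car Type', 'Pick-up Date', and 'Date Accessed', that the "best" price is the lowest one, and that each supplier is represented by its own lowest quote. The column names `Competing Offers` and `Best Competitor Price` and the helper `add_competitor_columns` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Car Type': ['Compact']*9 + ['Economy'],
                   'Supplier': ['Alamo', 'Enterprise', 'Budget', 'Nation', 'Avis',
                                'Payless', 'Payless', 'Payless',
                                'E-ZRent-a-Car', 'E-ZRent-a-Car'],
                   'Total Price': [74]*3 + [78, 79, 84, 35, 37, 43, 43],
                   'Location': ['Altanta']*10,
                   'Pick-up Date': ['Jun/12/2019']*6 + ['Jun/13/2019']*4,
                   'Date Accessed': ['06-11-2019']*10})
df['Pick-up Date'] = pd.to_datetime(df['Pick-up Date'], format='%b/%d/%Y')
df['Date Accessed'] = pd.to_datetime(df['Date Accessed'], format='%m-%d-%Y')

# Keep only quotes accessed 1-14 days before the pick-up date
lead = df['Pick-up Date'] - df['Date Accessed']
quotes = df[(lead >= pd.Timedelta(days=1)) & (lead <= pd.Timedelta(days=14))]

# One row per unique combination, keeping that supplier's lowest price (assumption)
keys = ['Supplier', 'Car Type', 'Pick-up Date', 'Date Accessed']
offers = quotes.groupby(keys, as_index=False)['Total Price'].min()

# Suppliers are assumed to compete when these three columns match
market = ['Car Type', 'Pick-up Date', 'Date Accessed']

def add_competitor_columns(group):
    """Hypothetical helper: add competitor statistics to one market group."""
    prices = group['Total Price']
    out = group.copy()
    # Every other supplier in the same market counts as one competing offer
    out['Competing Offers'] = len(group) - 1
    # Lowest price among the other rows; NaN when there is no competitor
    out['Best Competitor Price'] = [prices.drop(i).min() for i in group.index]
    return out

result = pd.concat(add_competitor_columns(g) for _, g in offers.groupby(market))
print(result)
```

On the sample data, Payless for the 2019-06-13 pick-up ends up with a price of 35, one competing offer, and a best competitor price of 43, while the single Economy quote has no competitors and gets NaN. If "best" should mean something other than the lowest price, only the `min()` calls need to change.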