Question

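I have two data frames in Python, one with information about cars and one with information about fuel prices (gasoline and diesel). Examples of the data frames are shown below.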

cars

   regNo  regYear inspectionYear fuelType
0  AB1234 2008    2012           Gasoline
1  CD2345 2009    2011           Diesel
2  LD9876 2010    2013           Diesel

fuel_prices

year fuelType price
2008 Gasoline 12.13
2009 Gasoline 19.52
2010 Gasoline 13.32
2011 Gasoline 13.54
2012 Gasoline 16.23
2013 Gasoline 11.34
2008 Diesel   9.43
2009 Diesel   9.37
2010 Diesel   9.89
2011 Diesel   10.04
2012 Diesel   8.42
2013 Diesel   9.21

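What I am trying to do is add a column to cars that holds the average price of the relevant fuel type between regYear and inspectionYear. So I would like to end up with something like this: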

cars_newCol

   regNo  regYear inspectionYear fuelType fuelPrice
0  AB1234 2008    2012           Gasoline 14.95
1  CD2345 2009    2011           Diesel   9.77
2  LD9876 2010    2013           Diesel   9.39

That is, the value in the first row is the average of the Gasoline prices in fuel_prices from 2008 to 2012, inclusive.

I have tried various solutions, but the one I think comes closest is:

cars['fuelPrice'] = fuel_prices.loc[(fuel_prices['year']>=cars['regYear']) & 
                                    (fuel_prices['year']<=cars['inspectionYear']) &
                                    (fuel_prices['fuelType']==cars['fuelType']),
                                    'price'].mean()

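However, the output is not as expected. The data frame is very large (about 7 million rows), so I would rather not do this in a for loop, unless someone thinks that could be efficient.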

Thanks in advance, much appreciated.

Answer 1

You want to merge, then filter the rows and group by:

(cars.merge(fuel_prices, on='fuelType')
     .query('regYear <= year <= inspectionYear')
     .groupby(cars.columns.to_list(), as_index=False)['price'].mean()
)

Output:

    regNo  regYear  inspectionYear  fuelType      price
0  AB1234     2008            2012  Gasoline  14.948000
1  CD2345     2009            2011    Diesel   9.766667
2  LD9876     2010            2013    Diesel   9.390000
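
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same merge, filter and group-by idea, built from the sample data in the question. It differs from the snippet above in two ways that are my own assumptions rather than part of the answer: it groups on the original row label of cars (via reset_index) instead of on all columns, and it writes the result back as a fuelPrice column rounded to two decimals to match the desired output. The intermediate names paired, in_range and mean_price are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
cars = pd.DataFrame({
    "regNo": ["AB1234", "CD2345", "LD9876"],
    "regYear": [2008, 2009, 2010],
    "inspectionYear": [2012, 2011, 2013],
    "fuelType": ["Gasoline", "Diesel", "Diesel"],
})

fuel_prices = pd.DataFrame({
    "year": list(range(2008, 2014)) * 2,
    "fuelType": ["Gasoline"] * 6 + ["Diesel"] * 6,
    "price": [12.13, 19.52, 13.32, 13.54, 16.23, 11.34,
              9.43, 9.37, 9.89, 10.04, 8.42, 9.21],
})

# Pair every car with every yearly price of its fuel type. reset_index()
# keeps the original row label of `cars` in a column named "index".
paired = cars.reset_index().merge(fuel_prices, on="fuelType")

# Keep only the years inside [regYear, inspectionYear], both ends included.
in_range = paired.query("regYear <= year <= inspectionYear")

# Average price per original car row; the result is indexed by that row label.
mean_price = in_range.groupby("index")["price"].mean()

# Index alignment attaches each mean to its car. A car with no matching
# year would get NaN instead of being dropped.
cars_newCol = cars.assign(fuelPrice=mean_price.round(2))
print(cars_newCol)
```

One thing to keep in mind for the large data set: the merge produces one row per car for every year available for its fuel type before the filter is applied. If the real price table has six years per fuel type like the sample, about 7 million cars would give roughly 42 million intermediate rows, so it may be worth checking memory use on a subset first.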
