Pandas DataFrame: getting the trend in a column

Time: 2020-06-29 03:31:15

Tags: pandas dataframe machine-learning kaggle

I have a DataFrame:

import numpy as np
import pandas as pd

np.random.seed(1)
df1 = pd.DataFrame({'day': [3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6],
                    'item': [1, 1, 2, 2, 1, 2, 3, 3, 4, 3, 4],
                    'price': np.random.randint(1, 30, 11)})
    day  item  price
0     3     1      6
1     4     1     12
2     4     2     13
3     4     2      9
4     5     1     10
5     5     2     12
6     5     3      6
7     5     3     16
8     5     4      1
9     6     3     17
10    6     4      2

After grouping with gb = df1.groupby(['day','item'])['price'].mean(), I get:

gb

day  item
3    1        6.0
4    1       12.0
     2       11.0
5    1       10.0
     2       12.0
     3       11.0
     4        1.0
6    3       17.0
     4        2.0
Name: price, dtype: float64

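I want to get the trend from the grouped Series and write it back into the DataFrame's price column. The new price should be the change in an item's price relative to that item's price on the previous day; it is NaN when the item has no previous day.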

    day  item  price
0     3     1    NaN
1     4     1      6
2     4     2    NaN
3     4     2    NaN
4     5     1     -2
5     5     2      1
6     5     3    NaN
7     5     3    NaN
8     5     4    NaN
9     6     3      6
10    6     4      1

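Please help me write this last step. A one- or two-line solution would be most helpful. The real DataFrame is large, so I want to avoid iterating over rows.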

1 answer:

Answer 0: (score: 1)

Hope this helps!

    # Average price per (day, item)
    mean_df = df1.groupby(['day', 'item'])['price'].mean().reset_index()
    # Rename the columns
    mean_df.columns = ['day', 'item', 'average_price']
    # Sort by day and item, ascending, so that shift() follows day order
    mean_df = mean_df.sort_values(by=['day', 'item'])
    # Within each item, shift the averages down one row: each row then holds
    # the average from the previous day on which that item appears
    mean_df['shifted_average_price'] = mean_df.groupby(['item'])['average_price'].shift(1)
    # Merge the averages back onto the original rows
    df1 = pd.merge(df1, mean_df, on=['day', 'item'])
    # Replace each price with its difference from the previous day's average
    df1['price'] = df1['price'] - df1['shifted_average_price']
    # Drop the helper columns
    df1.drop(['average_price', 'shifted_average_price'], axis=1, inplace=True)
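
Since the question asks for something close to a one- or two-liner, here is a shorter sketch of the same idea. It is not the answer's method: it assumes the trend should be the day-over-day difference between group means (mean minus previous mean), whereas the code above subtracts the previous mean from each raw row price. The two agree on the sample data. The prices are written out literally so the example does not depend on the random seed, and the names `trend` and `result` are illustrative only.

```python
import pandas as pd

# Sample data from the question, with the prices written out explicitly.
df1 = pd.DataFrame({
    'day':   [3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6],
    'item':  [1, 1, 2, 2, 1, 2, 3, 3, 4, 3, 4],
    'price': [6, 12, 13, 9, 10, 12, 6, 16, 1, 17, 2],
})

# Mean price per (day, item); groupby sorts the index by day, then item.
gb = df1.groupby(['day', 'item'])['price'].mean()

# Within each item, subtract the mean from the previous day on which the
# item appears (assumption: "previous day" means previous observed day).
trend = gb.groupby(level='item').diff().rename('trend')

# Left-join the per-(day, item) trend back onto the original rows;
# the row order and index of df1 are preserved.
result = df1.join(trend, on=['day', 'item'])
print(result)
```

To overwrite the original column as the question describes, `df1['price'] = result['trend']` should work here because `join` keeps the index of `df1`. Note that if an item skips a day, `diff()` compares against the last day on which it was seen, not strictly the calendar day before.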