Question

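I have a dataframe like the one below. I want to group it by two columns ("date" and "price") and add a column, "output", that holds the mean of "price2" over all rows with the same date and price. Thanks.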

import pandas as pd
import numpy as np
df2 = pd.DataFrame({
    'date': [20130101,20130101, 20130105, 20130105, 20130101, 20130108],
    'price': [25, 25, 23.5, 27, 40, 8],
    'price2': [23, 56, 45, 67, 33, 2]
})

Desired output:

       date  output  price  price2
0  20130101    39.5   25.0      23
1  20130101    39.5   25.0      56
2  20130105    45.0   23.5      45
3  20130105    67.0   27.0      67
4  20130101    33.0   40.0      33
5  20130108     2.0    8.0       2

Answer 1

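Try groupby followed by transform (DataFrameGroupBy.transform), which returns one value per original row instead of one per group:

newdf = df2.groupby(['date', 'price']).transform('mean')

newdf now has the same index as df2 and a single column, price2, holding each row's group mean:

print(newdf)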
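To illustrate why transform rather than a plain aggregation fits this question, here is a minimal sketch comparing the two on the question's data. The variable names grouped, per_group and per_row are illustrative and do not come from the original answer.

```python
import pandas as pd

df2 = pd.DataFrame({
    'date': [20130101, 20130101, 20130105, 20130105, 20130101, 20130108],
    'price': [25, 25, 23.5, 27, 40, 8],
    'price2': [23, 56, 45, 67, 33, 2],
})

grouped = df2.groupby(['date', 'price'])['price2']

# agg collapses each (date, price) group to a single row,
# indexed by the group keys.
per_group = grouped.agg('mean')

# transform broadcasts each group's mean back onto the original rows,
# so the result has the same index and length as df2.
per_row = grouped.transform('mean')

print(len(per_group), len(per_row))
print(per_row.tolist())
```

Because per_row is aligned with df2's index, it can be assigned directly as a new column, which is not true of per_group.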

To get the other columns as well, rename the column and concatenate the result with df2:

newdf.columns = ['output']
newdf = pd.concat([newdf, df2], axis=1)

Now:

print(newdf)

gives:

   output      date  price  price2
0    39.5  20130101   25.0      23
1    39.5  20130101   25.0      56
2    45.0  20130105   23.5      45
3    67.0  20130105   27.0      67
4    33.0  20130101   40.0      33
5     2.0  20130108    8.0       2

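If the column also needs to be in the position shown in the question, insert it into df2 instead. Note that insert modifies df2 in place and returns None, so its result must not be assigned back:

df2.insert(1, 'output', newdf['output'])

Then:

print(df2)

prints the columns in the order date, output, price, price2.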
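The steps of this answer can also be collapsed into a single assignment. The following is a minimal sketch, assuming a reasonably recent pandas; the name out is illustrative, and the final column reordering is only there to match the layout requested in the question.

```python
import pandas as pd

df2 = pd.DataFrame({
    'date': [20130101, 20130101, 20130105, 20130105, 20130101, 20130108],
    'price': [25, 25, 23.5, 27, 40, 8],
    'price2': [23, 56, 45, 67, 33, 2],
})

# Work on a copy so the original frame is left untouched.
out = df2.copy()

# Select price2 before transforming, so the result is a Series
# that can be assigned directly as the new column.
out['output'] = out.groupby(['date', 'price'])['price2'].transform('mean')

# Optional: put the columns in the order shown in the question.
out = out[['date', 'output', 'price', 'price2']]

print(out)
```

Selecting the price2 column before calling transform avoids the separate rename and concat steps.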

Answer 2

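You can use pandas' groupby to compute the mean per group and then merge it back. agg takes a dict mapping the column to the aggregation, and reset_index turns the group keys back into ordinary columns for the merge:

grp = df2.groupby(['date', 'price']).agg({'price2': 'mean'}).rename(columns={'price2': 'output'}).reset_index()
pd.merge(df2, grp, on=['date', 'price'])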
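As a variant of the aggregate-then-merge approach, the sketch below uses as_index=False to keep the group keys as columns and how='left' so that the rows keep df2's original order. The names means and merged are illustrative, and the choice of a left join is an assumption made here rather than part of the original answer.

```python
import pandas as pd

df2 = pd.DataFrame({
    'date': [20130101, 20130101, 20130105, 20130105, 20130101, 20130108],
    'price': [25, 25, 23.5, 27, 40, 8],
    'price2': [23, 56, 45, 67, 33, 2],
})

# One row per (date, price) pair, with the keys kept as regular columns.
means = (
    df2.groupby(['date', 'price'], as_index=False)['price2']
       .mean()
       .rename(columns={'price2': 'output'})
)

# A left join keeps every row of df2, in df2's order,
# and attaches the matching group mean.
merged = df2.merge(means, on=['date', 'price'], how='left')

print(merged)
```

Compared with transform, this route costs an extra join, but the intermediate means table can be useful on its own when the per-group values are needed elsewhere.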

python groupby with two conditions and compute the mean
