Question

Suppose you have a DataFrame like this:

import pandas as pd

data = pd.DataFrame({'Year': [2019]*5+[2020]*5,
          'Month': [1,1,2,2,3]*2,
          'Hour': [0,1,2,3,4]*2,
          'Value': [0.2,0.3,0.2,0.1,0.4,0.3,0.2,0.5,0.1,0.2]})

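Now define "low" hours as hours 1 through 3 (inclusive) and "high" hours as all other hours (here, hours 0 and 4). What I want is the mean Value of the "low" and "high" hours for each Year and Month. Ideally, these would be appended as new columns to the groupby() DataFrame, i.e., the final DataFrame would have the columns Year, Month, Low, and High.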

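A for loop works, but it isn't ideal. I could also create a dummy variable (e.g., 0 and 1) in the DataFrame that marks the "low" and "high" hours and group on it. Still, it seems to me there should be a way to get the result efficiently with Pandas groupby(['Year', 'Month']).agg(...). So far I've had no luck with groupby + agg, mainly because agg() only receives a single Series (not the rest of the DataFrame), so I can't use a condition on Hour inside agg to compute the mean Value.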
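Because agg() only ever sees one column at a time, one way to stay within groupby(...).agg(...) is to apply the Hour condition before grouping: split Value into two masked columns, one holding only low-hour values and one holding only high-hour values, with NaN everywhere else. The following is a minimal sketch, not taken from the thread; the names is_low and masked are illustrative, and the named-aggregation syntax assumes pandas 0.25 or later.

```python
import pandas as pd

data = pd.DataFrame({'Year': [2019]*5 + [2020]*5,
                     'Month': [1, 1, 2, 2, 3]*2,
                     'Hour': [0, 1, 2, 3, 4]*2,
                     'Value': [0.2, 0.3, 0.2, 0.1, 0.4, 0.3, 0.2, 0.5, 0.1, 0.2]})

# "Low" hours are 1..3 inclusive; every other hour counts as "High".
is_low = data['Hour'].between(1, 3)

# Series.where keeps values where the condition is True and puts NaN elsewhere,
# so each new column only carries the values of its own hour type.
masked = data.assign(High=data['Value'].where(~is_low),
                     Low=data['Value'].where(is_low))

# mean() skips NaN, so each column is averaged over its own hours only;
# a group with no matching hours ends up as NaN.
result = (masked.groupby(['Year', 'Month'])
                .agg(High=('High', 'mean'), Low=('Low', 'mean'))
                .reset_index())
print(result)
```

This avoids both the loop and an extra grouping key: the condition lives in the masked columns, so plain mean() aggregations are enough.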

Expected result for the sample data:

   Year  Month  High   Low
0  2019      1   0.2  0.30
1  2019      2   NaN  0.15
2  2019      3   0.4   NaN
3  2020      1   0.3  0.20
4  2020      2   NaN  0.30
5  2020      3   0.2   NaN

Any help would be appreciated :)

Answer 1

Consider pivot_table after creating a low/high type indicator field:

import numpy as np
import pandas as pd

# Label each row as a "Low" (hours 1-3) or "High" hour
data['Type'] = np.where(data['Hour'].between(1,3), 'Low', 'High')

pvt_df = (pd.pivot_table(data, index=['Year', 'Month'],
                         columns='Type', values='Value', aggfunc='mean')
            .reset_index()
            .rename_axis(None, axis='columns')
         )

print(pvt_df)
#    Year  Month  High   Low
# 0  2019      1   0.2  0.30
# 1  2019      2   NaN  0.15
# 2  2019      3   0.4   NaN
# 3  2020      1   0.3  0.20
# 4  2020      2   NaN  0.30
# 5  2020      3   0.2   NaN
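The pivot_table call above is essentially a groupby on Year, Month, and Type followed by an unstack of Type into columns. A hedged sketch of that equivalent form, assuming the same Type indicator column, may be easier to extend if more aggregations are needed later:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'Year': [2019]*5 + [2020]*5,
                     'Month': [1, 1, 2, 2, 3]*2,
                     'Hour': [0, 1, 2, 3, 4]*2,
                     'Value': [0.2, 0.3, 0.2, 0.1, 0.4, 0.3, 0.2, 0.5, 0.1, 0.2]})

# Same low/high indicator as in the pivot_table version
data['Type'] = np.where(data['Hour'].between(1, 3), 'Low', 'High')

# Group on the indicator as a third key, then move 'Type' from the row
# index into the columns; (Year, Month) pairs missing a type become NaN.
result = (data.groupby(['Year', 'Month', 'Type'])['Value']
              .mean()
              .unstack('Type')
              .reset_index()
              .rename_axis(None, axis='columns'))
print(result)
```

unstack sorts the new column labels, so High comes before Low, matching the pivot_table output.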

Answer 2

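This may not win the prize for the prettiest code, but if I understand correctly, this is what you want.

(Correct me if I'm wrong, since no expected output was given.)

Group by four times (low and high hours, once by Year and once by Year and Month) and combine each pair. Then do a final merge on Year to bring all the columns together.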

import pandas as pd

low_hours = [1, 2, 3]

groupby1 = data[data.Hour.isin(low_hours)].groupby('Year').Value.mean().reset_index().rename({'Value':'Value_year_low'},axis=1)
groupby2 = data[~data.Hour.isin(low_hours)].groupby('Year').Value.mean().reset_index().rename({'Value':'Value_year_high'},axis=1).drop('Year', axis=1)
groupby3 = data[data.Hour.isin(low_hours)].groupby(['Year','Month']).Value.mean().reset_index().rename({'Value':'Value_month_low'},axis=1)
groupby4 = data[~data.Hour.isin(low_hours)].groupby(['Year','Month']).Value.mean().reset_index().rename({'Value':'Value_month_high'},axis=1)

# Both years contain low and high hours, so the yearly frames line up row by row
df_final1 = pd.concat([groupby1, groupby2], axis=1)
# Low and high hours fall in different months, so align on the keys, not on position
df_final2 = pd.merge(groupby3, groupby4, on=['Year', 'Month'], how='outer')

df_final = pd.merge(df_final1, df_final2, on='Year')
print(df_final)
   Year  Value_year_low  Value_year_high  Month  Value_month_low  \
0  2019        0.200000             0.30      1             0.30   
1  2019        0.200000             0.30      2             0.15   
2  2019        0.200000             0.30      3              NaN   
3  2020        0.266667             0.25      1             0.20   
4  2020        0.266667             0.25      2             0.30   
5  2020        0.266667             0.25      3              NaN   

   Value_month_high  
0               0.2  
1               NaN  
2               0.4  
3               0.3  
4               NaN  
5               0.2
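Filtering data four times works, but the same kind of yearly and monthly low/high averages can also come from one groupby per level on a label column. The sketch below is an alternative, not the answerer's code; the 'low'/'high' labels and the names labels, yearly, monthly, and combined are illustrative assumptions, and its column order differs from the output above.

```python
import pandas as pd

data = pd.DataFrame({'Year': [2019]*5 + [2020]*5,
                     'Month': [1, 1, 2, 2, 3]*2,
                     'Hour': [0, 1, 2, 3, 4]*2,
                     'Value': [0.2, 0.3, 0.2, 0.1, 0.4, 0.3, 0.2, 0.5, 0.1, 0.2]})

low_hours = [1, 2, 3]
# Illustrative helper: one 'low'/'high' label per row, named 'Type'
labels = data['Hour'].isin(low_hours).map({True: 'low', False: 'high'}).rename('Type')

# One groupby per level; unstack moves the labels into columns (NaN if absent)
yearly = (data.groupby([data['Year'], labels])['Value'].mean()
              .unstack('Type')
              .add_prefix('Value_year_')
              .reset_index())
monthly = (data.groupby([data['Year'], data['Month'], labels])['Value'].mean()
               .unstack('Type')
               .add_prefix('Value_month_')
               .reset_index())

# Attach each year's averages to every (Year, Month) row of that year
combined = (monthly.merge(yearly, on='Year')
                   .rename_axis(None, axis='columns'))
print(combined)
```

Because the monthly frame is built from a single grouping, months that only have low or only have high hours get NaN in the missing column instead of being paired by row position.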

Answer 3

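import pandas as pd

data = pd.DataFrame({'Year': [2019]*5+[2020]*5,
          'Month': [1,1,2,2,3]*2,
          'Hour': [0,1,2,3,4]*2,
          'Value': [0.2,0.3,0.2,0.1,0.4,0.3,0.2,0.5,0.1,0.2]})

# Boolean mask: True for the "low" hours 1-3
data['low'] = (data['Hour'] > 0) & (data['Hour'] < 4)

# Mean Value per Year and Month, computed separately for low and high hours
data[data['low']].groupby(['Year', 'Month'])['Value'].mean()
data[~data['low']].groupby(['Year', 'Month'])['Value'].mean()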
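Starting from the same low mask, the per-group means of the two subsets can be placed side by side to get the Year, Month, High, Low layout the question asks for. This is a sketch under the assumption that the variable names low_means, high_means, and result are illustrative; pd.concat with a dict uses the keys as column names and aligns the two Series on their (Year, Month) index.

```python
import pandas as pd

data = pd.DataFrame({'Year': [2019]*5 + [2020]*5,
                     'Month': [1, 1, 2, 2, 3]*2,
                     'Hour': [0, 1, 2, 3, 4]*2,
                     'Value': [0.2, 0.3, 0.2, 0.1, 0.4, 0.3, 0.2, 0.5, 0.1, 0.2]})

# True for the "low" hours 1-3
data['low'] = (data['Hour'] > 0) & (data['Hour'] < 4)

low_means = data[data['low']].groupby(['Year', 'Month'])['Value'].mean()
high_means = data[~data['low']].groupby(['Year', 'Month'])['Value'].mean()

# Dict keys become column names; the outer alignment fills (Year, Month)
# pairs present in only one Series with NaN. sort_index fixes the row order.
result = (pd.concat({'High': high_means, 'Low': low_means}, axis=1)
            .sort_index()
            .reset_index())
print(result)
```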

Pandas groupby with conditional aggregation
