
Question

Given the pandas DataFrame below, how can I compute pre_hit_price and post_hit_price (mean or sum) for each Name?

(In SAS, we could use FIRST. and LAST. processing for this.)
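For readers coming from SAS, here is a rough pandas analogue of the FIRST./LAST. BY-group flags. It is a minimal sketch, not part of the original thread. It assumes the rows are sorted by Name and month, as SAS BY-group processing requires, and the column names `first_name` and `last_name` are hypothetical. A row is "first" when its Name differs from the previous row's Name, and "last" when it differs from the next row's Name.

```python
import pandas as pd

# Toy data copied from the question.
df = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'price': [0, 1, 2, 3, 2, 4, 6, 8],
    'month': [201901, 201902, 201903, 201904, 201901, 201902, 201903, 201904],
    'hit': [0, 1, 0, 0, 0, 1, 1, 0],
})

# SAS BY-group processing needs sorted input, so sort the same way here.
df = df.sort_values(['Name', 'month']).reset_index(drop=True)

# FIRST.Name analogue (hypothetical column): Name differs from the previous row.
df['first_name'] = df['Name'].ne(df['Name'].shift())
# LAST.Name analogue (hypothetical column): Name differs from the next row.
df['last_name'] = df['Name'].ne(df['Name'].shift(-1))

print(df[['Name', 'month', 'first_name', 'last_name']])
```

The answers below do not need these flags. A grouped cumulative sum of `hit` replaces the row-by-row logic that FIRST./LAST. would drive in a DATA step.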

import pandas as pd

d = {'Name': ['A','A','A','A','B','B','B','B'], 'price' : [0,1,2,3,2,4,6,8] , 'month': [201901, 201902, 201903, 201904, 201901, 201902, 201903, 201904] , 'hit': [0,1,0,0,0,1,1,0]}
df = pd.DataFrame(data=d)
df

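example_df:

Each Name repeats over several months.
The hit column marks the split point.
pre_hit_price should be 0 for A and 2 for B.
post_hit_price should be (1 + 2 + 3) for A and (4 + 6 + 8) for B (including the hit month).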

output_df: one row per Name, with columns pre_hit_price and post_hit_price.
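The rule in the notes (rows before the first hit are pre-hit; the hit month and everything after it are post-hit) can be spelled out as an explicit loop, which is handy for checking the expected numbers. This is a plain-Python sketch, not one of the answers; `split_at_first_hit` is a hypothetical helper, and it assumes `month` values sort chronologically as YYYYMM integers.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'price': [0, 1, 2, 3, 2, 4, 6, 8],
    'month': [201901, 201902, 201903, 201904, 201901, 201902, 201903, 201904],
    'hit': [0, 1, 0, 0, 0, 1, 1, 0],
})

def split_at_first_hit(frame):
    """Hypothetical helper: per-Name price totals before / from the first hit."""
    rows = {}
    for name, group in frame.groupby('Name', sort=False):
        group = group.sort_values('month')  # assumes YYYYMM integers sort by time
        seen_hit = False
        pre_total, post_total = 0, 0
        for price, hit in zip(group['price'], group['hit']):
            if hit == 1:
                seen_hit = True  # the hit month itself counts as post-hit
            if seen_hit:
                post_total += price
            else:
                pre_total += price
        rows[name] = {'pre_hit_price': pre_total, 'post_hit_price': post_total}
    return pd.DataFrame.from_dict(rows, orient='index')

output_df = split_at_first_hit(df)
print(output_df)
```

A loop like this is easy to read but slow on large frames; the vectorized answers below compute the same split with a grouped cumulative sum.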

Answer 1

Method 1

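Use GroupBy.sum + DataFrame.unstack. How to build the boolean Series used for grouping (groupby_hit) is explained in Method 2; here you only need to add Series.map to turn True/False into the column labels: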

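# True while a Name has not had a hit yet, then mapped to the output column label
groupby_hit=df.groupby('Name').hit.cumsum().eq(0).map({False:'post_hit_price',True:'pre_hit_price'})
# Sum price per (Name, label) and spread the labels into columns
new_df=df.groupby(['Name',groupby_hit],sort=False).price.sum().unstack().rename_axis(columns=None)
print(new_df)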

      pre_hit_price  post_hit_price
Name                               
A                 0               6
B                 2              18
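The question asks for the mean as well as the sum. A minimal sketch, assuming the same grouping key as Method 1, is to pass a list of aggregations to `agg` before unstacking; the labels `pre_hit` and `post_hit` are hypothetical names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'price': [0, 1, 2, 3, 2, 4, 6, 8],
    'month': [201901, 201902, 201903, 201904, 201901, 201902, 201903, 201904],
    'hit': [0, 1, 0, 0, 0, 1, 1, 0],
})

# Same key as Method 1: True until the first hit within each Name.
is_pre_hit = df.groupby('Name')['hit'].cumsum().eq(0)
label = is_pre_hit.map({True: 'pre_hit', False: 'post_hit'})

# Sum and mean per (Name, label); unstack moves the label into the columns,
# giving a two-level column index of (statistic, label).
stats = df.groupby(['Name', label])['price'].agg(['sum', 'mean']).unstack()
print(stats)
```

For A the post-hit mean is (1 + 2 + 3) / 3 = 2.0, and for B it is (4 + 6 + 8) / 3 = 6.0.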

Method 2

Create two DataFrames depending on where 1 first appears in the hit column for each Name, using boolean indexing with a mask built from DataFrame.groupby.cumsum and Series.eq (see Details). Then use GroupBy.agg + pd.concat:

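# True for each Name's rows before its first hit (running hit count is still 0)
prehit_mask=df.groupby('Name').hit.cumsum().eq(0)
df_pre=df[prehit_mask]
df_post=df[~prehit_mask]
# Named aggregation (pandas >= 0.25.0) sets the output column names
new_df=pd.concat([df_pre.groupby('Name').price.agg(pre_hit_price='sum'),
                  df_post.groupby('Name').price.agg(post_hit_price='sum')],
                  axis=1)
print(new_df)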

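      pre_hit_price  post_hit_price
Name                               
A                 0               6
B                 2              18

If you use pandas < 0.25.0 (named aggregation was added in 0.25.0; the dict form below was removed in pandas 1.0):

new_df=pd.concat([df_pre.groupby('Name').price.agg({'pre_hit_price':'sum'}),
                  df_post.groupby('Name').price.agg({'post_hit_price':'sum'})],
                  axis=1)

Details:

print(prehit_mask)

0     True
1    False
2    False
3    False
4     True
5    False
6    False
7    False
Name: hit, dtype: bool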
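As a swapped-in variant of the Method 2 mask (not from the original answer), a grouped running maximum can stand in for the running sum. With a 0/1 hit column, `cummax()` stays 0 until the first hit and is 1 afterwards, so `cummax().eq(0)` selects the same pre-hit rows as `cumsum().eq(0)`. The sketch also builds the result from a dict of Series instead of `pd.concat`; both align on the Name index.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'price': [0, 1, 2, 3, 2, 4, 6, 8],
    'month': [201901, 201902, 201903, 201904, 201901, 201902, 201903, 201904],
    'hit': [0, 1, 0, 0, 0, 1, 1, 0],
})

# Running max of hit per Name: 0 before the first hit, 1 from the hit month on.
prehit_mask = df.groupby('Name')['hit'].cummax().eq(0)

# Each Series is indexed by Name, so the DataFrame constructor aligns them.
new_df = pd.DataFrame({
    'pre_hit_price': df[prehit_mask].groupby('Name')['price'].sum(),
    'post_hit_price': df[~prehit_mask].groupby('Name')['price'].sum(),
})
print(new_df)
```

If a Name never has a hit, it is missing from the post-hit group and its post_hit_price comes out as NaN (the same happens with the `pd.concat` version); `new_df.fillna(0)` would turn that into 0.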
