Question

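I am trying to resample the groups in a pandas GroupBy object. The resampling itself works, but the object somehow isn't modified. Do I need to create a new group, or something else?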

Here is my code:

grouped_by_product_comp = competitor_df.sort_values(['history_date']).groupby(['item_id'])
for name, group in grouped_by_product_comp:
    my_prod = name
    group = group.drop_duplicates(subset = 'history_date')
    group.set_index('history_date', inplace = True)
    group = group.asfreq('D',method='pad')
    print(group.head())
    break

my_group = grouped_by_product_comp.get_group(394846296)
print(my_group.head())

Here is my output:

              id    item_id  competitor_id  competitor_price
history_date                                                  
2016-01-25    3504  394846296        2301745              1205
2016-01-26    3504  394846296        2301745              1205
2016-01-27    3504  394846296        2301745              1205
2016-01-28    3504  394846296        2301745              1205
2016-01-29    3504  394846296        2301745              1205

           id history_date    item_id  competitor_id  competitor_price
187116   3504   2016-01-25  394846296        2301745              1205
188119  17460   2016-02-23  394846296        2301745              1205
188945  28392   2016-03-17  394846296        2301745              1205
189063  29988   2016-03-20  394846296        2301745              1205
189477  35004   2016-03-31  394846296        2301745              1205

So the object isn't changed outside the for loop. Should I somehow be assigning to the GroupBy object instead of the group? Thanks a lot for reading!

Answer 1

You can use apply instead of a for loop and assign the result to a new DataFrame (or back to the same one):

new_competitor_df = (competitor_df.sort_values(['history_date']).groupby(['item_id'])
                                  .apply(lambda df_g: (df_g.drop_duplicates(subset = 'history_date')
                                                           .set_index('history_date')
                                                           .asfreq('D',method='pad')))
                                  .reset_index(0,drop=True))

You can then get the data you want with:

print (new_competitor_df[new_competitor_df['item_id'] ==394846296].head())
                id    item_id  competitor_id  competitor_price
history_date                                                  
2016-01-25    3504  394846296        2301745              1205
2016-01-26    3504  394846296        2301745              1205
2016-01-27    3504  394846296        2301745              1205
2016-01-28    3504  394846296        2301745              1205
2016-01-29    3504  394846296        2301745              1205

or, with the same result:

print (new_competitor_df.groupby(['item_id']).get_group(394846296).head())
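As for why the loop in the question appears to do nothing: assigning to group inside the loop only rebinds a local name, so the GroupBy object keeps pointing at the original rows. If you would rather keep the loop, one option is to collect each resampled group and concatenate them afterwards. The following is a minimal sketch under assumptions, not the asker's real data: the toy competitor_df and the resampled dict are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the real competitor_df
competitor_df = pd.DataFrame({
    "history_date": pd.to_datetime(
        ["2016-01-01", "2016-01-04", "2016-01-02", "2016-01-03"]),
    "item_id": [1, 1, 2, 2],
    "competitor_price": [10, 12, 20, 21],
})

grouped = competitor_df.sort_values("history_date").groupby("item_id")

# Store each resampled group; rebinding `group` alone would be lost
resampled = {}
for name, group in grouped:
    group = (group.drop_duplicates(subset="history_date")
                  .set_index("history_date")
                  .asfreq("D", method="pad"))  # daily frequency, forward-fill gaps
    resampled[name] = group

# Stitch the resampled groups back into a single DataFrame
new_competitor_df = pd.concat(resampled.values())

# The original GroupBy object still returns the unresampled rows
untouched = grouped.get_group(1)
item_1 = new_competitor_df[new_competitor_df["item_id"] == 1]
print(len(untouched), len(item_1))
```

Here item 1 has gaps between its two observed dates, so the resampled frame has more rows for it than get_group returns from the original GroupBy object.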

Cannot modify a pandas GroupBy object
