Question

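With my data I can already calculate week-to-week changes successfully. However, the data contains thousands of groups that each have to be handled separately, so I'm looking for a faster/more efficient way to calculate these changes than my current implementation.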

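The way it currently works is that I have a for loop that calculates the week-to-week change for each subset/store_ID. The calculation works fine, but with over 10,000 different items it takes quite a long time to finish. Is there a way to do this by grouping on the 'store_ID' column? I've been playing around with .groupby, but since it returns a groupby object I'm not sure how to work with it.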

Here is my code and how it works:

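I have a dataframe called df that holds all the information. It has already been cleaned and sorted, so each store_ID is in ascending order by week. For simplicity, let's say it has just these columns: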

df[['store_ID', 'Week', 'Sales']]

So...

import pandas as pd
from pandas.tseries.offsets import Week

# Create list of each store
list_of_stores = list(df['store_ID'].unique())

# Collect the per-store results here
# (DataFrame.append was removed in pandas 2.0, so use a list and pd.concat)
results = []

# Iterate store-by-store to calculate the week to week values
for store in list_of_stores:

    # Create a temporary dataframe to do the calculation for the store_ID
    temp_df = df[df['store_ID'] == store].copy()
    index_list = list(temp_df.index)
    temp_df.index = temp_df['Week']
    temp_df['Sales_change_1_week'] = (
        temp_df['Sales'] - temp_df['Sales'].shift(1, freq=Week())
    )
    temp_df.index = index_list

    # Dump the temporary dataframe into the results list
    results.append(temp_df)

# Combine all stores into one results dataframe
results_df = pd.concat(results)

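So in the end I get complete results for every store_ID for every week. I should note that some weeks are missing, so in those cases I get null values for the weeks where the change from the previous week can't be calculated, and I'm fine with that.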
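To illustrate why the missing weeks come out as nulls, here is a small sketch contrasting a plain positional shift with the frequency-based shift used in the loop. The dates and sales figures are made up for illustration; only the `shift(1, freq=Week())` call comes from the code above.

```python
import pandas as pd
from pandas.tseries.offsets import Week

# Hypothetical sales for one store; the week of 2021-01-19 is missing.
sales = pd.Series(
    [100.0, 120.0, 90.0],
    index=pd.to_datetime(["2021-01-05", "2021-01-12", "2021-01-26"]),
)

# Positional shift: compares each row with the previous row, ignoring the gap.
positional_change = sales - sales.shift(1)

# Frequency shift: moves the index labels forward by one week, so each week is
# compared with the value recorded exactly seven days earlier (if there is one).
calendar_change = (sales - sales.shift(1, freq=Week())).reindex(sales.index)
```

With this sample, `positional_change` reports a change for 2021-01-26 relative to 2021-01-12, while `calendar_change` leaves that row null because there is no row for 2021-01-19.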

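So for each store_ID I:

Create a temporary dataframe, sorted by 'Week'
Store the original index
Re-index by week (so that I can shift by week)
Calculate the week-to-week sales change and put it in a new column
Re-index back to the original index
Append it to the results dataframe
Repeat with the next store_ID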

I feel like there is a way to do this all at once rather than handling each store_ID separately, but I can't seem to find it.
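One possible way to do this in a single pass is to join the frame to a copy of itself whose weeks are moved forward by seven days. This is only a sketch, assuming 'Week' is a datetime column and each (store_ID, Week) pair appears at most once. The sample data and the `Sales_prev_week` column name are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: store 1 is missing the week of 2021-01-19.
df = pd.DataFrame({
    "store_ID": [1, 1, 1, 2, 2],
    "Week": pd.to_datetime(
        ["2021-01-05", "2021-01-12", "2021-01-26", "2021-01-05", "2021-01-12"]
    ),
    "Sales": [100.0, 120.0, 90.0, 50.0, 65.0],
})

# Copy of the data with every week moved forward by 7 days, so that a row for
# (store, week) lines up with the sales recorded for (store, week - 7 days).
prev = df[["store_ID", "Week", "Sales"]].copy()
prev["Week"] = prev["Week"] + pd.Timedelta(weeks=1)
prev = prev.rename(columns={"Sales": "Sales_prev_week"})

# Left join keeps every original row; rows without a previous week get NaN.
result = df.merge(prev, on=["store_ID", "Week"], how="left")
result["Sales_change_1_week"] = result["Sales"] - result["Sales_prev_week"]
```

Note that `merge` returns a frame with a fresh integer index, so if the original index matters it would have to be carried along as a column first.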

Answer 1

Here is the code I use for a similar operation:

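# Assumes temp_df is indexed by week (a DatetimeIndex) and that the weeks
# fall on Tuesdays; adjust the anchor day to match your data
week_freq = 'W-TUE'
temp_df['Sales_change_1_week'] = temp_df['Sales'].asfreq(week_freq).diff()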
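As for the groupby object itself, a common pattern is to select a column from it and call a method such as `diff`, which returns a Series aligned with the original rows. Below is a minimal sketch, assuming the frame is sorted by store_ID and Week as described in the question and that 'Week' is a datetime column. The sample data is made up, and masking on the week gap is one way (not necessarily the only one) to keep nulls where a week is missing.

```python
import pandas as pd

# Hypothetical sample data: store 1 is missing the week of 2021-01-19.
df = pd.DataFrame({
    "store_ID": [1, 1, 1, 2, 2],
    "Week": pd.to_datetime(
        ["2021-01-05", "2021-01-12", "2021-01-26", "2021-01-05", "2021-01-12"]
    ),
    "Sales": [100.0, 120.0, 90.0, 50.0, 65.0],
})

grouped = df.groupby("store_ID")

# Row-to-row difference within each store; the first row of each store is NaN.
sales_diff = grouped["Sales"].diff()

# Time elapsed since the previous row of the same store.
week_gap = grouped["Week"].diff()

# Keep the difference only where the previous row is exactly one week earlier.
df["Sales_change_1_week"] = sales_diff.where(week_gap == pd.Timedelta(weeks=1))
```

Because the result keeps the original index, no per-store temporary frames or re-indexing are needed.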

Calculating weekly change in pandas (using groupby)?
