Conditional groupby

Time: 2017-12-03 23:13:56

Tags: python pandas numpy

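I am trying to create a new variable that holds the difference in SALES_AMOUNT between the same month of consecutive years for the DataFrame below. I think my code should build on this groupby, but I don't know how to add the condition [df2 df.Control - df.Control.shift(1) == 12] after the groupby so that the difference is taken between the correct years.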

df['LY'] = df.groupby(['month']).SALES_AMOUNT.shift(1)

DataFrame:

    SALES_AMOUNT Store  Control  year  month
0       16793.14     A        3  2013      3
1       42901.61     A        5  2013      5
2       63059.72     A        6  2013      6
3      168471.43     A       10  2013     10
4       58570.72     A       11  2013     11
5       67526.71     A       12  2013     12
6       50649.07     A       14  2014      2
7       48819.97     A       18  2014      6
8       97100.77     A       19  2014      7
9       67778.40     A       21  2014      9
10      90327.52     A       22  2014     10
11      75703.12     A       23  2014     11
12      26098.50     A       24  2014     12
13      81429.36     A       25  2015      1
14      19539.85     A       26  2015      2
15      71727.66     A       27  2015      3
16      20117.79     A       28  2015      4
17      44252.19     A       29  2015      6
18      68578.82     A       30  2015      7
19      91483.39     A       31  2015      8
20      39220.87     A       32  2015     10
21      12224.11     A       33  2015     11

The result should look like this:

    SALES_AMOUNT Store  Control  year  month  year_diff
0       16793.14     A        3  2013      3        NaN
1       42901.61     A        5  2013      5        NaN
2       63059.72     A        6  2013      6        NaN
3      168471.43     A       10  2013     10        NaN
4       58570.72     A       11  2013     11        NaN
5       67526.71     A       12  2013     12        NaN
6       50649.07     A       14  2014      2        NaN
7       48819.97     A       18  2014      6  -14239.75
8       97100.77     A       19  2014      7        NaN
9       67778.40     A       21  2014      9        NaN
10      90327.52     A       22  2014     10  -78143.91
11      75703.12     A       23  2014     11   17132.40
12      26098.50     A       24  2014     12  -41428.21
13      81429.36     A       25  2015      1        NaN
14      19539.85     A       26  2015      2  -31109.22
15      71727.66     A       27  2015      3        NaN
16      20117.79     A       28  2015      4        NaN
17      44252.19     A       29  2015      6   -4567.78
18      68578.82     A       30  2015      7  -28521.95
19      91483.39     A       31  2015      8        NaN
20      39220.87     A       32  2015     10  -51106.65
21      12224.11     A       33  2015     11  -63479.01

2 answers:

Answer 0: (score: 1)

I think this is what you are looking for:

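# Order rows so that, within each month, the years run in ascending order.
df = df.sort_values(by=['month', 'year'])
# Previous row of the same month; the result is already aligned on the index.
df['SALES_AMOUNT_shifted'] = df.groupby(['month'])['SALES_AMOUNT'].shift(1)
df['LY'] = df['SALES_AMOUNT'] - df['SALES_AMOUNT_shifted']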

After sorting by month and year, the rows of each month group are in a consistent order, and only then does the shift make sense.
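To see what the shift does, here is a minimal sketch on five rows copied from the question's data (months 3 and 6 only). The small frame is an illustration, not the full data set.

```python
import pandas as pd

# Rows copied from the question's table, restricted to months 3 and 6.
df = pd.DataFrame({
    "SALES_AMOUNT": [16793.14, 63059.72, 48819.97, 71727.66, 44252.19],
    "year": [2013, 2013, 2014, 2015, 2015],
    "month": [3, 6, 6, 3, 6],
})

df = df.sort_values(by=["month", "year"])
# Within each month, shift(1) pulls the value from the previous row of that month.
df["SALES_AMOUNT_shifted"] = df.groupby("month")["SALES_AMOUNT"].shift(1)
df["LY"] = df["SALES_AMOUNT"] - df["SALES_AMOUNT_shifted"]
print(df)
```

In this sample, March 2015 is compared against March 2013 because no March 2014 row exists, so the shift alone does not guarantee a one-year gap.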

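- Update -

After applying the solution above, you can set LY to None wherever the gap to the previous row of the same month is not exactly one year. This also covers the first occurrence of each month, whose missing gap is filled with 0: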

df['year_diff'] = df['year'] - df.groupby(['month'])['year'].shift()
df['year_diff'] = df['year_diff'].fillna(0)
df.loc[df['year_diff'] != 1, 'LY'] = None

With this, I get the desired output you added.
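The shift and the year-gap mask can also be combined in one step. The sketch below is one possible way to do that, not necessarily how the answer's author would write it: `add_yoy_diff` is a hypothetical helper name, `Series.where` replaces the `fillna`/`loc` pair, and the sample rows are copied from the question.

```python
import pandas as pd

def add_yoy_diff(df):
    """Hypothetical helper: per-month SALES_AMOUNT difference to the previous year."""
    out = df.sort_values(["month", "year"]).copy()
    by_month = out.groupby("month")
    diff = out["SALES_AMOUNT"] - by_month["SALES_AMOUNT"].shift(1)
    year_gap = out["year"] - by_month["year"].shift(1)
    # Keep the difference only where the previous row is exactly one year earlier.
    out["LY"] = diff.where(year_gap == 1)
    # Restore the original row order, which the sort by month changed.
    return out.sort_index()

# Rows copied from the question's table, restricted to months 3 and 6.
sample = pd.DataFrame({
    "SALES_AMOUNT": [16793.14, 63059.72, 48819.97, 71727.66, 44252.19],
    "year": [2013, 2013, 2014, 2015, 2015],
    "month": [3, 6, 6, 3, 6],
})
result = add_yoy_diff(sample)
print(result)
```

Here the March 2015 row ends up as NaN because its previous March row is two years earlier, which is the behavior shown in the question's expected result.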

Answer 1: (score: 0)

Does this help? I would also really appreciate a pandas-centric solution, since I spent some time on this and couldn't come up with one.

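import numpy as np
import pandas as pd

# Copy the table from the question to the clipboard before running this.
df = pd.read_clipboard().set_index('Control')
df['yoy_diff'] = np.nan
for i in df.index:
    for j in df.index:
        # Control values 12 apart are treated as the same month one year apart.
        if j - i == 12:
            # A single .loc call avoids chained assignment.
            df.loc[j, 'yoy_diff'] = df.loc[j, 'SALES_AMOUNT'] - df.loc[i, 'SALES_AMOUNT']
df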

Output:

         SALES_AMOUNT Store  year  month  yoy_diff
Control                                           
3            16793.14     A  2013      3       NaN
5            42901.61     A  2013      5       NaN
6            63059.72     A  2013      6       NaN
10          168471.43     A  2013     10       NaN
11           58570.72     A  2013     11       NaN
12           67526.71     A  2013     12       NaN
14           50649.07     A  2014      2       NaN
18           48819.97     A  2014      6 -14239.75
19           97100.77     A  2014      7       NaN
21           67778.40     A  2014      9       NaN
22           90327.52     A  2014     10 -78143.91
23           75703.12     A  2014     11  17132.40
24           26098.50     A  2014     12 -41428.21
25           81429.36     A  2015      1       NaN
26           19539.85     A  2015      2 -31109.22
27           71727.66     A  2015      3       NaN
28           20117.79     A  2015      4       NaN
29           44252.19     A  2015      6       NaN
30           68578.82     A  2015      7  19758.85
31           91483.39     A  2015      8  -5617.38
32           39220.87     A  2015     10       NaN
33           12224.11     A  2015     11 -55554.29
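As for a pandas-centric version of the loop, one possible sketch is to look up the row whose Control is 12 lower with `Series.map`. It assumes that Control values are unique; the six rows are copied from the question's table.

```python
import pandas as pd

# Rows copied from the question's table (a subset of Control values).
df = pd.DataFrame({
    "Control": [6, 10, 18, 19, 22, 30],
    "SALES_AMOUNT": [63059.72, 168471.43, 48819.97, 97100.77, 90327.52, 68578.82],
})

# Lookup table from Control to SALES_AMOUNT (assumes unique Control values).
sales_by_control = df.set_index("Control")["SALES_AMOUNT"]
# For each row, fetch the sales of the row 12 Controls earlier; NaN if absent.
previous = (df["Control"] - 12).map(sales_by_control)
df["yoy_diff"] = df["SALES_AMOUNT"] - previous
print(df)
```

For Controls 18, 22 and 30 this gives the same values as the loop output above. Note that matching on Control is not the same as matching on year and month in the sample data: Control 18 is June 2014 but Control 29 is June 2015, a gap of 11, which is why the last rows of this output differ from the expected result in the question.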