Calculating month-over-month and year-over-year changes with pandas

Asked: 2017-05-22 13:23:43

Tags: python pandas

I can't figure out how to do this. I want to go from this DataFrame:

Date    Value
Jan-15  300
Feb-15  302
Mar-15  303
Apr-15  305
May-15  307
Jun-15  307
Jul-15  305
Aug-15  306
Sep-15  308
Oct-15  310
Nov-15  309
Dec-15  312
Jan-16  315
Feb-16  317
Mar-16  315
Apr-16  315
May-16  312
Jun-16  314
Jul-16  312
Aug-16  313
Sep-16  316
Oct-16  316
Nov-16  316
Dec-16  312

to this one, with the month-over-month (otm) and year-over-year (oty) changes added:

Date    Value  otm  oty
Jan-15  300    na   na
Feb-15  302    2    na
Mar-15  303    1    na
Apr-15  305    2    na
May-15  307    2    na
Jun-15  307    0    na
Jul-15  305    -2   na
Aug-15  306    1    na
Sep-15  308    2    na
Oct-15  310    2    na
Nov-15  309    -1   na
Dec-15  312    3    na
Jan-16  315    3    15
Feb-16  317    2    15
Mar-16  315    -2   12
Apr-16  315    0    10
May-16  312    -3   5
Jun-16  314    2    7
Jul-16  312    -2   7
Aug-16  313    1    7
Sep-16  316    3    8
Oct-16  316    0    6
Nov-16  316    0    7
Dec-16  312    -4   0

So otm is the difference from the value one row above, and oty is the difference from the value 12 rows above.

3 Answers:

Answer 0: (score: 4)

I think you need diff, but it only gives the right result if no months are missing from the data:

df['otm'] = df.Value.diff()
df['oty'] = df.Value.diff(12)
print (df)
      Date  Value  otm   oty
0   Jan-15    300  NaN   NaN
1   Feb-15    302  2.0   NaN
2   Mar-15    303  1.0   NaN
3   Apr-15    305  2.0   NaN
4   May-15    307  2.0   NaN
5   Jun-15    307  0.0   NaN
6   Jul-15    305 -2.0   NaN
7   Aug-15    306  1.0   NaN
8   Sep-15    308  2.0   NaN
9   Oct-15    310  2.0   NaN
10  Nov-15    309 -1.0   NaN
11  Dec-15    312  3.0   NaN
12  Jan-16    315  3.0  15.0
13  Feb-16    317  2.0  15.0
14  Mar-16    315 -2.0  12.0
15  Apr-16    315  0.0  10.0
16  May-16    312 -3.0   5.0
17  Jun-16    314  2.0   7.0
18  Jul-16    312 -2.0   7.0
19  Aug-16    313  1.0   7.0
20  Sep-16    316  3.0   8.0
21  Oct-16    316  0.0   6.0
22  Nov-16    316  0.0   7.0
23  Dec-16    312 -4.0   0.0
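
For anyone who wants to run this without the original file, here is a minimal self-contained sketch that rebuilds the question's data by hand and applies the same two diff calls. The names `months`, `dates` and `values` are mine, introduced only for illustration.

```python
import pandas as pd

# Values copied from the question, Jan-15 through Dec-16.
values = [300, 302, 303, 305, 307, 307, 305, 306, 308, 310, 309, 312,
          315, 317, 315, 315, 312, 314, 312, 313, 316, 316, 316, 312]

# Build the 'Mon-yy' labels by hand to avoid any locale dependence.
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
dates = [f'{m}-{y}' for y in (15, 16) for m in months]

df = pd.DataFrame({'Date': dates, 'Value': values})

# diff() subtracts the previous row; diff(12) subtracts the row 12 positions back.
df['otm'] = df['Value'].diff()
df['oty'] = df['Value'].diff(12)

print(df)
```

Both new columns come out as floats because the rows with no earlier value to compare against hold NaN.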

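If some months are missing, it is a bit more complicated. First reindex to the full monthly range so that the gaps become NaN rows: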

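import pandas as pd

# Parse 'Jan-15' style labels into monthly periods and use them as the index
df['Date'] = pd.to_datetime(df['Date'], format='%b-%y').dt.to_period('M')
df = df.set_index('Date')
# Insert a NaN row for every month missing between the first and last date
df = df.reindex(pd.period_range(df.index.min(), df.index.max(), freq='M'))
# Convert the index back to 'Jan-15' style strings and restore it as a column
df.index = df.index.strftime('%b-%y')
df = df.rename_axis('date').reset_index()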

df['otm'] = df.Value.diff()
df['oty'] = df.Value.diff(12)

print (df)
      date  Value  otm   oty
0   Jan-15  300.0  NaN   NaN
1   Feb-15  302.0  2.0   NaN
2   Mar-15    NaN  NaN   NaN
3   Apr-15    NaN  NaN   NaN
4   May-15  307.0  NaN   NaN
5   Jun-15  307.0  0.0   NaN
6   Jul-15  305.0 -2.0   NaN
7   Aug-15  306.0  1.0   NaN
8   Sep-15  308.0  2.0   NaN
9   Oct-15  310.0  2.0   NaN
10  Nov-15  309.0 -1.0   NaN
11  Dec-15  312.0  3.0   NaN
12  Jan-16  315.0  3.0  15.0
13  Feb-16  317.0  2.0  15.0
14  Mar-16  315.0 -2.0   NaN
15  Apr-16  315.0  0.0   NaN
16  May-16  312.0 -3.0   5.0
17  Jun-16  314.0  2.0   7.0
18  Jul-16  312.0 -2.0   7.0
19  Aug-16  313.0  1.0   7.0
20  Sep-16  316.0  3.0   8.0
21  Oct-16  316.0  0.0   6.0
22  Nov-16  316.0  0.0   7.0
23  Dec-16  312.0 -4.0   0.0
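
To see why the reindex step matters, here is a small sketch on a made-up three-row sample in which Mar-15 and Apr-15 are absent. The sample data and the names `naive`, `periods` and `full` are assumptions for illustration, not part of the answer above.

```python
import pandas as pd

# Hypothetical sample with Mar-15 and Apr-15 missing.
df = pd.DataFrame({'Date': ['Jan-15', 'Feb-15', 'May-15'],
                   'Value': [300, 302, 307]})

# Without reindexing, diff() treats May and Feb as adjacent months.
naive = df['Value'].diff()

# Parse the labels to monthly periods, then reindex over the full range
# so that the two missing months appear as NaN rows.
periods = pd.to_datetime(df['Date'], format='%b-%y').dt.to_period('M')
full = df.set_index(periods)['Value']
full = full.reindex(pd.period_range(full.index.min(), full.index.max(), freq='M'))

print(naive.tolist())
print(full.diff())
```

On the reindexed series, May-15 gets NaN instead of a misleading three-month difference reported as a monthly change.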

Answer 1: (score: 0)

df['otm'] = df['Value'] - df['Value'].shift(1)
df['oty'] = df['Value'] - df['Value'].shift(12)
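
As a quick check, subtracting a shifted copy is the same operation as diff, so this has the same limitation when months are missing. A minimal sketch on a made-up series (the names `s`, `by_shift` and `by_diff` are mine):

```python
import pandas as pd

s = pd.Series([300, 302, 303, 305])

# Value minus the value one position earlier, written two ways.
by_shift = s - s.shift(1)
by_diff = s.diff()

# Series.equals treats NaN values in the same positions as equal.
print(by_shift.equals(by_diff))
```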

Answer 2: (score: 0)

A more correct solution is to shift by month frequency rather than by row position:

#Create datetime column
df['DateTime'] = pd.to_datetime(df['Date'], format='%b-%y')

#Set it as index
df.set_index('DateTime', inplace=True)

#Then shift by month frequency:
df['otm'] = df['Value'] - df['Value'].shift(1, freq='MS')
df['oty'] = df['Value'] - df['Value'].shift(12, freq='MS')
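
The difference from a positional shift only shows up when a month is missing. Here is a small sketch on a made-up sample without Mar-15. The data and the names `s`, `positional` and `by_month` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample with Mar-15 missing, indexed by month-start timestamps.
idx = pd.to_datetime(['Jan-15', 'Feb-15', 'Apr-15'], format='%b-%y')
s = pd.Series([300, 302, 305], index=idx)

# Positional shift: Apr-15 is compared with Feb-15, two months earlier.
positional = s - s.shift(1)

# Frequency shift: the index labels move forward one month and the values stay put.
# The subtraction then aligns on dates, so Apr-15 finds no Mar-15 value and gets NaN.
# The aligned result also contains the shifted dates, so reindex back to the original ones.
by_month = (s - s.shift(1, freq='MS')).reindex(s.index)

print(positional)
print(by_month)
```

Assigning the subtraction result to a DataFrame column, as in the answer, performs the same realignment to the original index implicitly.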