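Python - Get the first and last observation for each week, month, quarter, six months, year, etc.

Time: 2019-10-11 06:39:45

Tags: python pandas dataframe

I have a dataset (a pandas DataFrame) that looks like this: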

         Date      Open      High  ...       date  year  month
0  2002-05-23  1.156429  1.242857  ... 2002-05-23  2002      5
1  2002-05-24  1.214286  1.225000  ... 2002-05-24  2002      5
2  2002-05-28  1.213571  1.232143  ... 2002-05-28  2002      5
3  2002-05-29  1.164286  1.164286  ... 2002-05-29  2002      5
4  2002-05-30  1.107857  1.107857  ... 2002-05-30  2002      5

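How can I get the first and last observation for each week, month, quarter, six months, year, and so on? I want to take the prices from the dataset and then calculate the return over each period.

Edit:

Your code gives me the following output for df1: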

Index        first    last
2002-05-26   1.15643  1.21429
2002-06-02   1.21357  1.07857

What I want:

First date       Last date     First value    Last value
2002-05-26       2005-06-01    1.15643        1.21429
2002-06-02       2002-06-09    1.21357        1.07857

1 answer:

Answer 0: (score: 1)

I believe you need DataFrame.resample, aggregating with first and last:

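# 'Date' must be a datetime64 column (see pd.to_datetime in the edit further down).
# In pandas 2.2 and later the aliases 'M', 'Q' and 'Y' are deprecated
# in favor of 'ME', 'QE' and 'YE'.
df1 = df.resample('W', on='Date')['Open'].agg(['first','last'])   # weekly
df2 = df.resample('M', on='Date')['Open'].agg(['first','last'])   # monthly
df3 = df.resample('Q', on='Date')['Open'].agg(['first','last'])   # quarterly
df4 = df.resample('6M', on='Date')['Open'].agg(['first','last'])  # six months
df5 = df.resample('Y', on='Date')['Open'].agg(['first','last'])   # yearly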
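
Since the question asks for returns, here is a minimal sketch of how the first/last table could be turned into a return per period. The sample prices and the `return` column name are assumptions made up for illustration; they are not part of the original data or answer.

```python
import pandas as pd

# Hypothetical sample prices, for illustration only
df = pd.DataFrame({
    "Date": pd.to_datetime(["2002-05-23", "2002-05-24", "2002-05-28",
                            "2002-05-29", "2002-05-30"]),
    "Open": [1.1, 1.2, 1.3, 1.4, 1.5],
})

# First and last observed price in each week (weeks end on Sunday by default)
weekly = df.resample("W", on="Date")["Open"].agg(["first", "last"])

# Simple return within each period: last price relative to first price
weekly["return"] = weekly["last"] / weekly["first"] - 1
print(weekly)
```

The same pattern should work for the other frequencies by swapping the alias passed to resample. Whether a return measured from the first to the last price inside a period is the right definition, rather than from one period's last price to the next, depends on what you want to compute.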

Edit:

print (df)
         Date  Open
0  2002-05-23   1.1
1  2002-05-24   1.2
2  2002-05-28   1.3
3  2002-05-29   1.4
4  2002-05-30   1.5
5  2002-05-31   1.6
6  2002-06-01   1.7
7  2002-06-02   1.8
8  2002-06-03   1.9
9  2002-06-04   2.0

# resample needs real datetimes, so parse the Date strings first
df['Date'] = pd.to_datetime(df['Date'])
df1 = df.resample('W', on='Date')['Open'].agg(['first','last'])
print (df1)
            first  last
Date                   
2002-05-26    1.1   1.2
2002-06-02    1.3   1.8
2002-06-09    1.9   2.0

# Index by Date so that each weekly group exposes its own first and last dates through x.index
df1 = df.set_index('Date').resample('W')['Open'].agg([('First date', lambda x: x.index[0]),
                                                      ('Last date', lambda x: x.index[-1]),
                                                      ('First value','first'),
                                                      ('Last value','last')])
print (df1)
           First date  Last date  First value  Last value
Date                                                     
2002-05-26 2002-05-23 2002-05-24          1.1         1.2
2002-06-02 2002-05-28 2002-06-02          1.3         1.8
2002-06-09 2002-06-03 2002-06-04          1.9         2.0
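
One caveat: the lambdas above index into x.index, which may fail for a period that contains no rows, since an empty index has no element 0. A possible variant, sketched below under assumptions, copies the dates into an ordinary column and uses the built-in first and last, so that empty periods come back as NaT/NaN and can be dropped. The sample data and the `obs_date` column name are made up for illustration.

```python
import pandas as pd

# Hypothetical data with a gap: no rows fall in the week ending 2002-06-02
df = pd.DataFrame({
    "Date": pd.to_datetime(["2002-05-23", "2002-05-24",
                            "2002-06-03", "2002-06-04"]),
    "Open": [1.1, 1.2, 1.9, 2.0],
})

# Keep a copy of the dates as a regular column (obs_date is a made-up name)
g = df.assign(obs_date=df["Date"]).resample("W", on="Date")

# Built-in first/last return NaT/NaN for periods without observations
out = pd.concat({
    "First date": g["obs_date"].first(),
    "Last date": g["obs_date"].last(),
    "First value": g["Open"].first(),
    "Last value": g["Open"].last(),
}, axis=1)

# Drop the periods that had no observations at all
clean = out.dropna()
print(clean)
```

Dropping empty periods is a design choice; if you would rather see the gaps, keep `out` as it is.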