Get the last value of each week, along with the matching date

Time: 2016-09-19 16:14:38

Tags: python pandas

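So I start with a pd.Series called jpm, and I would like to group it into weeks and take the last value from each week. The code below does this and does get the last value, but it changes the corresponding index to the Sunday of that week, and I would like to keep the original dates.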

import pandas_datareader.data as web
import pandas as pd
from datetime import datetime

start = datetime(2015, 11, 1)
end = datetime(2015, 11, 17)
raw_jpm = web.DataReader("JPM", 'yahoo', start, end)["Adj Close"]
jpm = raw_jpm.iloc[::2]  # keep every other trading day

jpm is now:

Date
2015-11-02    64.125610
2015-11-04    64.428918
2015-11-06    66.982593
2015-11-10    66.219427
2015-11-12    64.575682
2015-11-16    65.074678
Name: Adj Close, dtype: float64

I want to do some operations on it, such as:

weekly = jpm.groupby(pd.Grouper(freq='W')).last()

weekly is now:

Date
2015-11-08    66.982593
2015-11-15    64.575682
2015-11-22    65.074678
Freq: W-SUN, Name: Adj Close, dtype: float64

This is great, except that all my dates got changed. The output I want is:

Date
2015-11-06    66.982593
2015-11-12    64.575682
2015-11-16    65.074678
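The Yahoo endpoint used by pandas_datareader may no longer be reachable, so here is a minimal offline sketch that rebuilds jpm from the values printed above (an assumption: the hand-copied numbers stand in for the download). It uses pd.Grouper, the replacement for pd.TimeGrouper in later pandas versions, and reproduces the behaviour: the values are right, but the labels are the Sunday bin edges rather than the dates those values came from.

```python
import pandas as pd

# Hand-copied from the printed jpm Series; stands in for the Yahoo download.
dates = pd.DatetimeIndex(
    ["2015-11-02", "2015-11-04", "2015-11-06",
     "2015-11-10", "2015-11-12", "2015-11-16"],
    name="Date",
)
jpm = pd.Series(
    [64.125610, 64.428918, 66.982593, 66.219427, 64.575682, 65.074678],
    index=dates,
    name="Adj Close",
)

# 'W' is an alias for 'W-SUN': each bin ends on a Sunday and is labelled by it.
weekly = jpm.groupby(pd.Grouper(freq="W")).last()
print(weekly)

# Every label is a Sunday, not the trading day the last value came from.
print(weekly.index.day_name().tolist())
```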

3 Answers:

Answer 0 (score: 1)

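You can supply a DateOffset by specifying the class Week with a weekly frequency of W-FRI, i.e. by setting its weekday argument to 4 (the days are numbered Monday: 0 → Sunday: 6):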

jpm.groupby(pd.Grouper(freq=pd.offsets.Week(weekday=4))).last().tail(5)

Date
2016-08-19    65.860001
2016-08-26    66.220001
2016-09-02    67.489998
2016-09-09    66.650002
2016-09-16    65.820000
Freq: W-FRI, Name: Adj Close, dtype: float64
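The offset pd.offsets.Week(weekday=4) is the same anchored weekly frequency as the string 'W-FRI'. A sketch on the six sample rows from the question (hand-copied, assumed) shows the effect: the labels become Fridays, so they match a real date only in weeks whose last trading day actually is a Friday.

```python
import pandas as pd

# Sample rows hand-copied from the question.
dates = pd.DatetimeIndex(
    ["2015-11-02", "2015-11-04", "2015-11-06",
     "2015-11-10", "2015-11-12", "2015-11-16"],
    name="Date",
)
jpm = pd.Series(
    [64.125610, 64.428918, 66.982593, 66.219427, 64.575682, 65.074678],
    index=dates,
    name="Adj Close",
)

fri = pd.offsets.Week(weekday=4)  # Monday=0 ... Sunday=6, so 4 is Friday
print(fri.freqstr)

# Bins end on Fridays and are labelled by that Friday.
weekly_fri = jpm.groupby(pd.Grouper(freq=fri)).last()
print(weekly_fri)
```

Here only the first label (2015-11-06) is a real trading date; 2015-11-13 and 2015-11-20 are Friday labels for weeks whose last rows fall on 2015-11-12 and 2015-11-16.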

If you want the range to start on the Monday after the start date and end on the Sunday before the end date, you can do this:

from datetime import datetime, timedelta

start = datetime(2015, 11, 1)
monday = start + timedelta(days=(7 - start.weekday())) 

end = datetime(2016, 9, 30)
sunday = end - timedelta(days=end.weekday() + 1)

print (monday)
2015-11-02 00:00:00
print (sunday)
2016-09-25 00:00:00
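The two timedelta expressions are plain weekday arithmetic (datetime.weekday() returns 0 for Monday through 6 for Sunday). Wrapping them in small helpers, whose names next_monday and previous_sunday are hypothetical, makes the edge cases easy to check: a start that is already a Monday jumps a full week ahead, and an end that is already a Sunday jumps back a full week.

```python
from datetime import datetime, timedelta


def next_monday(d: datetime) -> datetime:
    """Hypothetical helper: the first Monday strictly after d."""
    # weekday() is 0..6, so 7 - weekday() always moves forward 1..7 days.
    return d + timedelta(days=7 - d.weekday())


def previous_sunday(d: datetime) -> datetime:
    """Hypothetical helper: the last Sunday strictly before d."""
    # weekday() + 1 always moves back 1..7 days, landing on a Sunday.
    return d - timedelta(days=d.weekday() + 1)


print(next_monday(datetime(2015, 11, 1)))      # 2015-11-02 00:00:00
print(previous_sunday(datetime(2016, 9, 30)))  # 2016-09-25 00:00:00
print(next_monday(datetime(2015, 11, 2)))      # 2015-11-09 00:00:00 (already Monday)
```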

Then use them as:

jpm = web.DataReader('JPM', 'yahoo', monday, sunday)["Adj Close"]
jpm.groupby(pd.Grouper(freq='7D')).last()
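A fixed '7D' frequency is not anchored to a weekday. In current pandas (assumed 1.1 or later, where the default origin='start_day' applies) the bins start at midnight of the first timestamp, are closed on the left, and are labelled by their left edge. Since the sample data (hand-copied from the question, assumed) starts on a Monday, every label is a Monday: the start of the bin, not the date of its last value.

```python
import pandas as pd

# Sample rows hand-copied from the question; the first date is a Monday.
dates = pd.DatetimeIndex(
    ["2015-11-02", "2015-11-04", "2015-11-06",
     "2015-11-10", "2015-11-12", "2015-11-16"],
    name="Date",
)
jpm = pd.Series(
    [64.125610, 64.428918, 66.982593, 66.219427, 64.575682, 65.074678],
    index=dates,
    name="Adj Close",
)

# Bins [11-02, 11-09), [11-09, 11-16), [11-16, 11-23), each labelled by its left edge.
weekly_7d = jpm.groupby(pd.Grouper(freq="7D")).last()
print(weekly_7d)
```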

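To get everything labelled on Sundays instead: since the range you specified runs Monday → Sunday and Sunday is the last day considered, you can use a small hack: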

monday_new = monday - timedelta(days=3)

jpm = web.DataReader('JPM', 'yahoo', monday_new, sunday)["Adj Close"]
jpm.groupby(pd.Grouper(freq='W')).last().head()

Date
2015-11-01    62.863448
2015-11-08    66.982593
2015-11-15    64.145175
2015-11-22    66.082449
2015-11-29    65.720431
Freq: W-SUN, Name: Adj Close, dtype: float64

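Now that you have posted the desired output, you can get the result with the transform method instead of the aggregating last, so that it returns an object with the same index (and length) as the one being grouped.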

import numpy as np

df = jpm.groupby(pd.Grouper(freq='W')).transform('last').reset_index(name='Last')

df


df['counter'] = (df['Last'].shift() != df['Last']).astype(int).cumsum()


df.groupby(['Last','counter'])['Date'].apply(lambda x: np.array(x)[-1])   \
  .reset_index().set_index('Date').sort_index()['Last']

Date
2015-11-06    66.982593
2015-11-12    64.575682
2015-11-16    65.074678
Name: Last, dtype: float64

Note: this can handle duplicate values that occur on two separate dates, because the counter column puts them into two separate buckets.
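The counter column is a run ID: it increases by one every time Last differs from the row above, so a value that reappears later gets a new bucket. A small sketch with made-up Last values (purely illustrative, not market data) shows why grouping on ['Last', 'counter'] keeps two weeks that happen to end on the same price apart, whereas grouping on Last alone would merge them.

```python
import pandas as pd

# Illustrative values only: weeks 1 and 3 happen to end on the same price (10.0).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-11-02", "2015-11-06",
                            "2015-11-10", "2015-11-12",
                            "2015-11-16", "2015-11-20"]),
    "Last": [10.0, 10.0, 12.0, 12.0, 10.0, 10.0],
})

# shift() lines each row up with the previous one; the first comparison is
# against NaN, so it is True. cumsum() turns the "changed" flags into run numbers.
df["counter"] = (df["Last"].shift() != df["Last"]).astype(int).cumsum()
print(df["counter"].tolist())  # [1, 1, 2, 2, 3, 3]

print(df.groupby("Last").ngroups)               # 2: the two 10.0 weeks collapse
print(df.groupby(["Last", "counter"]).ngroups)  # 3: one group per week
```

The same construction implies that two consecutive weeks ending on the same value still share a counter, so they would be merged into one bucket.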

Answer 1 (score: 1)

You can do this:

In [15]: jpm
Out[15]:
Date
2015-11-02    64.125610
2015-11-04    64.428918
2015-11-06    66.982593
2015-11-10    66.219427
2015-11-12    64.575682
2015-11-16    65.074678
Name: Adj Close, dtype: float64

In [16]: jpm.groupby(jpm.index.week).transform('last').drop_duplicates(keep='last')
Out[16]:
Date
2015-11-06    66.982593
2015-11-12    64.575682
2015-11-16    65.074678
dtype: float64

Explanation:

In [17]: jpm.groupby(jpm.index.week).transform('last')
Out[17]:
Date
2015-11-02    66.982593
2015-11-04    66.982593
2015-11-06    66.982593
2015-11-10    64.575682
2015-11-12    64.575682
2015-11-16    65.074678
dtype: float64
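DatetimeIndex.week was deprecated in pandas 1.1 and removed in 2.0. Its replacement is DatetimeIndex.isocalendar(), which returns year, week and day columns. Here is a minimal sketch, assuming pandas 1.1 or later and the question's sample rows (hand-copied), that groups on the ISO (year, week) pair so that equal week numbers from different years are not merged. It also swaps in GroupBy.tail(1), a different technique from the transform plus drop_duplicates above, to keep the last row of each week under its original date. Unlike drop_duplicates, tail(1) does not discard a week just because its last value equals another week's.

```python
import pandas as pd

# Sample rows hand-copied from the question.
dates = pd.DatetimeIndex(
    ["2015-11-02", "2015-11-04", "2015-11-06",
     "2015-11-10", "2015-11-12", "2015-11-16"],
    name="Date",
)
jpm = pd.Series(
    [64.125610, 64.428918, 66.982593, 66.219427, 64.575682, 65.074678],
    index=dates,
    name="Adj Close",
)

# DataFrame indexed by the same dates, with ISO 'year', 'week', 'day' columns.
iso = jpm.index.isocalendar()

# tail(1) keeps the last row of each (ISO year, ISO week) group and, unlike
# last(), leaves the original index (the actual trading dates) untouched.
weekly = jpm.groupby([iso["year"], iso["week"]]).tail(1)
print(weekly)
```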

Answer 2 (score: 1)

Doing this in pure pandas seems a bit tricky, so I used numpy:

import numpy as np
weekly = jpm.groupby(pd.Grouper(freq='W-SUN')).last()
weekly.index = jpm.index[np.searchsorted(jpm.index, weekly.index, side="right")-1]
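A self-contained version of this answer on the question's sample rows (hand-copied, assumed). It uses the method form jpm.index.searchsorted, which performs the same lookup as np.searchsorted. With side='right', searchsorted returns the position just past the last date on or before each Sunday label, so subtracting 1 points at that week's last actual trading date. The .dropna() is an addition that is not in the original answer: it assumes a week with no rows comes back from last() as NaN and should be dropped rather than mapped onto the previous week's date.

```python
import pandas as pd

# Sample rows hand-copied from the question.
dates = pd.DatetimeIndex(
    ["2015-11-02", "2015-11-04", "2015-11-06",
     "2015-11-10", "2015-11-12", "2015-11-16"],
    name="Date",
)
jpm = pd.Series(
    [64.125610, 64.428918, 66.982593, 66.219427, 64.575682, 65.074678],
    index=dates,
    name="Adj Close",
)

# Last value per Sunday-ending week; drop weeks that had no rows (NaN).
weekly = jpm.groupby(pd.Grouper(freq="W-SUN")).last().dropna()

# For each Sunday label, find the position of the last date on or before it.
pos = jpm.index.searchsorted(weekly.index, side="right") - 1
weekly.index = jpm.index[pos]
print(weekly)
```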