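I start with a pd.Series called jpm, which I want to group into weeks, taking the last value from each week. The code below does get the last value, but it changes the corresponding index to the Sunday of that week, and I would like the index to stay unchanged.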
import pandas_datareader.data as web
import pandas as pd
start = pd.Timestamp(2015, 11, 1)
end = pd.Timestamp(2015, 11, 17)
raw_jpm = web.DataReader("JPM", 'yahoo', start, end)["Adj Close"]
# keep every second trading day
jpm = raw_jpm.iloc[::2]
Now jpm is:
Date
2015-11-02 64.125610
2015-11-04 64.428918
2015-11-06 66.982593
2015-11-10 66.219427
2015-11-12 64.575682
2015-11-16 65.074678
Name: Adj Close, dtype: float64
I would like to do some operations on it, such as:
weekly = jpm.groupby(pd.Grouper(freq='W')).last()
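Now weekly is:
Date
2015-11-08 66.982593
2015-11-15 64.575682
2015-11-22 65.074678
Freq: W-SUN, Name: Adj Close, dtype: float64
This is great, except that all my dates have changed. The output I want keeps the original dates (2015-11-06, 2015-11-12 and 2015-11-16) as the index.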
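To reproduce the behaviour without a network call, here is a minimal sketch that hard-codes the six values printed above instead of downloading them. Building jpm from literals is an assumption made for illustration; pd.Grouper(freq=...) is used because it replaced pd.TimeGrouper in later pandas releases.

```python
import pandas as pd

# Hard-coded copy of the six rows shown above (stand-in for the DataReader call).
jpm = pd.Series(
    [64.125610, 64.428918, 66.982593, 66.219427, 64.575682, 65.074678],
    index=pd.DatetimeIndex(
        ["2015-11-02", "2015-11-04", "2015-11-06",
         "2015-11-10", "2015-11-12", "2015-11-16"],
        name="Date",
    ),
    name="Adj Close",
)

# Weekly bins end on Sunday; each bin is labelled with that Sunday,
# not with the date of the last observation inside it.
weekly = jpm.groupby(pd.Grouper(freq="W")).last()
print(weekly)
```

The values are the ones wanted, but the index holds 2015-11-08, 2015-11-15 and 2015-11-22, which are bin labels rather than dates that occur in jpm.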
Answer 0 (score: 1)
You can supply a DateOffset by specifying the Week class and setting its weekday parameter to 4 (Monday: 0 → Sunday: 6), which indicates the weekly frequency W-FRI:
jpm.groupby(pd.Grouper(freq=pd.offsets.Week(weekday=4))).last().tail(5)
Date
2016-08-19 65.860001
2016-08-26 66.220001
2016-09-02 67.489998
2016-09-09 66.650002
2016-09-16 65.820000
Freq: W-FRI, Name: Adj Close, dtype: float64
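As a small offline illustration of what the Friday anchor does, the sketch below applies the same offset to the six rows from the question. The hard-coded jpm is an assumption standing in for the downloaded data.

```python
import pandas as pd

# Hard-coded copy of the six rows from the question (illustrative stand-in).
jpm = pd.Series(
    [64.125610, 64.428918, 66.982593, 66.219427, 64.575682, 65.074678],
    index=pd.DatetimeIndex(
        ["2015-11-02", "2015-11-04", "2015-11-06",
         "2015-11-10", "2015-11-12", "2015-11-16"],
        name="Date",
    ),
    name="Adj Close",
)

# Week(weekday=4) closes every bin on a Friday and labels it with that Friday.
weekly_fri = jpm.groupby(pd.Grouper(freq=pd.offsets.Week(weekday=4))).last()
print(weekly_fri)
```

Every label is a Friday whether or not the series has a row on that day, so this moves the anchor of the week but does not by itself restore the original dates.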
If you want the range to begin on the first Monday after the start date and to finish on the last Sunday before the end date, you can do this:
from datetime import datetime, timedelta
start = datetime(2015, 11, 1)
monday = start + timedelta(days=(7 - start.weekday()))
end = datetime(2016, 9, 30)
sunday = end - timedelta(days=end.weekday() + 1)
print (monday)
2015-11-02 00:00:00
print (sunday)
2016-09-25 00:00:00
Then use them like this:
jpm = web.DataReader('JPM', 'yahoo', monday, sunday)["Adj Close"]
jpm.groupby(pd.Grouper(freq='7D')).last()
To get all of these labelled on Sundays, since the range you specified runs Monday → Sunday and Sunday is the last day considered, you can use a small hack:
monday_new = monday - timedelta(days=3)
jpm = web.DataReader('JPM', 'yahoo', monday_new, sunday)["Adj Close"]
jpm.groupby(pd.Grouper(freq='W')).last().head()
Date
2015-11-01 62.863448
2015-11-08 66.982593
2015-11-15 64.145175
2015-11-22 66.082449
2015-11-29 65.720431
Freq: W-SUN, Name: Adj Close, dtype: float64
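Now that you have posted the desired output, you can get the result with the transform method instead of aggregating with last, so that it returns an object indexed the same size as the one being grouped.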
import numpy as np
df = jpm.groupby(pd.Grouper(freq='W')).transform('last').reset_index(name='Last')
df
# number each run of identical weekly-last values
df['counter'] = (df['Last'].shift() != df['Last']).astype(int).cumsum()
# for each run, keep the latest original date
df.groupby(['Last','counter'])['Date'].apply(lambda x: np.array(x)[-1]) \
  .reset_index().set_index('Date').sort_index()['Last']
Date
2015-11-06 66.982593
2015-11-12 64.575682
2015-11-16 65.074678
Name: Last, dtype: float64
Note: this is able to handle repeated values that occur on two separate dates, because the counter column separates them into two buckets.
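To make the role of the counter column concrete, here is a minimal sketch on made-up data in which the first and third weeks end on the same value. The series s and the shortened final step (groupby on counter followed by last) are assumptions for illustration, not the exact expression used above.

```python
import pandas as pd

# Made-up data: weeks ending 2015-11-08 and 2015-11-22 both close at 10.0.
s = pd.Series(
    [9.0, 10.0, 11.0, 10.0],
    index=pd.DatetimeIndex(
        ["2015-11-02", "2015-11-06", "2015-11-12", "2015-11-20"], name="Date"
    ),
)

df = s.groupby(pd.Grouper(freq="W")).transform("last").reset_index(name="Last")

# The counter increases each time the weekly-last value changes,
# so every run of equal values gets its own bucket number.
df["counter"] = (df["Last"].shift() != df["Last"]).astype(int).cumsum()

# Keep the last row of each bucket: original date plus weekly-last value.
kept = df.groupby("counter").last().set_index("Date")["Last"]

# For contrast: de-duplicating on the value alone loses the first week.
deduped = df.set_index("Date")["Last"].drop_duplicates(keep="last")

print(kept)
print(deduped)
```

Because the counter only moves when the value changes, two adjacent weeks that happen to end on the same value would still share one bucket.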
Answer 1 (score: 1)
You can do it this way:
In [15]: jpm
Out[15]:
Date
2015-11-02 64.125610
2015-11-04 64.428918
2015-11-06 66.982593
2015-11-10 66.219427
2015-11-12 64.575682
2015-11-16 65.074678
Name: Adj Close, dtype: float64
In [16]: jpm.groupby(jpm.index.week).transform('last').drop_duplicates(keep='last')
Out[16]:
Date
2015-11-06 66.982593
2015-11-12 64.575682
2015-11-16 65.074678
dtype: float64
Explanation:
In [17]: jpm.groupby(jpm.index.week).transform('last')
Out[17]:
Date
2015-11-02 66.982593
2015-11-04 66.982593
2015-11-06 66.982593
2015-11-10 64.575682
2015-11-12 64.575682
2015-11-16 65.074678
dtype: float64
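On recent pandas versions the index.week accessor is no longer available, so the same idea might be sketched as follows, using isocalendar() to obtain the ISO week number. The hard-coded jpm and the variable names week, filled and result are assumptions for illustration.

```python
import pandas as pd

# Hard-coded copy of the six rows from the question (illustrative stand-in).
jpm = pd.Series(
    [64.125610, 64.428918, 66.982593, 66.219427, 64.575682, 65.074678],
    index=pd.DatetimeIndex(
        ["2015-11-02", "2015-11-04", "2015-11-06",
         "2015-11-10", "2015-11-12", "2015-11-16"],
        name="Date",
    ),
    name="Adj Close",
)

# ISO week number for every row, indexed like jpm.
week = jpm.index.isocalendar().week.astype("int64")

# Broadcast each week's last value to all rows of that week ...
filled = jpm.groupby(week).transform("last")

# ... then keep only the final row of each repeated value.
result = filled.drop_duplicates(keep="last")
print(result)
```

Week numbers repeat every year, so for data spanning several years the year would probably need to be part of the grouping key as well.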
Answer 2 (score: 1)
Doing this in pure pandas seems a bit tricky, so I used numpy:
import numpy as np
weekly = jpm.groupby(pd.Grouper(freq='W-SUN')).last()
weekly.index = jpm.index[np.searchsorted(jpm.index, weekly.index, side="right")-1]
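Here is a self-contained sketch of how the searchsorted lookup works, again using a hard-coded copy of the six rows from the question (an assumption for illustration). With side="right", searchsorted returns the position just past the last date that is less than or equal to each Sunday label, so subtracting 1 lands on the last real observation of that week.

```python
import numpy as np
import pandas as pd

# Hard-coded copy of the six rows from the question (illustrative stand-in).
jpm = pd.Series(
    [64.125610, 64.428918, 66.982593, 66.219427, 64.575682, 65.074678],
    index=pd.DatetimeIndex(
        ["2015-11-02", "2015-11-04", "2015-11-06",
         "2015-11-10", "2015-11-12", "2015-11-16"],
        name="Date",
    ),
    name="Adj Close",
)

weekly = jpm.groupby(pd.Grouper(freq="W-SUN")).last()

# Position of the last original date that is <= each Sunday label.
pos = np.searchsorted(jpm.index.to_numpy(), weekly.index.to_numpy(), side="right") - 1

# Swap the Sunday labels for the dates actually present in jpm.
weekly.index = jpm.index[pos]
print(weekly)
```

This relies on jpm.index being sorted in ascending order, which searchsorted requires.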