Pandas event study

Time: 2017-04-04 09:03:29

Tags: python pandas

Suppose I have a time series:

pd.Series(np.random.rand(20), index=pd.date_range("1990-01-01",periods=20))

which gives:

1990-01-01    0.018363
1990-01-02    0.288625
1990-01-03    0.460708
1990-01-04    0.663063
1990-01-05    0.434250
1990-01-06    0.504893
1990-01-07    0.587743
1990-01-08    0.412223
1990-01-09    0.604656
1990-01-10    0.960338
1990-01-11    0.606765
1990-01-12    0.110480
1990-01-13    0.671683
1990-01-14    0.178488
1990-01-15    0.458074
1990-01-16    0.219303
1990-01-17    0.172665
1990-01-18    0.429534
1990-01-19    0.505891
1990-01-20    0.242567
Freq: D, dtype: float64

Suppose the event dates are 1990-01-05 and 1990-01-15. I want to subset the data to a window of (-2, +2) days around each event, like this:

1990-01-03    0.460708
1990-01-04    0.663063
1990-01-05    0.434250
1990-01-06    0.504893
1990-01-07    0.587743
1990-01-13    0.671683
1990-01-14    0.178488
1990-01-15    0.458074
1990-01-16    0.219303
1990-01-17    0.172665
Freq: D, dtype: float64

How can I do this?

3 answers:

Answer 0 (score: 1)

I think you can use concat with a list comprehension that selects each window from the Series with loc:

date1 = pd.to_datetime('1990-01-05')
date2 = pd.to_datetime('1990-01-15')
window = 2

dates = [date1, date2]

s1 = pd.concat([s.loc[date - pd.Timedelta(window, unit='d'): 
                      date + pd.Timedelta(window, unit='d')] for date in dates])
print (s1)
1990-01-03    0.284356
1990-01-04    0.997019
1990-01-05    0.293225
1990-01-06    0.451379
1990-01-07    0.743209
1990-01-13    0.254926
1990-01-14    0.339728
1990-01-15    0.793124
1990-01-16    0.121002
1990-01-17    0.930924
dtype: float64
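One thing the approach above does not cover is two events that lie closer together than the window width: the slices then overlap and concat repeats the shared rows. A minimal sketch of one way to handle that, assuming you want each timestamp only once. The helper name `event_windows` is hypothetical, and the series uses `np.arange` instead of random values so the result is reproducible:

```python
import numpy as np
import pandas as pd


def event_windows(s, dates, window=2):
    """Return the rows of `s` within +/- `window` days of any date (hypothetical helper)."""
    delta = pd.Timedelta(days=window)
    # .loc slicing on a DatetimeIndex includes both endpoints.
    pieces = [s.loc[d - delta: d + delta] for d in pd.to_datetime(dates)]
    out = pd.concat(pieces)
    # Overlapping windows repeat rows; keep the first copy of each timestamp.
    return out[~out.index.duplicated(keep="first")]


s = pd.Series(np.arange(20, dtype=float),
              index=pd.date_range("1990-01-01", periods=20))
result = event_windows(s, ["1990-01-05", "1990-01-15"])
overlapping = event_windows(s, ["1990-01-05", "1990-01-06"])
print(result)
```

If you would rather keep the repeated rows (for example, to average across events later), drop the `duplicated` filter.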

Answer 1 (score: 1)

Try this:

In [23]: df['A']
Out[23]: 
2013-01-01    0.469112
2013-01-02    1.212112
2013-01-03   -0.861849
2013-01-04    0.721555
2013-01-05   -0.424972
2013-01-06   -0.673690
Freq: D, Name: A, dtype: float64

In [25]: df['20130102':'20130104']
Out[25]: 
                   A         B         C         D
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860

[3 rows x 4 columns]

From the pandas documentation ("10 Minutes to pandas", "Selection" section): http://pandas.pydata.org/pandas-docs/version/0.13.1/10min.html?highlight=select%20where
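Applied to the series from the question, the same label-based slicing might look like the sketch below. The window endpoints are written out by hand here, which is an assumption about how this answer would be adapted, and `np.arange` stands in for the random values so the result is reproducible:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(20, dtype=float),
              index=pd.date_range("1990-01-01", periods=20))

# Label-based slices on a DatetimeIndex include both endpoints.
first = s.loc["1990-01-03":"1990-01-07"]   # window around 1990-01-05
second = s.loc["1990-01-13":"1990-01-17"]  # window around 1990-01-15

subset = pd.concat([first, second])
print(subset)
```

This only works when the endpoints are known in advance. For a list of event dates, the endpoints have to be computed from each date.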

Answer 2 (score: 1)

I would build a boolean mask to select the values of interest:

import numpy as np
import pandas as pd

s = pd.Series(np.random.rand(20), index=pd.date_range("1990-01-01",periods=20))
events = [pd.to_datetime('1990-01-05'), pd.to_datetime('1990-01-15')]
max_delta = pd.Timedelta(2, unit='d')

mask = np.zeros_like(s, dtype=bool)
for event in events:
    mask |= np.abs(s.index - event) <= max_delta
s_events = s[mask]

print(s_events)

Output:

1990-01-03    0.877271
1990-01-04    0.770214
1990-01-05    0.427380
1990-01-06    0.971676
1990-01-07    0.533582
1990-01-13    0.060556
1990-01-14    0.932072
1990-01-15    0.501966
1990-01-16    0.081177
1990-01-17    0.167775
dtype: float64
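With many events, the loop over events can be replaced by NumPy broadcasting. This is a sketch of a variation on the mask idea, not part of the original answer. It builds a (timestamps x events) distance matrix, so memory grows with both lengths. `np.arange` replaces the random values to keep the result reproducible:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(20, dtype=float),
              index=pd.date_range("1990-01-01", periods=20))
events = pd.to_datetime(["1990-01-05", "1990-01-15"])
max_delta = np.timedelta64(2, "D")

# Shape (len(s), len(events)): distance from every timestamp to every event.
dist = np.abs(s.index.values[:, None] - events.values[None, :])

# A row is selected if it is within max_delta of at least one event.
mask = (dist <= max_delta).any(axis=1)
s_events = s[mask]
print(s_events)
```

Because a single mask is built, each row is selected at most once even when windows overlap.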