Pandas: select dates with gaps of more than 1 day in a time series

Time: 2017-04-28 00:44:07

Tags: python pandas

My DataFrame is:

df.loc[1:7]
  Date         A  
1 2010-07-26   3.15  
2 2010-07-27   5  
3 2010-07-30   3  
4 2010-07-31   105  
5 2010-08-01   0.05  
6 2010-08-02   0.05  
7 2010-08-05   0.05  

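I want to select only the rows where the difference between consecutive dates is more than 2 days. That is, the final result should be: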

  Date         A  
1 2010-07-27   5  
2 2010-07-30   3  
3 2010-08-02   0.05  
4 2010-08-05   0.05   

Any idea how to make this work?

Edit: The result starts at 2010-07-27 because 2010-07-30 is the first date after 2010-07-27, and the two are more than 2 days apart.
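
To make the rule concrete, here is a minimal sketch that rebuilds the sample by hand and computes the gap between each date and the one before it. The variable names `gaps` and `after_gap` are only illustrative, and the index labels 1 to 7 are copied from the printout above.

```python
import pandas as pd

# Sample data copied from the question; index labels 1..7 as printed there.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            [
                "2010-07-26", "2010-07-27", "2010-07-30", "2010-07-31",
                "2010-08-01", "2010-08-02", "2010-08-05",
            ]
        ),
        "A": [3.15, 5, 3, 105, 0.05, 0.05, 0.05],
    },
    index=range(1, 8),
)

# Difference between each date and the previous one.
# The first row has no predecessor, so its gap is NaT.
gaps = df["Date"].diff()

# Rows that come right after a gap of more than 2 days.
after_gap = gaps > pd.Timedelta(days=2)
print(df[after_gap])
```

This only picks the row after each gap (2010-07-30 and 2010-08-05). The desired result also needs the row just before each gap (2010-07-27 and 2010-08-02).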

1 Answer:

Answer 0: (score: 2)

import pandas

df = pandas.read_csv('test.csv')
df['date'] = pandas.to_datetime(df['date'])

# compute the time interval between each row and the previous row
time_interval = (df['date'] - df['date'].shift(1)).to_frame()
# give the first time interval a meaningful value (it is NaT after the shift)
time_interval.loc[0, 'date'] = pandas.Timedelta('0 days')
# Define the gap
gap = pandas.Timedelta('2 days')

# get the index which satisfies the criteria
result = list(df[time_interval['date'] > gap].index)
new_result = result[:]

# insert its previous index
for i in range(len(result)):
    index = result[i]
    prev_index = index - 1
    if (prev_index >= 0) and (prev_index not in result):
        new_result.insert(new_result.index(index), prev_index)

# get desired rows by the index list
result = df.loc[new_result]
print(result)

Output

        date  value
1 2010-07-27   5.00
2 2010-07-30   3.00
5 2010-08-02   0.05
6 2010-08-05   0.05

Update

Inspired by Scott Boston

import pandas

df = pandas.read_csv('test.csv')
df['date'] = pandas.to_datetime(df['date'])

index = (df['date'] - df['date'].shift(1)).dt.days > 2

for i in range(len(index)):
    if (i > 0) and index[i]:
        index[i - 1] = True

print(df.loc[index])

Update again

import pandas

df = pandas.read_csv('test.csv')
df['date'] = pandas.to_datetime(df['date'])

index = (df['date'] - df['date'].shift(1)).dt.days > 2
index_prev = (df['date'] - df['date'].shift(-1)).dt.days < -2

index = (index | index_prev)

print(df.loc[index])
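
A possible variant of the last update, offered only as a sketch: compute the "after a gap" mask once with `diff()` and shift it up by one row to get the "before a gap" mask. The inline DataFrame stands in for `test.csv`, which is assumed to hold the question's sample with columns `date` and `value`. The names `after_gap` and `before_gap` are illustrative.

```python
import pandas as pd

# Inline stand-in for test.csv (assumed to contain the question's sample data).
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2010-07-26", "2010-07-27", "2010-07-30", "2010-07-31",
                "2010-08-01", "2010-08-02", "2010-08-05",
            ]
        ),
        "value": [3.15, 5, 3, 105, 0.05, 0.05, 0.05],
    }
)

gap = pd.Timedelta(days=2)

# True for rows whose date is more than `gap` after the previous row's date.
after_gap = df["date"].diff() > gap

# Shift the mask up by one row to mark the row just before each gap.
# fill_value=False fills the vacated last position so the mask stays boolean.
before_gap = after_gap.shift(-1, fill_value=False)

result = df[after_gap | before_gap]
print(result)
```

On the sample data this selects the rows with index 1, 2, 5 and 6, the same rows as in the output above. Comparing against a `Timedelta` rather than `.dt.days` also keeps any sub-day part of the difference, which matters only if the dates carry times.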