My DataFrame is:
```
df.ix[1:5]
        Date     A
1 2010-07-26  3.15
2 2010-07-27     5
3 2010-07-30     3
4 2010-07-31   105
5 2010-08-01  0.05
6 2010-08-02  0.05
7 2010-08-05  0.05
```
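I want to select only the rows where the difference between consecutive dates is more than 2 days. That is, the final result should be: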
```
        Date     A
1 2010-07-27     5
2 2010-07-30     3
3 2010-08-02  0.05
4 2010-08-05  0.05
```
Any idea how to make this work?
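Edit: the result starts at 2010-07-27 because 2010-07-30 is the first date after 2010-07-27 that is more than 2 days later. In other words, both dates of every consecutive pair that is more than 2 days apart should be kept.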
Answer 0 (score: 2)
```python
import pandas
df = pandas.read_csv('test.csv')
df['date'] = pandas.to_datetime(df['date'])
# compute the time interval between every row and the previous row
time_interval = (df['date'] - df['date'].shift(1)).to_frame()
# give the first time interval (NaT) a meaningful value;
# .loc avoids chained assignment, which may not write back to the frame
time_interval.loc[0, 'date'] = pandas.Timedelta('0 days')
# define the gap
gap = pandas.Timedelta('2 days')
# get the indices that satisfy the criterion
result = list(df[time_interval['date'] > gap].index)
new_result = result[:]
# insert each index's predecessor just before it
for i in range(len(result)):
    index = result[i]
    prev_index = index - 1
    if (prev_index >= 0) and (prev_index not in result):
        new_result.insert(new_result.index(index), prev_index)
# get the desired rows by the index list
result = df.loc[new_result]
print(result)
```
```
        date  value
1 2010-07-27   5.00
2 2010-07-30   3.00
5 2010-08-02   0.05
6 2010-08-05   0.05
```
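The `NaT` fill and the insertion loop can likely be avoided. This is a sketch assuming the same `date`/`value` columns and a default 0-based integer index as above, with the data typed in rather than read from `test.csv`. `Series.diff()` returns the same differences as `df['date'] - df['date'].shift(1)`, comparisons against `NaT` evaluate to `False`, and the predecessor labels can be added with `Index.union`, which also sorts them.

```python
import pandas as pd

# Same data as test.csv is assumed to contain
df = pd.DataFrame({
    'date': pd.to_datetime(['2010-07-26', '2010-07-27', '2010-07-30', '2010-07-31',
                            '2010-08-01', '2010-08-02', '2010-08-05']),
    'value': [3.15, 5, 3, 105, 0.05, 0.05, 0.05],
})

gap = pd.Timedelta('2 days')
# diff() equals df['date'] - df['date'].shift(1); the leading NaT compares
# as False, so it does not need to be replaced with 0 days first
over_gap = df['date'].diff() > gap

# Labels of rows that end a long gap, plus the label just before each one.
# Assumes a default RangeIndex (0, 1, 2, ...), as the loop above does.
ends = over_gap[over_gap].index
wanted = ends.union(ends - 1)
result = df.loc[wanted]
print(result)
```

Since the first row can never end a gap (its difference is `NaT`), `ends - 1` never produces `-1`, so the `prev_index >= 0` check is not needed here.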
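A shorter version of the same idea:

```python
import pandas
df = pandas.read_csv('test.csv')
df['date'] = pandas.to_datetime(df['date'])
# True where the gap to the previous row is more than 2 days
index = (df['date'] - df['date'].shift(1)).dt.days > 2
# also mark the row just before each flagged row (positional access)
for i in range(len(index)):
    if (i > 0) and index.iloc[i]:
        index.iloc[i - 1] = True
print(df.loc[index])
```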
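Or without a loop, by also comparing each date with the next one:

```python
import pandas
df = pandas.read_csv('test.csv')
df['date'] = pandas.to_datetime(df['date'])
# gap to the previous row is more than 2 days
index = (df['date'] - df['date'].shift(1)).dt.days > 2
# gap to the next row is more than 2 days
index_prev = (df['date'] - df['date'].shift(-1)).dt.days < -2
index = (index | index_prev)
print(df.loc[index])
```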
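The same two-sided comparison can be wrapped in a small reusable function. `rows_around_gaps` below is a hypothetical helper, not a pandas API. It assumes the date column is sorted ascending and takes the threshold as a `Timedelta`, so a different gap can be used without editing the code.

```python
import pandas as pd


def rows_around_gaps(df, col, min_gap):
    """Hypothetical helper: keep both rows of every consecutive pair in
    `col` that are more than `min_gap` apart. Assumes `col` is sorted."""
    dates = df[col]
    # this row is more than min_gap after the previous row
    ends_gap = dates.diff() > min_gap
    # the next row is more than min_gap after this row (last row: NaT -> False)
    starts_gap = (dates.shift(-1) - dates) > min_gap
    return df[ends_gap | starts_gap]


df = pd.DataFrame({
    'Date': pd.to_datetime(['2010-07-26', '2010-07-27', '2010-07-30', '2010-07-31',
                            '2010-08-01', '2010-08-02', '2010-08-05']),
    'A': [3.15, 5, 3, 105, 0.05, 0.05, 0.05],
})

out = rows_around_gaps(df, 'Date', pd.Timedelta(days=2))
print(out)
```

Because the comparison is strict (`>`), a threshold of 3 days drops the 3-day gaps in the question's data and returns an empty frame.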