Looping over dates in a pandas datetime object and comparing time differences

Time: 2018-01-15 12:45:23

Tags: python pandas datetime timestamp

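Hello, I want to loop over a list of datetimes taken from an Excel file and check whether the time difference between the current iteration and the previous one is > 10 minutes. If so, the current value should become the previous value + 10 minutes. Below is the list of dates I get; for example, I would like index 4 to be index 3 + 10 minutes instead of NaT.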

0    2014-11-01 00:00:00
1    2014-11-01 00:10:00
2    2014-11-01 00:20:00
3    2014-11-01 00:30:00
4                    NaT
5    2014-11-01 00:50:00
6    2014-11-01 01:00:00
7    2014-11-01 01:10:00
8    2014-11-01 01:20:00
9    2014-11-01 01:30:00
10   2014-11-01 01:40:00
11   2014-11-01 01:50:00
12   2014-11-01 02:00:00
13                   NaT
14   2014-11-01 02:20:00
15   2014-11-01 02:30:00
16   2014-11-01 02:40:00
17   2014-11-01 02:50:00
18   2014-11-01 03:00:00

Name: Timestamp, dtype: datetime64[ns]

import pandas as pd
import os
import matplotlib.pyplot as plt
import numpy as np
import time
import datetime

os.chdir('C:\Users\NIK\.spyder2\PythonScripts')
file = 'FilterDataTest.xlsx'

data = pd.read_excel(file, sheetname='Ark1')

dato = data['Timestamp']

for i in range(0,len(dato)):
    if dato[i].minute - dato[i-1].minute > 10:
        dato_old = dato[i-1]
        dato[i] = dato_old + minute(10)

This is the code I have so far. I know it will not work, especially the last part (old value + minute(10)); it is only there to show what I am trying to do.
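For reference, the intent described in the question can be sketched as a plain loop. This is only an illustration under stated assumptions: the small `dato` series is made-up sample data standing in for `data['Timestamp']`, `step` and `fixed` are hypothetical names, and `pd.Timedelta(minutes=10)` is used where the question writes `minute(10)`.

```python
import pandas as pd

# Hypothetical sample standing in for data['Timestamp'] from the Excel file.
dato = pd.Series(pd.to_datetime([
    "2014-11-01 00:00:00",
    "2014-11-01 00:10:00",
    None,                     # missing reading, stored as NaT
    "2014-11-01 00:30:00",
    "2014-11-01 00:50:00",    # 20 minutes after the previous row
]))

step = pd.Timedelta(minutes=10)
fixed = dato.copy()

# Start at 1 so that i - 1 never wraps around to the last element.
for i in range(1, len(fixed)):
    prev, cur = fixed.iloc[i - 1], fixed.iloc[i]
    # Replace the value when it is missing or more than 10 minutes
    # after the (possibly already corrected) previous value.
    if pd.isna(cur) or cur - prev > step:
        fixed.iloc[i] = prev + step

print(fixed)
```

Subtracting two timestamps gives a `Timedelta`, so the comparison covers the full elapsed time rather than only the `.minute` field, which wraps around at every hour.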

1 answer:

Answer 0: (score: 0)

This should work using diff, Timedelta, and indexing:

data.Timestamp = pd.to_datetime(data.Timestamp)
timediff = pd.Timedelta(minutes=10)  # the amount of time to add
# diff() returns a Series in which each row holds the difference
# between that row and the previous row
mask = data.Timestamp.diff() > timediff

# If several consecutive rows are more than 10 minutes apart, each later
# row is recomputed from the already-updated previous row, hence the loop
while mask.any():
    # Set every flagged row to the previous row's value + 10 minutes
    # (index - 1 is used as a position, so this assumes the default RangeIndex)
    data.loc[mask, 'Timestamp'] = (data.iloc[data.loc[mask].index - 1].Timestamp + timediff).values
    # Update the mask
    mask = data.Timestamp.diff() > timediff
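One caveat: comparisons involving NaT evaluate to False, so a mask built from `diff() > timediff` does not flag rows that are NaT themselves, such as index 4 and 13 in the question's data. A minimal sketch of one way to fill those rows is shown below. It assumes the desired value is the last valid timestamp plus 10 minutes per missing step; the sample frame and the names `is_gap`, `run_id`, and `offset` are hypothetical.

```python
import pandas as pd

# Hypothetical sample with two consecutive missing readings.
data = pd.DataFrame({"Timestamp": pd.to_datetime([
    "2014-11-01 00:00:00",
    "2014-11-01 00:10:00",
    None,
    None,
    "2014-11-01 00:40:00",
])})

timediff = pd.Timedelta(minutes=10)
ts = data["Timestamp"]

# Count how far each NaT is from the last valid row: 1, 2, ... within a
# run of missing values, and 0 for rows that already hold a timestamp.
is_gap = ts.isna()
run_id = (~is_gap).cumsum()
offset = is_gap.astype(int).groupby(run_id).cumsum()

# Carry the last valid timestamp forward and add 10 minutes per missing step.
data["Timestamp"] = ts.ffill() + offset * timediff

print(data)
```

Rows that already hold a timestamp get an offset of zero and stay unchanged, and a NaT in the very first row stays NaT because there is no earlier value to build on.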