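Pandas - truncate a DataFrame at a time gap

Time: 2016-05-31 13:53:09

Tags: python pandas dataframe

I want to keep the last few rows, but cut off the rest of the DataFrame as soon as there is a time gap of more than 100ms. For example:

Input: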

           Time  X
0   12:30:00.00  A
1  12:30:00.100  B
2  12:30:00.202  C
3  12:30:00.300  D

Output:

           Time  X
2  12:30:00.202  C
3  12:30:00.300  D

Explanation: the gap between rows B and C is more than 100ms, so we throw away everything above row C.

1 answer:

Answer 0 (score: 2)

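You can use diff with cumsum, compare the result against a Timedelta created with to_timedelta, and finally filter with boolean indexing:

import pandas as pd

# sample data from the question
df = pd.DataFrame({'Time': ['12:30:00.00', '12:30:00.100', '12:30:00.202', '12:30:00.300'],
                   'X': list('ABCD')})

df['Time']= pd.to_datetime(df['Time'], format='%H:%M:%S.%f')
print (df)
                     Time  X
0 1900-01-01 12:30:00.000  A
1 1900-01-01 12:30:00.100  B
2 1900-01-01 12:30:00.202  C
3 1900-01-01 12:30:00.300  D

print (df.Time.diff())
0               NaT
1   00:00:00.100000
2   00:00:00.102000
3   00:00:00.098000
Name: Time, dtype: timedelta64[ns]

mask = (((df.Time.diff() > pd.to_timedelta('00:00:00.100000')).cumsum()) >= 1)
print (mask)
0    False
1    False
2     True
3     True
Name: Time, dtype: bool

print (df[mask])
                     Time  X
2 1900-01-01 12:30:00.202  C
3 1900-01-01 12:30:00.300  D

If the Time column must stay unchanged, convert it into a helper column (Time1 below) and split at the first gap above 100ms. If you need to split at the last value instead, start from a helper series (time_ser further below):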

df['Time1']= pd.to_datetime(df['Time'], format='%H:%M:%S.%f')
print (df)
           Time  X                   Time1
0   12:30:00.00  A 1900-01-01 12:30:00.000
1  12:30:00.100  B 1900-01-01 12:30:00.100
2  12:30:00.202  C 1900-01-01 12:30:00.202
3  12:30:00.300  D 1900-01-01 12:30:00.300
1  12:30:00.100  E 1900-01-01 12:30:00.100
2  12:30:00.202  F 1900-01-01 12:30:00.202

print (df.Time1.diff())
0                        NaT
1            00:00:00.100000
2            00:00:00.102000
3            00:00:00.098000
1   -1 days +23:59:59.800000
2            00:00:00.102000
Name: Time1, dtype: timedelta64[ns]

mask = (((df.Time1.diff() > pd.to_timedelta('00:00:00.100000')).cumsum()) >= 1)
print (mask)
0    False
1    False
2     True
3     True
1     True
2     True
Name: Time1, dtype: bool

print (df[mask].drop('Time1',axis=1))
           Time  X
2  12:30:00.202  C
3  12:30:00.300  D
1  12:30:00.100  E
2  12:30:00.202  F
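
The mask steps above can be wrapped in a small reusable function. This is only a sketch under assumptions: the name truncate_after_first_gap and its parameters are made up for illustration, and the time strings are assumed to follow the '%H:%M:%S.%f' format used in the answer.

```python
import pandas as pd


def truncate_after_first_gap(df, time_col="Time", max_gap="100ms",
                             fmt="%H:%M:%S.%f"):
    """Keep the rows from the first gap larger than max_gap onward.

    Hypothetical helper, not part of pandas. The original column is
    left untouched; parsing happens on a temporary Series.
    """
    times = pd.to_datetime(df[time_col], format=fmt)
    # True where the step from the previous row exceeds the threshold
    # (the first row has NaT as its difference, which compares as False)
    is_gap = times.diff() > pd.to_timedelta(max_gap)
    # the running count of gaps is >= 1 for every row at or after the first gap
    mask = is_gap.cumsum() >= 1
    # use a plain boolean array so repeated index labels cause no trouble
    return df.loc[mask.to_numpy()]


# sample data from the question
df = pd.DataFrame({
    "Time": ["12:30:00.00", "12:30:00.100", "12:30:00.202", "12:30:00.300"],
    "X": list("ABCD"),
})
result = truncate_after_first_gap(df)
print(result)
```

With the sample frame this keeps rows C and D. Like the mask in the answer, a frame without any gap above the threshold gives an empty result; if all rows should be kept in that case, an explicit check on is_gap.any() would have to be added.
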
#original frame, Time column left as text
print (df)
           Time  X
0   12:30:00.00  A
1  12:30:00.100  B
2  12:30:00.202  C
3  12:30:00.300  D
1  12:30:00.100  E
2  12:30:00.202  F

#create helper series
time_ser= pd.to_datetime(df['Time'], format='%H:%M:%S.%f')
#get differences
print (time_ser.diff())
0                        NaT
1            00:00:00.100000
2            00:00:00.102000
3            00:00:00.098000
1   -1 days +23:59:59.800000
2            00:00:00.102000
Name: Time, dtype: timedelta64[ns]
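
For cutting at the last gap rather than the first, one possible continuation from the helper series is sketched below. It is an assumption about the intent, not the answer's own code, and the name truncate_after_last_gap is hypothetical. The sketch finds the position of the last difference above 100ms and slices by position, which also copes with the repeated index labels in the sample.

```python
import numpy as np
import pandas as pd


def truncate_after_last_gap(df, time_col="Time", max_gap="100ms",
                            fmt="%H:%M:%S.%f"):
    """Keep the rows from the last gap larger than max_gap onward.

    Hypothetical helper; returns an empty frame when there is no gap,
    which matches what the cumsum mask does in that case.
    """
    # helper series, so the original column stays unchanged
    time_ser = pd.to_datetime(df[time_col], format=fmt)
    is_gap = (time_ser.diff() > pd.to_timedelta(max_gap)).to_numpy()
    # integer positions of all rows that follow a too-large gap
    gap_positions = np.flatnonzero(is_gap)
    if gap_positions.size == 0:
        return df.iloc[0:0]
    # positional slice from the last such row to the end
    return df.iloc[int(gap_positions[-1]):]


# six-row sample from the answer, including the repeated index labels
df = pd.DataFrame(
    {
        "Time": ["12:30:00.00", "12:30:00.100", "12:30:00.202",
                 "12:30:00.300", "12:30:00.100", "12:30:00.202"],
        "X": list("ABCDEF"),
    },
    index=[0, 1, 2, 3, 1, 2],
)
result = truncate_after_last_gap(df)
print(result)
```

On the six-row sample the differences above 100ms occur at rows C and F, so only row F remains. The negative difference at row E is not treated as a gap because it is smaller than the threshold.
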