I'm a bit stuck on this one. I have a DataFrame that has samples of a variable, each with a timestamp. The data are sorted in order of increasing time:
import pandas as pd
dates = [#Continuous Block
pd.Timestamp('2012-05-03 09:00:01'),
pd.Timestamp('2012-05-03 09:00:02'),
pd.Timestamp('2012-05-03 09:00:03'),
pd.Timestamp('2012-05-03 09:00:04'),
#Non Continuous Block
pd.Timestamp('2012-05-03 16:00:00'),
pd.Timestamp('2012-05-03 17:00:04'),
#Continuous Block
pd.Timestamp('2012-05-03 18:00:01'),
pd.Timestamp('2012-05-03 18:00:02'),
pd.Timestamp('2012-05-03 18:00:03'),
#Non Continuous Block
pd.Timestamp('2012-05-03 19:00:03')]
vars = [-0.105, -1.08, -1.08, -1.03, -1.0, -1.1, -0.15,-0.14,-0.13,-0.11]
df = pd.DataFrame({'A' : vars}, index=dates)
This gives:
A
2012-05-03 09:00:01 -0.105
2012-05-03 09:00:02 -1.080
2012-05-03 09:00:03 -1.080
2012-05-03 09:00:04 -1.030
2012-05-03 16:00:00 -1.000
2012-05-03 17:00:04 -1.100
2012-05-03 18:00:01 -0.150
2012-05-03 18:00:02 -0.140
2012-05-03 18:00:03 -0.130
2012-05-03 19:00:03 -0.110
As you can see, successive entries are often separated by one second. I want to pull out the lowest value of A within each run of timestamps that are one second apart. For the example above, running a function should give:
2012-05-03 09:00:03, -1.080
2012-05-03 16:00:00, -1.000
2012-05-03 17:00:04, -1.100
2012-05-03 18:00:01, -0.150
2012-05-03 19:00:03, -0.110
Appreciate any help!
Answer 0 (score: 1)
I did this by creating a column named 'Time':
df['Time'] = df.index
df2 = df.groupby([df.index.hour]).apply(lambda x: x.min())
df2.reset_index(drop=True, inplace=True)  # inplace must be the boolean True, not the string 'True'
print(df2.head())
This gives:
A Time
0 -1.08 2012-05-03 09:00:01
1 -1.00 2012-05-03 16:00:00
2 -1.10 2012-05-03 17:00:04
3 -0.15 2012-05-03 18:00:01
4 -0.11 2012-05-03 19:00:03
If you only need to group by hour, you don't need the Time column; you can group on the timestamp index directly:
df2 = df.groupby([df.index.hour]).apply(lambda x: x.min())
print(df2.head())
The output is:
A
9 -1.08
16 -1.00
17 -1.10
18 -0.15
19 -0.11
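Note that grouping by hour yields the same minima here only because each block in the sample data happens to fall in a different hour. If the blocks should instead be defined by the one-second spacing described in the question, one possible approach is to start a new block wherever the gap to the previous row is not exactly one second. The following is a minimal sketch under that assumption; the names `values`, `gaps`, `block_id`, and `result` are illustrative and do not come from the question. When a block has tied minima, `idxmin` returns the first one, so the first row comes back as 09:00:02 rather than the 09:00:03 shown in the question.

```python
import pandas as pd

dates = pd.to_datetime([
    '2012-05-03 09:00:01', '2012-05-03 09:00:02',
    '2012-05-03 09:00:03', '2012-05-03 09:00:04',
    '2012-05-03 16:00:00', '2012-05-03 17:00:04',
    '2012-05-03 18:00:01', '2012-05-03 18:00:02',
    '2012-05-03 18:00:03', '2012-05-03 19:00:03',
])
values = [-0.105, -1.08, -1.08, -1.03, -1.0, -1.1, -0.15, -0.14, -0.13, -0.11]
df = pd.DataFrame({'A': values}, index=dates)

# Time difference between each row and the previous one (NaT for the first row).
gaps = df.index.to_series().diff()

# Assumption: a block is a run of rows exactly one second apart.
# Every row whose gap is not one second starts a new block.
block_id = (gaps != pd.Timedelta(seconds=1)).cumsum()

# For each block, find the timestamp of the (first) minimum of A,
# then select those rows from the original frame.
idx = df.groupby(block_id.values)['A'].idxmin()
result = df.loc[idx]
print(result)
```

If rows spaced less than one second apart should also count as continuous, the comparison could be changed to `gaps > pd.Timedelta(seconds=1)`, with the first row handled separately since its gap is NaT.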