我有一个Panadas DataFrame,其中包含各种操作的测量值(Val)(OpID)以及测量发生时的时间戳(OpTime)。
OpID OpTime Val
143 2014-01-01 02:35:02 20
143 2014-01-01 02:40:01 24
143 2014-01-01 02:40:03 0
143 2014-01-01 02:45:01 0
143 2014-01-01 02:50:01 20
143 2014-01-01 02:55:01 0
143 2014-01-01 03:00:01 20
143 2014-01-01 03:05:01 24
143 2014-01-01 03:10:01 20
212 2014-01-01 02:15:01 20
212 2014-01-01 02:17:02 0
212 2014-01-01 02:20:01 0
212 2014-01-01 02:25:01 0
212 2014-01-01 02:30:01 20
299 2014-01-01 03:30:03 33
299 2014-01-01 03:35:02 33
299 2014-01-01 03:40:01 34
299 2014-01-01 03:45:01 33
299 2014-01-01 03:45:02 34
我希望生成一个输出,该输出仅显示测量返回零的时间段。 “开始”列将显示一系列OpTime中的第一个导致零,而End将包含第一个Val非零的OpTime。
鉴于以上样本数据,我想要的结果是:
OpID Start End
143 2014-01-01 02:40:03 2014-01-01 02:50:01
143 2014-01-01 02:55:01 2014-01-01 03:00:01
212 2014-01-01 02:17:02 2014-01-01 02:30:01
答案 0 :(得分:1)
如何使用pivot()
?
import numpy as np
import pandas as pd
df['Zeros'] = (df['Val']==0)
df['Valdf'] = np.hstack((nan, diff(df['Zeros'].values))) #how do you treat the first cell?
df['Valdr'] = np.hstack((diff(df['Zeros'].values), nan)) #how do you treat the last cell?
pf2 = pd.concat([df[((df['Zeros']!=True)&(df['Valdf']==1))],df[((df['Zeros'])&(df['Valdr']==1))]]).sort_index()
pf2['State'] = np.where(pf2['Zeros'], 'Start', 'End')
pf2['idx2'] = pf2['OpID'] + np.arange(len(pf2))/2*0.00000001 #need unique index for .pivot()
print pf2.pivot(index='idx2', columns='State', values='OpTime')
State End Start
idx2
143.0 2014-01-01 02:50:01 2014-01-01 02:45:01
143.1 2014-01-01 03:00:01 2014-01-01 02:55:01
212.2 2014-01-01 02:30:01 2014-01-01 02:25:01
要获得你提出的那个,可能只是:
pf3 = pd.DataFrame(pf2.pivot(index='idx2', columns='State', values='OpTime'))
pf3.index=asarray(pf3.index.values, 'int')