I need to downsample the following time series with pandas:
Time ,ch1 ,ch2 ,ch3 ,ch4
09/19/2019 22:00:00.000000000 ,0 ,0 ,8675601 ,0
09/19/2019 22:00:00.000976562 ,8028645 ,0 ,8662525 ,7706467
09/19/2019 22:00:00.001953124 ,8027705 ,0 ,0 ,7704373
09/19/2019 22:00:00.001953125 ,0 ,0 ,8685515 ,0
09/19/2019 22:00:00.002929687 ,8028089 ,0 ,8699659 ,7704202
09/19/2019 22:00:00.003906249 ,8027918 ,0 ,0 ,7705569
09/19/2019 22:00:00.003906250 ,0 ,0 ,8703334 ,0
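The timestamps carry nine fractional digits, and some rows are only 1 ns apart (…001953124 and …001953125), so whatever parses the Time column has to keep nanosecond precision. The following is a minimal sketch that parses the first four sample rows. It assumes the " ," separator exactly as shown above, and it uses a regex separator together with an explicit format string. Both of those are choices made for this illustration, not part of the question's code.

```python
import io

import pandas as pd

# First four rows of the sample above, copied verbatim (note the " ," separators).
SAMPLE = """Time ,ch1 ,ch2 ,ch3 ,ch4
09/19/2019 22:00:00.000000000 ,0 ,0 ,8675601 ,0
09/19/2019 22:00:00.000976562 ,8028645 ,0 ,8662525 ,7706467
09/19/2019 22:00:00.001953124 ,8027705 ,0 ,0 ,7704373
09/19/2019 22:00:00.001953125 ,0 ,0 ,8685515 ,0
"""

# A regex separator swallows the spaces around each comma (requires the Python engine).
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s*,\s*", engine="python")

# An explicit format lets %f read all nine fractional digits (down to nanoseconds).
df["Time"] = pd.to_datetime(df["Time"], format="%m/%d/%Y %H:%M:%S.%f")

# Rows 2 and 3 differ by a single nanosecond.
gap = df["Time"].iloc[3] - df["Time"].iloc[2]
print(gap)
```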
Because the CSV file is very large, I read it in chunks with the following code and try to downsample each chunk:
import numpy as np
import pandas as pd
from tqdm import tqdm

def resampleSignal(inPath, separator, firstDataLine, chunkSize):
    columsDataFrame = []
    tempIndex = 0
    for chunk in tqdm(pd.read_csv(inPath, skiprows=range(0, firstDataLine), chunksize=chunkSize, sep=separator)):
        columsDataFrame = chunk.columns
        # Convert the row index to datetimes (interpreted as nanoseconds since the epoch)
        chunk.index = pd.to_datetime(chunk.index, unit='ns')
        resampled = pd.DataFrame()
        # Keep the last value in each 1-second bin
        resampled = chunk.resample('1S').last()
        resampled_np = resampled.values
        # Append this chunk's result to the accumulated array
        if tempIndex == 0:
            finalDataSet = np.array(resampled_np)
        else:
            finalDataSet = np.append(finalDataSet, np.array(resampled_np), axis=0)
        tempIndex += 1
    return finalDataSet
The problem is that no matter what I change the '1S' argument to, the output is always:
Time ,ch1 ,ch2 ,ch3 ,ch4
09/19/2019 12:03:21.906250000 ,8471473.0 ,5633804.0 ,8578007.0 ,7515027.0
09/19/2019 12:16:20.657226562 ,8463397.0 ,5616594.0 ,8582878.0 ,7536395.0
09/19/2019 12:28:45.581054687 ,7711094.0 ,0.0 ,16777215.0 ,7773021.0
09/19/2019 12:41:04.551757812 ,7690984.0 ,5697459.0 ,16777215.0 ,7795462.0
Basically, it always returns the last row of each chunk instead of one row per second.
My pandas version is 0.25.3. If I print the resampler object chunk.resample('1S'), I get the following output:
DatetimeIndexResampler [freq=<Second>, axis=0, closed=left, label=left, convention=start, base=0]
So I know it is resampling along the right axis. What am I doing wrong? Thanks in advance for your help!
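The symptom can be reproduced without the CSV file. The sketch below uses a hypothetical chunk whose default row index runs from 200,000 to 299,999 (the range is an illustrative assumption). It applies the same to_datetime(..., unit='ns') conversion as the question's code, then resamples at a few frequencies.

```python
import pandas as pd

# Hypothetical chunk: read_csv without index_col numbers rows 0, 1, 2, ...
# across the whole file, so a later chunk might cover rows 200_000..299_999.
row_index = pd.RangeIndex(200_000, 300_000)

# Same conversion as the question: each row number becomes nanoseconds after 1970-01-01.
as_time = pd.to_datetime(row_index, unit="ns")

frame = pd.DataFrame({"ch1": range(len(row_index))}, index=as_time)

# All timestamps fall between 0.2 ms and 0.3 ms after the epoch, i.e. inside one bin
# for each of these rules, so every resample returns a single row: the chunk's last value.
results = {rule: frame.resample(rule).last() for rule in ["1s", "100ms", "1ms"]}
for rule, out in results.items():
    print(rule, len(out), out["ch1"].iloc[0])
```

For a 1-second rule, any chunk of a file with fewer than a billion rows collapses the same way. That matches the "last row of each chunk" output above.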
Answer:
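OK, I realized my mistake: I was resampling on the wrong index. chunk.index is just the integer row number that read_csv assigns when no index column is given. As a result, to_datetime(chunk.index, unit='ns') turned row n into 1970-01-01 00:00:00 plus n nanoseconds, and every row of a chunk fell into the same bin. I changed the code to resample on the Time column instead: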
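import numpy as np
import pandas as pd
from tqdm import tqdm

def resampleSignal(inPath, separator, firstDataLine, chunkSize):
    columsDataFrame = []
    tempIndex = 0
    for chunk in tqdm(pd.read_csv(inPath, skiprows=range(0, firstDataLine), chunksize=chunkSize, sep=separator)):
        columsDataFrame = chunk.columns
        # Parse the Time column itself instead of the integer row index
        chunk.Time = pd.Index(pd.to_datetime(chunk.Time, unit='ns'))
        resampled = pd.DataFrame()
        # Bin on the Time column ('100L' = 100 ms; newer pandas spells it '100ms')
        # and keep the last value in each bin
        resampled = chunk.resample('100L', on='Time').last()
        resampled_np = resampled.values
        # Append this chunk's result to the accumulated array
        if tempIndex == 0:
            finalDataSet = np.array(resampled_np)
        else:
            finalDataSet = np.append(finalDataSet, np.array(resampled_np), axis=0)
        tempIndex += 1
    return finalDataSet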
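Two further details may matter with chunked resampling. First, a bin that straddles a chunk boundary is produced once by each chunk. Second, np.append copies the whole accumulated array on every call.

The sketch below is a hypothetical helper, resample_csv_in_chunks, and is not the answer's code. It collects each chunk's resampled frame in a list, concatenates once, and merges duplicate bins with groupby(level=0).last(). It assumes the " ," separator and the MM/DD/YYYY Time layout of the sample, and it omits skiprows; pass it as in the original if header lines precede the data.

```python
import io

import pandas as pd

# The sample rows from the question, copied verbatim.
SAMPLE = """Time ,ch1 ,ch2 ,ch3 ,ch4
09/19/2019 22:00:00.000000000 ,0 ,0 ,8675601 ,0
09/19/2019 22:00:00.000976562 ,8028645 ,0 ,8662525 ,7706467
09/19/2019 22:00:00.001953124 ,8027705 ,0 ,0 ,7704373
09/19/2019 22:00:00.001953125 ,0 ,0 ,8685515 ,0
09/19/2019 22:00:00.002929687 ,8028089 ,0 ,8699659 ,7704202
09/19/2019 22:00:00.003906249 ,8027918 ,0 ,0 ,7705569
09/19/2019 22:00:00.003906250 ,0 ,0 ,8703334 ,0
"""


def resample_csv_in_chunks(source, rule, chunksize):
    """Hypothetical helper: last value per `rule` bin, reading `chunksize` rows at a time."""
    parts = []  # per-chunk results, concatenated once at the end
    # Regex separator strips the spaces around commas; it requires the Python engine.
    reader = pd.read_csv(source, sep=r"\s*,\s*", engine="python", chunksize=chunksize)
    for chunk in reader:
        # Assumed timestamp layout; %f parses up to nanoseconds in pandas.
        chunk["Time"] = pd.to_datetime(chunk["Time"], format="%m/%d/%Y %H:%M:%S.%f")
        # Bin on the Time column, not on the default integer row index.
        parts.append(chunk.set_index("Time").resample(rule).last())
    # A bin split across chunks appears once per chunk; keep its latest non-null values.
    return pd.concat(parts).groupby(level=0).last()


result = resample_csv_in_chunks(io.StringIO(SAMPLE), "1ms", chunksize=3)
print(result)
```

With chunksize=3, the seven sample rows split across three chunks. The 1 ms bins starting at 22:00:00.001 and 22:00:00.003 each appear twice before the merge. Afterwards, each bin holds the last non-null value of every column.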