Question

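I'm trying to do time-series learning on IoT data. The data comes from two different sources. For some measurements the sources differ only slightly in size: one source has 11 rows and the other has 15. For other measurements, one source has 30 rows and the other has 240.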
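One way to put two sources with different row counts (such as 11 and 15) on a common time axis without deleting anything is to interpolate the second source at the first source's timestamps. This is only a sketch under stated assumptions: both sources cover the same time window, the values are numeric, and straight-line interpolation between neighbouring samples is acceptable. The data and the names `src_a`, `src_b` and `aligned` are hypothetical, and it uses `numpy.interp` on elapsed seconds rather than pandas resampling.

```python
import numpy as np
import pandas as pd

# Hypothetical example: two sources covering the same 1-second window,
# one with 11 rows and one with 15 rows (row counts taken from the question).
start = pd.Timestamp("2011-01-01")
t_a = pd.date_range(start, start + pd.Timedelta("1s"), periods=11)
t_b = pd.date_range(start, start + pd.Timedelta("1s"), periods=15)
src_a = pd.Series(np.linspace(0.0, 10.0, 11), index=t_a, name="a")
src_b = pd.Series(np.linspace(0.0, 28.0, 15), index=t_b, name="b")

# Express both time axes as seconds since the common start, because
# np.interp needs plain numbers, not timestamps.
x_a = (t_a - start).total_seconds().to_numpy()
x_b = (t_b - start).total_seconds().to_numpy()

# Linearly interpolate source B at source A's timestamps; no row of A is dropped.
b_on_a = np.interp(x_a, x_b, src_b.to_numpy())

aligned = pd.DataFrame({"a": src_a.to_numpy(), "b": b_on_a}, index=t_a)
print(aligned)
```

If neither source's timestamps should be preferred, the same `np.interp` call can target a separate regular grid for both sources instead of `t_a`.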

I thought I should interpolate with:

 df.resample('20ms').interpolate()

But resampling this way drops some rows. Is there a way to interpolate without dropping rows, or should I delete rows instead?
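A likely explanation for the missing rows, sketched under the assumption that `resample(...).interpolate()` behaves like `resample(...).asfreq()` followed by `interpolate()` (this may differ between pandas versions): only original samples whose timestamps fall exactly on the new grid are kept, and everything between them is filled by a straight line. The data below is hypothetical, a squared signal rather than the question's linear one, so that the effect is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10 samples 100 ms apart, with a non-linear (squared) signal
# so that skipped samples actually change the interpolated result.
idx = pd.date_range("2011-01-01", periods=10, freq="100ms")
df = pd.DataFrame({"a": np.arange(10, dtype=float) ** 2}, index=idx)

# Step 1: place the data on the new 60 ms grid. Only original timestamps that
# coincide with a grid point (0, 300, 600, 900 ms) keep their values; every
# other grid point is NaN, and the samples at 100, 200, 400 ms, ... are unused.
on_grid = df.resample("60ms").asfreq()

# Step 2: fill the NaNs linearly from the surviving points only.
filled = on_grid.interpolate()

print(len(on_grid), int(on_grid["a"].notna().sum()))
print(filled.head())
```

With the question's linear test data the filled values happen to be correct anyway, which is why the 60 ms output still looks plausible even though most of the original samples were never used.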

Edit: data and code:

#!/usr/bin/env python3
import pandas as pd
first_df_file_name = 'interpolate_test.in'
# Space-delimited test file with a header row.
df = pd.read_csv(first_df_file_name, header=0, delimiter=' ')
print(df.head(5))
# Attach a synthetic timestamp index: one row every 100 ms.
new_col = pd.date_range('1/1/2011 00:00:00.000000', periods=len(df.index), freq='100ms')
df.insert(loc=0, column='date', value=new_col)
df.set_index('date', inplace=True)
# Resample to 20 ms (a divisor of 100 ms) and fill the gaps linearly.
upsampled = df.resample('20ms').interpolate()
print('20 ms, num rows', len(upsampled.index))
print(upsampled.head(5))
upsampled.to_csv('test_20ms.out')
# Resample to 60 ms (not a divisor of 100 ms) and fill the gaps linearly.
upsampled = df.resample('60ms').interpolate()
print('60 ms, num rows', len(upsampled.index))
print(upsampled.head(5))
upsampled.to_csv('test_60ms.out')

Here is the test input file:

Here is the output (partial):

 // output when interpolating to 20 ms - this is fine
                         a      b
 date                                 
 2011-01-01 00:00:00.000  100.0  200.0
 2011-01-01 00:00:00.020  120.0  240.0
 2011-01-01 00:00:00.040  140.0  280.0
 2011-01-01 00:00:00.060  160.0  320.0
 2011-01-01 00:00:00.080  180.0  360.0
 60 ms, num rows 16

 // output when interpolating to 60 ms - data is lost
                         a      b
 date                                 
 2011-01-01 00:00:00.000  100.0  200.0
 2011-01-01 00:00:00.060  160.0  320.0
 2011-01-01 00:00:00.120  220.0  440.0
 2011-01-01 00:00:00.180  280.0  560.0
 2011-01-01 00:00:00.240  340.0  680.0
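To interpolate onto a 60 ms grid while still using every original sample, one option is to interpolate on the union of the original timestamps and the target grid, and only then select the grid rows. This is a sketch with hypothetical data (same shape as the question's frame, but a squared column); `method="time"` weights by the actual time gaps, which matters because the union index is not evenly spaced.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's frame: 10 rows, 100 ms apart,
# but with a non-linear column so that lost samples would be visible.
idx = pd.date_range("2011-01-01", periods=10, freq="100ms")
df = pd.DataFrame({"a": np.arange(10, dtype=float) ** 2}, index=idx)

# Target grid: every 60 ms across the same time span.
grid = pd.date_range(df.index[0], df.index[-1], freq="60ms")

# Interpolate on the union of original and grid timestamps so every original
# sample acts as an anchor, then keep only the grid rows.
combined = df.reindex(df.index.union(grid))
combined = combined.interpolate(method="time")
on_grid = combined.reindex(grid)

print(on_grid.head())
```

If the original rows should also stay in the output, `combined` (before the final `reindex`) already contains both the original and the grid timestamps.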

So should I delete rows from the larger source instead of interpolating? And if I interpolate, how do I avoid losing data?
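If the larger source is to be brought down to the smaller one's rate, averaging each time bin with `resample(...).mean()` uses every row instead of deleting some. This is an alternative I am suggesting, not something from the question: the 100 ms and 12.5 ms rates and the data are hypothetical, chosen so that 30 and 240 rows span roughly the same 3 seconds, and whether a mean (rather than, say, the last value in each bin) is appropriate depends on the measurement.

```python
import numpy as np
import pandas as pd

# Hypothetical sources over roughly the same 3 seconds: a slow one with 30 rows
# (100 ms apart) and a fast one with 240 rows (12.5 ms apart).
start = pd.Timestamp("2011-01-01")
slow = pd.Series(np.arange(30, dtype=float),
                 index=pd.date_range(start, periods=30, freq="100ms"))
fast = pd.Series(np.arange(240, dtype=float),
                 index=pd.date_range(start, periods=240, freq="12500us"))

# Downsample the fast source to the slow source's 100 ms rate by averaging
# each bin, so every fast row contributes instead of being deleted.
fast_100ms = fast.resample("100ms").mean()

# Both series now share the same 100 ms timestamps and align row by row.
joined = pd.DataFrame({"slow": slow, "fast": fast_100ms})
print(joined.head())
```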

Pandas: delete rows or interpolate
