Resample and add missing rows

Date: 2018-03-23 13:24:55

Tags: python pandas dataframe

I have a dataframe that represents 1 second of data, which is supposed to be sampled at 100 Hz.

I would like to 1) resample it to a rate of 10 milliseconds using the "avg" (mean) method for each column, and 2) add extra rows where samples are missing, based on an interpolation method, as follows:

DF_input:

ephoc_as_datatime         att1 att2
2000-01-01 11:22:37.130    0    4
2000-01-01 11:22:37.138    1    5
2000-01-01 11:22:37.149    2    6
2000-01-01 11:22:37.156    3    7
2000-01-01 11:22:37.165    4    8
2000-01-01 11:22:37.168    5    9
2000-01-01 11:22:37.169    3    7
2000-01-01 11:22:37.567    7    3
2000-01-01 11:22:38.120    8    4

DF_output:

ephoc_as_datatime         att1 att2
2000-01-01 11:22:37.130    0    4
2000-01-01 11:22:37.140    1    5
2000-01-01 11:22:37.150    2    6
2000-01-01 11:22:37.160    3    7
2000-01-01 11:22:37.170    4    8
....adding the missing one
2000-01-01 11:22:37.570    7    3
....adding the missing one
2000-01-01 11:22:38.120    8    4

I know I should use resample and interpolate. Any suggestions would be appreciated.

Many thanks and best regards, Caro

1 Answer:

Answer 0: (score: 2)

I think you need resample with '10L' (10 ms bins), then mean to average each bin, and interpolate to fill the missing rows:

#if necessary convert to datetimes
#df['ephoc_as_datatime'] = pd.to_datetime(df['ephoc_as_datatime'])

df = df.resample('10L', on='ephoc_as_datatime').mean().interpolate()
print (df.head(20))
                          att1   att2
ephoc_as_datatime                    
2000-01-01 11:22:37.130  0.500  4.500
2000-01-01 11:22:37.140  2.000  6.000
2000-01-01 11:22:37.150  3.000  7.000
2000-01-01 11:22:37.160  4.000  8.000
2000-01-01 11:22:37.170  4.075  7.875
2000-01-01 11:22:37.180  4.150  7.750
2000-01-01 11:22:37.190  4.225  7.625
2000-01-01 11:22:37.200  4.300  7.500
2000-01-01 11:22:37.210  4.375  7.375
2000-01-01 11:22:37.220  4.450  7.250
2000-01-01 11:22:37.230  4.525  7.125
2000-01-01 11:22:37.240  4.600  7.000
2000-01-01 11:22:37.250  4.675  6.875
2000-01-01 11:22:37.260  4.750  6.750
2000-01-01 11:22:37.270  4.825  6.625
2000-01-01 11:22:37.280  4.900  6.500
2000-01-01 11:22:37.290  4.975  6.375
2000-01-01 11:22:37.300  5.050  6.250
2000-01-01 11:22:37.310  5.125  6.125
2000-01-01 11:22:37.320  5.200  6.000
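
For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. The frame is rebuilt from the nine rows in the question. The alias "10ms" is used instead of '10L' on the assumption that the two are equivalent millisecond aliases; recent pandas versions warn about 'L'. The variable name `out` is only illustrative.

```python
import pandas as pd

# Rows copied from DF_input in the question (irregular timestamps).
df = pd.DataFrame({
    "ephoc_as_datatime": pd.to_datetime([
        "2000-01-01 11:22:37.130",
        "2000-01-01 11:22:37.138",
        "2000-01-01 11:22:37.149",
        "2000-01-01 11:22:37.156",
        "2000-01-01 11:22:37.165",
        "2000-01-01 11:22:37.168",
        "2000-01-01 11:22:37.169",
        "2000-01-01 11:22:37.567",
        "2000-01-01 11:22:38.120",
    ]),
    "att1": [0, 1, 2, 3, 4, 5, 3, 7, 8],
    "att2": [4, 5, 6, 7, 8, 9, 7, 3, 4],
})

# Average every 10 ms bin, then fill the empty bins linearly.
# "10ms" is assumed here to behave like the '10L' used in the answer.
out = df.resample("10ms", on="ephoc_as_datatime").mean().interpolate()

print(out.head(20))
print(len(out))  # number of 10 ms rows between the first and last sample
```

Note that the first row is the mean of the 37.130 and 37.138 samples, because both fall into the same 10 ms bin. This is why it shows 0.5 / 4.5 rather than the 0 / 4 listed in DF_output.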

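Detail: resample with mean alone leaves NaN in every 10 ms bin that contains no samples, and interpolate then fills those rows: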

print(df.resample('10L', on='ephoc_as_datatime').mean().head(20))
                         att1  att2
ephoc_as_datatime                  
2000-01-01 11:22:37.130   0.5   4.5
2000-01-01 11:22:37.140   2.0   6.0
2000-01-01 11:22:37.150   3.0   7.0
2000-01-01 11:22:37.160   4.0   8.0
2000-01-01 11:22:37.170   NaN   NaN
2000-01-01 11:22:37.180   NaN   NaN
2000-01-01 11:22:37.190   NaN   NaN
2000-01-01 11:22:37.200   NaN   NaN
2000-01-01 11:22:37.210   NaN   NaN
2000-01-01 11:22:37.220   NaN   NaN
2000-01-01 11:22:37.230   NaN   NaN
2000-01-01 11:22:37.240   NaN   NaN
2000-01-01 11:22:37.250   NaN   NaN
2000-01-01 11:22:37.260   NaN   NaN
2000-01-01 11:22:37.270   NaN   NaN
2000-01-01 11:22:37.280   NaN   NaN
2000-01-01 11:22:37.290   NaN   NaN
2000-01-01 11:22:37.300   NaN   NaN
2000-01-01 11:22:37.310   NaN   NaN
2000-01-01 11:22:37.320   NaN   NaN
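
To see why some bins hold averaged values and others are NaN, the following sketch floors each timestamp from the question to its 10 ms bin and counts the samples per bin. This is only an illustration of the binning, not what resample does internally; the names `ts`, `bins` and `counts` are made up for this example.

```python
import pandas as pd

# The first seven timestamps from the question.
ts = pd.Series(pd.to_datetime([
    "2000-01-01 11:22:37.130",
    "2000-01-01 11:22:37.138",
    "2000-01-01 11:22:37.149",
    "2000-01-01 11:22:37.156",
    "2000-01-01 11:22:37.165",
    "2000-01-01 11:22:37.168",
    "2000-01-01 11:22:37.169",
]))

# Round each timestamp down to the start of its 10 ms bin.
bins = ts.dt.floor("10ms")

# Count how many samples land in each bin.
counts = bins.value_counts().sort_index()
print(counts)
```

Bins with more than one sample are averaged by mean, and any 10 ms slot that does not appear here at all (for example 37.170) has nothing to average, so it comes out as NaN until interpolate fills it.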