Question

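I have a pandas Series holding about 10 years of hourly data. In some places 2-3 months of data are missing, and I want to fill those gaps. The process I have in mind is as follows.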

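Build a one-year hourly time series from the available data by averaging, across all years, the values that share the same month, day and hour.
Fill the missing values from that average time series.
For example, if 2009/01/28 1:00 pm is missing, look up 01/28 1:00 pm in the time series computed in the first step and use that value.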
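
The steps above can be sketched in vectorized form with a groupby on month, day and hour. This is only a minimal sketch, assuming the data is a float Series with a unique DatetimeIndex; the function name `fill_with_climatology` and the synthetic data are made up for illustration.

```python
import numpy as np
import pandas as pd

def fill_with_climatology(series: pd.Series) -> pd.Series:
    """Fill NaNs with the mean of the same month/day/hour across all years."""
    idx = series.index
    # One group per (month, day, hour) slot; mean() ignores NaN by default,
    # and transform() broadcasts each slot mean back onto the original index.
    slot_mean = series.groupby([idx.month, idx.day, idx.hour]).transform("mean")
    return series.fillna(slot_mean)

# Synthetic data (assumption): three years of hourly values, constant per year.
index = pd.date_range("2008-01-01", "2010-12-31 23:00", freq="h")
series = pd.Series(np.asarray(index.year - 2007, dtype="float64"), index=index)

# Knock out a single hour and a two-month block in 2009.
missing_hour = pd.Timestamp("2009-01-28 13:00")
series.loc[missing_hour] = np.nan
series.loc[pd.Timestamp("2009-03-01"):pd.Timestamp("2009-04-30 23:00")] = np.nan

filled = fill_with_climatology(series)
```

A slot that has no valid value in any year stays NaN. For instance, a missing Feb 29 cannot be filled if the data contains only one leap year.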

I have searched a lot but could not find a way to do this.

Any help would be greatly appreciated.

Edit: Here is my attempt so far. I am still testing it, but it takes a very long time to run.

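# Build the average series: one value per (month, day, hour) slot.
# Format the keys of df once; calling strftime on every iteration is very slow.
keys = df.index.strftime('%m-%d %H')
for count in dfaverage.index:
    match_timestamp = '{:02}-{:02} {:02}'.format(count.month, count.day, count.hour)
    # mean() skips NaN, so the gaps do not distort the average
    value = df.loc[keys == match_timestamp, 'AtmPressurekPa'].mean()
    # .at writes in place; dfaverage.loc[count]['value'] = ... assigns to a temporary copy
    dfaverage.at[count, 'value'] = value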

import pandas as pd

# Fill each missing value in df from the matching slot of the average series.
avg_keys = dfaverage.index.strftime('%m-%d %H')
for count in df.index:
    # Test the single cell; the body must be indented under the if
    if pd.isna(df.at[count, 'AtmPressurekPa']):
        match_timestamp = '{:02}-{:02} {:02}'.format(count.month, count.day, count.hour)
        value = dfaverage.loc[avg_keys == match_timestamp, 'value'].mean()
        df.at[count, 'AtmPressurekPa'] = value

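In the first loop, I iterate over each element of the empty one-year series (dfaverage), find the values in the main DataFrame (df) with the same month, day and hour, average them, and assign the result to that element.

In the second loop, I use the dfaverage time series to fill the missing values in the df time series.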
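
The two loops could probably be replaced by an explicit lookup table, which avoids scanning the whole index once per timestamp. The following is a rough sketch, not a tested solution for the real data: it assumes df has a unique DatetimeIndex and a float column named 'AtmPressurekPa' as in the attempt above, and the helper names `build_average_table` and `fill_from_table` are hypothetical.

```python
import numpy as np
import pandas as pd

def build_average_table(df: pd.DataFrame, column: str) -> pd.Series:
    """Mean of `column` for every (month, day, hour) slot, NaNs ignored."""
    idx = df.index
    table = df[column].groupby([idx.month, idx.day, idx.hour]).mean()
    table.index.names = ["month", "day", "hour"]
    return table

def fill_from_table(df: pd.DataFrame, column: str, table: pd.Series) -> pd.DataFrame:
    """Return a copy of df where NaNs in `column` are taken from `table`."""
    idx = df.index
    # One (month, day, hour) key per row, then a single aligned lookup.
    keys = pd.MultiIndex.from_arrays([idx.month, idx.day, idx.hour])
    lookup = table.reindex(keys).to_numpy()
    out = df.copy()
    # Keep existing values; use the slot average only where the value is missing.
    out[column] = out[column].where(out[column].notna(), lookup)
    return out

# Synthetic stand-in for df (assumption): three years, constant value per year.
index = pd.date_range("2008-01-01", "2010-12-31 23:00", freq="h")
df = pd.DataFrame(
    {"AtmPressurekPa": np.asarray(index.year - 2007, dtype="float64")},
    index=index,
)
df.loc[pd.Timestamp("2009-01-28 13:00"), "AtmPressurekPa"] = np.nan

dfaverage = build_average_table(df, "AtmPressurekPa")
df_filled = fill_from_table(df, "AtmPressurekPa", dfaverage)
```

Here dfaverage is indexed by (month, day, hour) rather than by timestamps of a dummy year, so no string formatting is needed to match the rows.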

Filling missing data in a time series by building an average time series

0 answers: