Backfill missing values per hour/week in a Pandas DataFrame

Time: 2018-03-01 20:40:02

Tags: python pandas dataframe aggregate pandas-groupby

I have a DataFrame that looks like this:

Code                   DIAG
Time
1999-12-01 00:00:01.870     None
1999-12-01 00:00:10.870     None
2000-01-01 09:10:09.870    None
2000-01-01 09:10:10.870    None
2000-01-01 09:00:10.940    None
2000-01-01 09:00:11.160    None
2000-01-01 09:00:11.640    None
2000-01-01 09:00:12.460    None
2010-01-01 09:00:34.910    1_19_1_4_0_0
2010-01-01 09:00:35.060    3_22_4_0_0_0
2010-01-01 09:00:35.120    6_22_10_3_0_0

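I want to backfill the missing data, but only for the one hour before each existing value, and change the label of the filled rows, so that the data looks like this: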

Code                             DIAG
    Time
    1999-12-01 00:00:01.870     None
    1999-12-01 00:00:10.870     None
    2000-01-01 09:10:09.870    1_19_1_4_0_0_H
    2000-01-01 09:10:10.870    1_19_1_4_0_0_H
    2000-01-01 09:00:10.940    1_19_1_4_0_0_H
    2000-01-01 09:00:11.160    1_19_1_4_0_0_H
    2000-01-01 09:00:11.640    1_19_1_4_0_0_H
    2000-01-01 09:00:12.460    1_19_1_4_0_0_H
    2010-01-01 09:00:34.910    1_19_1_4_0_0_H
    2010-01-01 09:00:35.060    3_22_4_0_0_0
    2010-01-01 09:00:35.120    6_22_10_3_0_0

I wrote this code:

def FillData(dff):
    s = dff.bfill()
    s.loc[s.notnull()] = s.astype('str') + '_H'
    return s

df = A['DIAG'].groupby(pd.Grouper(freq='H')).apply(FillData)

The problem is that it produces output that looks like this:

Code                             DIAG
    Time
    1999-12-01 00:00:01.870     None
    1999-12-01 00:00:10.870     None
    2000-01-01 09:10:09.870    None
    2000-01-01 09:10:10.870    None
    2000-01-01 09:00:10.940    None
    2000-01-01 09:00:11.160    None
    2000-01-01 09:00:11.640    None
    2000-01-01 09:00:34.460    1_19_1_4_0_0_H
    2010-01-01 09:00:34.910    1_19_1_4_0_0_H
    2010-01-01 09:00:35.060    3_22_4_0_0_0_H
    2010-01-01 09:00:35.120    6_22_10_3_0_0_H

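I see two main problems. First, the groupby does not seem to group by hour (H), only by minute. Second, the label (_H) is added to every row, including rows that already had a value. My main goal is to tag the data in the 1 hour before a value with _H, and then the data in the 1 week before it with _W.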
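
One possible direction, offered as a minimal sketch rather than a confirmed answer: `pd.Grouper(freq='H')` places rows into fixed calendar-hour bins, not into a rolling hour before each value, so measuring the time gap per row may fit the goal better. The sketch below looks up the timestamp of the next non-missing value for every row, computes the gap to it, and fills only the rows that were missing to begin with and whose gap fits the window. The function name `backfill_with_labels`, the `windows` parameter, and the sample data are hypothetical; it assumes a `DatetimeIndex` and that missing values are stored as `None` or `NaN`.

```python
import pandas as pd


def backfill_with_labels(s, windows):
    """Backfill missing values in a time-indexed Series and tag the filled rows.

    `windows` is a list of (timedelta, suffix) pairs (a hypothetical
    parameter for this sketch), e.g. [(pd.Timedelta(hours=1), "_H")].
    """
    s = s.sort_index()
    missing = s.isna()

    # Timestamp of each row, and timestamp of the next non-missing row.
    times = pd.Series(s.index, index=s.index)
    next_time = times.where(~missing).bfill()
    gap = next_time - times  # NaT when no later value exists

    # The value that a plain backfill would put in each row.
    next_value = s.bfill()

    out = s.copy()
    # Widest window first, so that narrower windows overwrite it.
    ordered = sorted(windows, key=lambda w: pd.Timedelta(w[0]), reverse=True)
    for window, suffix in ordered:
        in_window = missing & (gap <= pd.Timedelta(window))
        out = out.mask(in_window, next_value.astype(str) + suffix)
    return out


# Made-up sample data, loosely modelled on the question.
index = pd.to_datetime([
    "2009-12-01 00:00:01.870",  # more than a week before any value
    "2010-01-01 07:00:00.000",  # within a week, but not within an hour
    "2010-01-01 08:30:00.000",  # within an hour
    "2010-01-01 09:00:34.910",
    "2010-01-01 09:00:35.060",
])
diag = pd.Series(
    [None, None, None, "1_19_1_4_0_0", "3_22_4_0_0_0"],
    index=index,
    name="DIAG",
)

result = backfill_with_labels(
    diag,
    [(pd.Timedelta(hours=1), "_H"), (pd.Timedelta(days=7), "_W")],
)
print(result)
```

Existing values keep their original label because only rows that were missing at the start are modified, and the narrower window is applied last so that `_H` takes precedence over `_W`.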

I would appreciate it if someone could help me. I have spent a lot of time on this, but I cannot find a straightforward way to do it.

Thanks

0 answers:

No answers yet