I have a DataFrame that looks like this:
Code DIAG
Time
1999-12-01 00:00:01.870 None
1999-12-01 00:00:10.870 None
2000-01-01 09:10:09.870 None
2000-01-01 09:10:10.870 None
2000-01-01 09:00:10.940 None
2000-01-01 09:00:11.160 None
2000-01-01 09:00:11.640 None
2000-01-01 09:00:12.460 None
2010-01-01 09:00:34.910 1_19_1_4_0_0
2010-01-01 09:00:35.060 3_22_4_0_0_0
2010-01-01 09:00:35.120 6_22_10_3_0_0
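I want to backfill the missing data only for the one hour before each existing value, and change the label on the filled rows, so that the data looks like this: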
Code DIAG
Time
1999-12-01 00:00:01.870 None
1999-12-01 00:00:10.870 None
2000-01-01 09:10:09.870 1_19_1_4_0_0_H
2000-01-01 09:10:10.870 1_19_1_4_0_0_H
2000-01-01 09:00:10.940 1_19_1_4_0_0_H
2000-01-01 09:00:11.160 1_19_1_4_0_0_H
2000-01-01 09:00:11.640 1_19_1_4_0_0_H
2000-01-01 09:00:12.460 1_19_1_4_0_0_H
2010-01-01 09:00:34.910 1_19_1_4_0_0_H
2010-01-01 09:00:35.060 3_22_4_0_0_0
2010-01-01 09:00:35.120 6_22_10_3_0_0
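I wrote this code:
import pandas as pd

def FillData(dff):
    # Backfill missing values inside the group
    s = dff.bfill()
    # Append '_H' to every non-null value in the group
    s.loc[s.notnull()] = s.astype('str') + '_H'
    return s

# A is the DataFrame shown above, indexed by Time
df = A['DIAG'].groupby(pd.Grouper(freq='H')).apply(FillData)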
The problem is that it produces this instead:
Code DIAG
Time
1999-12-01 00:00:01.870 None
1999-12-01 00:00:10.870 None
2000-01-01 09:10:09.870 None
2000-01-01 09:10:10.870 None
2000-01-01 09:00:10.940 None
2000-01-01 09:00:11.160 None
2000-01-01 09:00:11.640 None
2000-01-01 09:00:34.460 1_19_1_4_0_0_H
2010-01-01 09:00:34.910 1_19_1_4_0_0_H
2010-01-01 09:00:35.060 3_22_4_0_0_0_H
2010-01-01 09:00:35.120 6_22_10_3_0_0_H
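I see two main problems. First, the groupby does not seem to group by hour (H), only by minute. Second, it adds the label (_H) to all rows, including the ones that already had a value. My main goal is to tag the data in the 1 hour before each value with _H, and then the data in the 1 week before with _W.
I would appreciate it if someone could help me. I have spent a lot of time on this, but I cannot find a straightforward way to do it.
Thanks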
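For reference, here is a minimal sketch of one way the goal above could be expressed. It is not a confirmed solution, only an illustration under some assumptions: the missing entries are real missing values (None/NaN) rather than the string 'None', values that were already present keep their original label, and a gap is measured from each row to the next observed value rather than by calendar hour. It avoids pd.Grouper on purpose, because a Grouper with an hourly frequency cuts the index into fixed clock-hour bins instead of "the hour before each value". The function name tag_backfill and the sample timestamps are hypothetical.

```python
import pandas as pd


def tag_backfill(s, windows):
    """Backfill gaps in `s` from the next observed value, tagging each filled
    row with the suffix of the narrowest window that reaches that value."""
    s = s.sort_index()
    times = s.index.to_series()
    observed = s.notna().to_numpy()

    # Timestamp and value of the next observed row at or after each row.
    next_time = times.where(observed).bfill()
    next_value = s.bfill()
    gap = next_time - times  # NaT where no later observation exists

    out = s.copy()
    # Widest window first, so that narrower windows overwrite wider ones.
    ordered = sorted(windows, key=lambda w: pd.Timedelta(w[0]), reverse=True)
    for window, suffix in ordered:
        # Only rows that were missing and whose next value is close enough.
        fill = ~observed & (gap <= pd.Timedelta(window)).to_numpy()
        tagged = next_value.map(lambda v: v + suffix, na_action="ignore")
        out = out.where(~fill, tagged)
    return out


# Hypothetical sample data (not the frame from the question).
idx = pd.to_datetime([
    "2009-12-01 00:00:01.870",  # more than a week before the next value
    "2010-01-01 07:30:00.000",  # within a week, but more than an hour
    "2010-01-01 08:30:00.000",  # within an hour
    "2010-01-01 09:00:34.910",
    "2010-01-01 09:00:35.060",
])
diag = pd.Series(
    [None, None, None, "1_19_1_4_0_0", "3_22_4_0_0_0"], index=idx, name="DIAG"
)

result = tag_backfill(diag, [("1h", "_H"), ("7D", "_W")])
print(result)
```

In this sketch the fill mask is built from the rows that were missing before the backfill, which is why the observed rows are not relabeled. Note that the function sorts the series by time, so the result may be in a different row order than the input.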