Pandas: fill missing values from yesterday's data (same DateTime) in Python

Date: 2018-05-29 05:10:23

Tags: python python-3.x python-2.7 pandas

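I am trying to fill missing values in Python with yesterday's data (the value from the same time one day earlier). I tried the code below, but it does not produce the expected output.

Code: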

import pandas as pd

df = pd.read_csv(r'input_3.csv')
saved_column = df.Value #you can also use df['column_name']


df['DateTime'] = pd.DatetimeIndex(df['DateTime'])

b = df.loc[df.Value.isnull(), 'Value'] = \
              df.loc[df.Value.isnull(), 'Value'].map(df.loc[df.Value.notnull()] \
                .set_index('DateTime')['Value'])
print b

Yesterday's data:

block   DateTime    Value
1   09-01-2016 00:00    -0.886492
2   09-01-2016 01:00    -0.500995
3   09-01-2016 02:00    4
4   09-01-2016 03:00    5
5   09-01-2016 04:00    2.145205
6   09-01-2016 05:00    0.475309

Today's data:

1   10-01-2016 00:00    -0.886492
2   10-01-2016 01:00    -0.500995
3   10-01-2016 02:00    NaN
4   10-01-2016 03:00    NaN
5   10-01-2016 04:00    2.145205
6   10-01-2016 05:00    0.475309

Expected filled data for today:

1   10-01-2016 00:00    -0.886492
2   10-01-2016 01:00    -0.500995
3   10-01-2016 02:00    4
4   10-01-2016 03:00    5
5   10-01-2016 04:00    2.145205
6   10-01-2016 05:00    0.475309

Please suggest an approach for this. Thanks in advance.

I have also tried the post "Fill values from one dataframe to another with matching IDs", but did not get the expected output.

1 Answer:

Answer 0: (score: 4)

You can first use read_csv with the index_col, parse_dates, and dayfirst=True parameters to get a DatetimeIndex:

df = pd.read_csv(r'input_3.csv', index_col=[1], parse_dates=[1], dayfirst=True)
print (df)
                     block     Value
DateTime                            
2016-01-09 00:00:00      1 -0.886492
2016-01-09 01:00:00      2 -0.500995
2016-01-09 02:00:00      3  4.000000
2016-01-09 03:00:00      4  5.000000
2016-01-09 04:00:00      5  2.145205
2016-01-09 05:00:00      6  0.475309
2016-01-10 00:00:00      1 -0.886492
2016-01-10 01:00:00      2 -0.500995
2016-01-10 02:00:00      3       NaN
2016-01-10 03:00:00      4       NaN
2016-01-10 04:00:00      5  2.145205
2016-01-10 05:00:00      6  0.475309
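
As a small illustration of why dayfirst=True matters here, the sketch below parses a few day-first timestamps from an in-memory string. The comma-separated layout of csv_text is an assumption standing in for input_3.csv, whose actual delimiter is not shown in the question.

```python
import io
import pandas as pd

# Hypothetical stand-in for input_3.csv (the real delimiter is an assumption).
csv_text = """block,DateTime,Value
3,09-01-2016 02:00,4
4,09-01-2016 03:00,5
3,10-01-2016 02:00,
4,10-01-2016 03:00,
"""

# Column 1 (DateTime) becomes the index and is parsed as day-month-year.
parsed = pd.read_csv(
    io.StringIO(csv_text), index_col=[1], parse_dates=[1], dayfirst=True
)

# With dayfirst=True, "09-01-2016" is read as 9 January, not 1 September.
first_stamp = parsed.index[0]
```

Without dayfirst=True, an ambiguous string such as 09-01-2016 would be interpreted month-first, so the two days would land months apart instead of one day apart.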

Then replace the NaN values using fillna with the same data whose datetimes have been shifted forward by one day:

df['Value'] = df['Value'].fillna(df.shift(freq='1d')['Value'])
df = df.reset_index()
print (df)
              DateTime  block     Value
0  2016-01-09 00:00:00      1 -0.886492
1  2016-01-09 01:00:00      2 -0.500995
2  2016-01-09 02:00:00      3  4.000000
3  2016-01-09 03:00:00      4  5.000000
4  2016-01-09 04:00:00      5  2.145205
5  2016-01-09 05:00:00      6  0.475309
6  2016-01-10 00:00:00      1 -0.886492
7  2016-01-10 01:00:00      2 -0.500995
8  2016-01-10 02:00:00      3  4.000000
9  2016-01-10 03:00:00      4  5.000000
10 2016-01-10 04:00:00      5  2.145205
11 2016-01-10 05:00:00      6  0.475309
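
To make the alignment step easier to follow, here is a minimal, self-contained sketch of the same idea on a four-row Series. The inline data and the names values, yesterday_as_today, and filled are illustrative only, and pd.Timedelta(days=1) is used in place of the '1d' string as an equivalent one-day offset.

```python
import numpy as np
import pandas as pd

# Two timestamps on each of two consecutive days; the second day has gaps.
idx = pd.to_datetime([
    "2016-01-09 02:00", "2016-01-09 03:00",
    "2016-01-10 02:00", "2016-01-10 03:00",
])
values = pd.Series([4.0, 5.0, np.nan, np.nan], index=idx, name="Value")

# shift(freq=...) moves the index labels, not the data: every value from
# 9 January is now labelled with the same time on 10 January.
yesterday_as_today = values.shift(freq=pd.Timedelta(days=1))

# fillna aligns on the index, so each gap receives the value recorded at
# the same time one day earlier; existing values are left unchanged.
filled = values.fillna(yesterday_as_today)
```

Gaps on the earliest day would remain NaN with this approach, because there is no previous day to take a value from.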