Best way to fill yearly date gaps in a Python DataFrame

Time: 2018-09-26 22:25:11

Tags: python pandas date dataframe

Hi all, I am new to Python and have run into the following problem. I have a DataFrame:

ipdb> DF

    asofdate  port_id
1 2010-01-01       76
2 2010-04-01       43
3 2011-02-01       76
4 2013-01-02       93
5 2017-02-01       43

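For the yearly gaps (here 2012, 2014, 2015 and 2016), I want to fill each missing year with a row dated January 1 of that year, using the port_id carried over from the previous year. Ideally, I would get: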

ipdb> DF

    asofdate  port_id
1 2010-01-01       76
2 2010-04-01       43
3 2011-02-01       76
4 2012-01-01       76
5 2013-01-02       93
6 2014-01-01       93
7 2015-01-01       93
8 2016-01-01       93
9 2017-02-01       43

I have tried several approaches without success. Could someone suggest a way to solve this? Thanks in advance!

2 Answers:

Answer 0: (score: 1)

You can use a set difference together with range to find the missing years, then append a DataFrame holding the new dates:

import pandas as pd

# convert to datetime if not already converted
df['asofdate'] = pd.to_datetime(df['asofdate'])

# calculate missing years
years = df['asofdate'].dt.year
missing = set(range(years.min(), years.max())) - set(years)

# add the new rows, sort and forward-fill
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used here)
new_rows = pd.DataFrame({'asofdate': pd.to_datetime(list(missing), format='%Y')})
df = pd.concat([df, new_rows]).sort_values('asofdate').ffill()

print(df)

    asofdate  port_id
1 2010-01-01     76.0
2 2010-04-01     43.0
3 2011-02-01     76.0
1 2012-01-01     76.0
4 2013-01-02     93.0
2 2014-01-01     93.0
3 2015-01-01     93.0
0 2016-01-01     93.0
5 2017-02-01     43.0
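
For readers who want to run this end to end, here is a minimal self-contained sketch of the same set-difference idea. The function name fill_missing_years and its date_col parameter are made up for illustration and are not part of pandas; the sample frame is rebuilt from the question. It uses pd.concat, which works on current pandas versions, and resets the index so the result is numbered in date order.

```python
import pandas as pd


def fill_missing_years(df, date_col="asofdate"):
    """Hypothetical helper: add a January 1 row for every year that has
    no data, then forward-fill the other columns from the previous row."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])

    # Years between the first and last observed year that never appear
    years = out[date_col].dt.year
    missing = sorted(set(range(years.min(), years.max() + 1)) - set(years))

    # One new row per missing year, dated January 1
    filler = pd.DataFrame(
        {date_col: pd.to_datetime([f"{y}-01-01" for y in missing])}
    )

    out = pd.concat([out, filler], ignore_index=True)
    return out.sort_values(date_col).ffill().reset_index(drop=True)


# Sample data taken from the question
df = pd.DataFrame({
    "asofdate": ["2010-01-01", "2010-04-01", "2011-02-01",
                 "2013-01-02", "2017-02-01"],
    "port_id": [76, 43, 76, 93, 43],
})

result = fill_missing_years(df)
print(result)
```

As in the output above, port_id comes back as float, because the new rows hold NaN until the forward fill runs.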

Answer 1: (score: 0)

I would create a helper DataFrame containing the start date of every year, filter out the years that already appear in df, and finally merge the two together:

# First make sure it is proper datetime
df['asofdate'] = pd.to_datetime(df.asofdate)

# Create your temporary dataframe of year start dates
helper = pd.DataFrame({'asofdate':pd.date_range(df.asofdate.min(), df.asofdate.max(), freq='YS')})

# Filter out the rows where the year is already in df
helper = helper[~helper.asofdate.dt.year.isin(df.asofdate.dt.year)]

# Merge back in to df, sort, and forward fill
new_df = df.merge(helper, how='outer').sort_values('asofdate').ffill()

>>> new_df
    asofdate  port_id
0 2010-01-01     76.0
1 2010-04-01     43.0
2 2011-02-01     76.0
5 2012-01-01     76.0
3 2013-01-02     93.0
6 2014-01-01     93.0
7 2015-01-01     93.0
8 2016-01-01     93.0
4 2017-02-01     43.0
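
Both answers return port_id as float (76.0 rather than 76), whereas the desired output in the question shows integers. A possible follow-up, sketched here on the helper-frame approach with the sample data from the question, is to cast the column back once the forward fill has removed the NaN values, and to reset the index. This assumes port_id has no missing values of its own; otherwise the cast to int64 would raise.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "asofdate": ["2010-01-01", "2010-04-01", "2011-02-01",
                 "2013-01-02", "2017-02-01"],
    "port_id": [76, 43, 76, 93, 43],
})
df["asofdate"] = pd.to_datetime(df["asofdate"])

# Year-start dates spanning the data ('YS' is the year-start frequency alias)
helper = pd.DataFrame({
    "asofdate": pd.date_range(df["asofdate"].min(),
                              df["asofdate"].max(), freq="YS")
})

# Keep only the years that do not already appear in df
helper = helper[~helper["asofdate"].dt.year.isin(df["asofdate"].dt.year)]

new_df = (
    df.merge(helper, how="outer")      # joins on the shared asofdate column
      .sort_values("asofdate")
      .ffill()                         # carry the previous port_id forward
      .astype({"port_id": "int64"})    # safe only once no NaN remains
      .reset_index(drop=True)
)
print(new_df)
```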