I have a cumulative sum of data running from 1987 to 2016, currently in this format:
State Year Month Value
TN 1987 1 24410.0
TN 1987 2 24410.0
TN 1987 3 24410.0
TN 1987 4 24410.0
.
.
TN 1996 1 24410.0
TN 1996 2 24410.0
TN 1996 3 24410.0
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
TN 1996 7 37109.0
TN 1996 8 37109.0
TN 1996 9 37109.0
TN 1996 10 37109.0
TN 1996 11 37109.0
TN 1996 12 37109.0
TN 2016 1 49808.0
TN 2016 2 49808.0
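The data actually skips from 1996 to 2016 (for TN; the gap differs from state to state). I need a general way to fill every gap in the data, since some years are absent entirely (for example 2010-2015), and I also want the output to extend through 2018.
I want each missing value filled with the previous value, giving output like this: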
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
.
.
TN 2010 1 37109.0
TN 2010 2 37109.0
TN 2010 3 37109.0
.
.
TN 2016 1 37109.0
TN 2016 2 37109.0
.
.
TN 2016 11 49808.0
TN 2016 12 49808.0
.
.
TN 2017 1 49808.0
TN 2017 2 49808.0
TN 2017 3 49808.0
TN 2017 4 49808.0
.
.
TN 2018 1 49808.0
TN 2018 2 49808.0
Answer 0 (score: 0)
You can create a dataframe of the missing months and then merge it with your data:
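import pandas as pd
# Month-start dates for the years present in df (closed= was removed in pandas 2.0; use inclusive='left' there)
dates = pd.date_range(start='1/1/%d' % df['Year'].min(),
                      end='1/08/%d' % df['Year'].max(),
                      freq='MS', closed='left')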
>> dates
DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
'1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
'1987-10-01', '1987-11-01',
...
'2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
'2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
'2015-12-01', '2016-01-01'],
dtype='datetime64[ns]', length=348, freq='MS')
Then you can create a dataframe of all months:
all_months = pd.DataFrame.from_records((dates.year, dates.month),
                                       index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])
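The range above stops at the last year present in the data, while the question asks for output through 2018. A minimal sketch of a month grid with an explicit end year follows; the helper name `month_grid` and its arguments are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

def month_grid(first_year, last_year):
    """Return one row per calendar month, January of first_year through December of last_year."""
    # period_range includes both endpoints, so no closed/inclusive argument is needed
    months = pd.period_range(start=f"{first_year}-01", end=f"{last_year}-12", freq="M")
    return pd.DataFrame({"Year": months.year, "Month": months.month})

all_months = month_grid(1987, 2018)
print(len(all_months))  # 32 years * 12 months = 384
```

The resulting frame has the same `Year` and `Month` columns as `all_months` above, so it could be used in the merge step in its place.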
Then merge it with the original dataframe and forward-fill:
df.merge(all_months, how='right').ffill()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1996.0 1.0 24410.0
5 TN 1996.0 2.0 24410.0
6 TN 1996.0 3.0 24410.0
7 TN 1996.0 4.0 24410.0
8 TN 1996.0 5.0 37109.0
9 TN 1996.0 6.0 37109.0
10 TN 1996.0 7.0 37109.0
11 TN 1996.0 8.0 37109.0
12 TN 1996.0 9.0 37109.0
13 TN 1996.0 10.0 37109.0
14 TN 1996.0 11.0 37109.0
15 TN 1996.0 12.0 37109.0
16 TN 2016.0 1.0 49808.0
17 TN 1987.0 5.0 49808.0
18 TN 1987.0 6.0 49808.0
19 TN 1987.0 7.0 49808.0
20 TN 1987.0 8.0 49808.0
21 TN 1987.0 9.0 49808.0
22 TN 1987.0 10.0 49808.0
23 TN 1987.0 11.0 49808.0
24 TN 1987.0 12.0 49808.0
25 TN 1988.0 1.0 49808.0
26 TN 1988.0 2.0 49808.0
27 TN 1988.0 3.0 49808.0
28 TN 1988.0 4.0 49808.0
29 TN 1988.0 5.0 49808.0
.. ... ... ... ...
319 TN 2013.0 7.0 49808.0
320 TN 2013.0 8.0 49808.0
321 TN 2013.0 9.0 49808.0
322 TN 2013.0 10.0 49808.0
323 TN 2013.0 11.0 49808.0
324 TN 2013.0 12.0 49808.0
325 TN 2014.0 1.0 49808.0
326 TN 2014.0 2.0 49808.0
327 TN 2014.0 3.0 49808.0
328 TN 2014.0 4.0 49808.0
329 TN 2014.0 5.0 49808.0
330 TN 2014.0 6.0 49808.0
331 TN 2014.0 7.0 49808.0
332 TN 2014.0 8.0 49808.0
333 TN 2014.0 9.0 49808.0
334 TN 2014.0 10.0 49808.0
335 TN 2014.0 11.0 49808.0
336 TN 2014.0 12.0 49808.0
337 TN 2015.0 1.0 49808.0
338 TN 2015.0 2.0 49808.0
339 TN 2015.0 3.0 49808.0
340 TN 2015.0 4.0 49808.0
341 TN 2015.0 5.0 49808.0
342 TN 2015.0 6.0 49808.0
343 TN 2015.0 7.0 49808.0
344 TN 2015.0 8.0 49808.0
345 TN 2015.0 9.0 49808.0
346 TN 2015.0 10.0 49808.0
347 TN 2015.0 11.0 49808.0
348 TN 2015.0 12.0 49808.0
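In the output above, the months added by the merge follow the original rows (from index 17 on) instead of sitting in chronological order, so `ffill` carries 49808.0 into 1987-2015 rather than the value that preceded each gap. One possible way around this is to put the month grid on the left of the merge, sort by `Year` and `Month`, and forward-fill each state separately. The sketch below assumes one row per state and month; the function name `fill_months`, the `last_year` parameter, and the second state `KY` in the sample data are hypothetical.

```python
import pandas as pd

# Small stand-in for the question's data; the KY row is invented for illustration
df = pd.DataFrame({
    "State": ["TN", "TN", "TN", "KY"],
    "Year":  [1996, 1996, 2016, 2000],
    "Month": [4, 5, 1, 6],
    "Value": [24410.0, 37109.0, 49808.0, 100.0],
})

def fill_months(df, last_year=2018):
    """Forward-fill Value over every month, per state, through December of last_year."""
    pieces = []
    for state, group in df.groupby("State"):
        # Complete month grid from the state's first year to last_year
        months = pd.period_range(start=f"{group['Year'].min()}-01",
                                 end=f"{last_year}-12", freq="M")
        grid = pd.DataFrame({"State": state, "Year": months.year, "Month": months.month})
        # Grid on the left, so every month gets a row even when the data has none
        merged = grid.merge(group, how="left", on=["State", "Year", "Month"])
        # Chronological order is what makes ffill pick up the previous month's value
        merged = merged.sort_values(["Year", "Month"])
        merged["Value"] = merged["Value"].ffill()
        # Months before the state's first observation have nothing to fill from
        pieces.append(merged.dropna(subset=["Value"]))
    return pd.concat(pieces, ignore_index=True)

filled = fill_months(df)
print(filled[filled["State"] == "TN"].head())
```

Looping over the states keeps one state's last value from leaking into the next state's leading months, which a single `ffill` over the whole frame would do.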
Another solution is to index by date and then resample:
df['Day'] = 1
df1 = df.assign(date=lambda x: pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')
>> df1
State Year Month Value Day
date
1987-01-01 TN 1987.0 1.0 24410.0 1
1987-02-01 TN 1987.0 2.0 24410.0 1
1987-03-01 TN 1987.0 3.0 24410.0 1
1987-04-01 TN 1987.0 4.0 24410.0 1
1996-01-01 TN 1996.0 1.0 24410.0 1
1996-02-01 TN 1996.0 2.0 24410.0 1
1996-03-01 TN 1996.0 3.0 24410.0 1
1996-04-01 TN 1996.0 4.0 24410.0 1
1996-05-01 TN 1996.0 5.0 37109.0 1
1996-06-01 TN 1996.0 6.0 37109.0 1
1996-07-01 TN 1996.0 7.0 37109.0 1
1996-08-01 TN 1996.0 8.0 37109.0 1
1996-09-01 TN 1996.0 9.0 37109.0 1
1996-10-01 TN 1996.0 10.0 37109.0 1
1996-11-01 TN 1996.0 11.0 37109.0 1
1996-12-01 TN 1996.0 12.0 37109.0 1
2016-01-01 TN 2016.0 1.0 49808.0 1
2016-02-01 TN 2016.0 2.0 49808.0 1
Then you can resample by month as follows:
res = df1.resample('M').first().ffill()
>> res
State Year Month Value Day
date
1987-01-31 TN 1987.0 1.0 24410.0 1.0
1987-02-28 TN 1987.0 2.0 24410.0 1.0
1987-03-31 TN 1987.0 3.0 24410.0 1.0
1987-04-30 TN 1987.0 4.0 24410.0 1.0
1987-05-31 TN 1987.0 4.0 24410.0 1.0
1987-06-30 TN 1987.0 4.0 24410.0 1.0
1987-07-31 TN 1987.0 4.0 24410.0 1.0
1987-08-31 TN 1987.0 4.0 24410.0 1.0
1987-09-30 TN 1987.0 4.0 24410.0 1.0
1987-10-31 TN 1987.0 4.0 24410.0 1.0
1987-11-30 TN 1987.0 4.0 24410.0 1.0
1987-12-31 TN 1987.0 4.0 24410.0 1.0
1988-01-31 TN 1987.0 4.0 24410.0 1.0
1988-02-29 TN 1987.0 4.0 24410.0 1.0
1988-03-31 TN 1987.0 4.0 24410.0 1.0
1988-04-30 TN 1987.0 4.0 24410.0 1.0
1988-05-31 TN 1987.0 4.0 24410.0 1.0
1988-06-30 TN 1987.0 4.0 24410.0 1.0
1988-07-31 TN 1987.0 4.0 24410.0 1.0
1988-08-31 TN 1987.0 4.0 24410.0 1.0
1988-09-30 TN 1987.0 4.0 24410.0 1.0
1988-10-31 TN 1987.0 4.0 24410.0 1.0
1988-11-30 TN 1987.0 4.0 24410.0 1.0
1988-12-31 TN 1987.0 4.0 24410.0 1.0
1989-01-31 TN 1987.0 4.0 24410.0 1.0
1989-02-28 TN 1987.0 4.0 24410.0 1.0
1989-03-31 TN 1987.0 4.0 24410.0 1.0
1989-04-30 TN 1987.0 4.0 24410.0 1.0
1989-05-31 TN 1987.0 4.0 24410.0 1.0
1989-06-30 TN 1987.0 4.0 24410.0 1.0
... ... ... ... ... ...
2013-09-30 TN 1996.0 12.0 37109.0 1.0
2013-10-31 TN 1996.0 12.0 37109.0 1.0
2013-11-30 TN 1996.0 12.0 37109.0 1.0
2013-12-31 TN 1996.0 12.0 37109.0 1.0
2014-01-31 TN 1996.0 12.0 37109.0 1.0
2014-02-28 TN 1996.0 12.0 37109.0 1.0
2014-03-31 TN 1996.0 12.0 37109.0 1.0
2014-04-30 TN 1996.0 12.0 37109.0 1.0
2014-05-31 TN 1996.0 12.0 37109.0 1.0
2014-06-30 TN 1996.0 12.0 37109.0 1.0
2014-07-31 TN 1996.0 12.0 37109.0 1.0
2014-08-31 TN 1996.0 12.0 37109.0 1.0
2014-09-30 TN 1996.0 12.0 37109.0 1.0
2014-10-31 TN 1996.0 12.0 37109.0 1.0
2014-11-30 TN 1996.0 12.0 37109.0 1.0
2014-12-31 TN 1996.0 12.0 37109.0 1.0
2015-01-31 TN 1996.0 12.0 37109.0 1.0
2015-02-28 TN 1996.0 12.0 37109.0 1.0
2015-03-31 TN 1996.0 12.0 37109.0 1.0
2015-04-30 TN 1996.0 12.0 37109.0 1.0
2015-05-31 TN 1996.0 12.0 37109.0 1.0
2015-06-30 TN 1996.0 12.0 37109.0 1.0
2015-07-31 TN 1996.0 12.0 37109.0 1.0
2015-08-31 TN 1996.0 12.0 37109.0 1.0
2015-09-30 TN 1996.0 12.0 37109.0 1.0
2015-10-31 TN 1996.0 12.0 37109.0 1.0
2015-11-30 TN 1996.0 12.0 37109.0 1.0
2015-12-31 TN 1996.0 12.0 37109.0 1.0
2016-01-31 TN 2016.0 1.0 49808.0 1.0
2016-02-29 TN 2016.0 2.0 49808.0 1.0
You can get back the original structure by doing the following:
>> res.reset_index(drop=True).drop(['Day'], axis=1).head()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1987.0 4.0 24410.0
5 TN 1987.0 4.0 24410.0
6 TN 1987.0 4.0 24410.0
7 TN 1987.0 4.0 24410.0
8 TN 1987.0 4.0 24410.0
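In the resampled output, `ffill` also fills the `Year` and `Month` columns, so the filled rows keep repeating the last observed month (for example `1987.0 4.0`) and no longer match their date. A sketch of one alternative is to forward-fill only `Value` onto a complete month-start index and then rebuild `Year` and `Month` from that index. It assumes one row per state and month; the function name `reindex_monthly` and the `last_year` parameter are hypothetical.

```python
import pandas as pd

# Small stand-in for the question's data
df = pd.DataFrame({
    "State": ["TN", "TN", "TN"],
    "Year":  [1996, 1996, 2016],
    "Month": [4, 5, 1],
    "Value": [24410.0, 37109.0, 49808.0],
})

def reindex_monthly(group, last_year=2018):
    """Forward-fill one state's Value onto a complete month-start index."""
    # Build a first-of-month timestamp for each observed row
    dates = pd.to_datetime(group[["Year", "Month"]].assign(Day=1))
    values = pd.Series(group["Value"].to_numpy(), index=pd.DatetimeIndex(dates)).sort_index()
    # Every month start from the first observation through December of last_year
    full = pd.date_range(start=values.index[0], end=f"{last_year}-12-01", freq="MS")
    filled = values.reindex(full, method="ffill")
    # Year and Month come from the index, so they always match the row's date
    return pd.DataFrame({
        "State": group["State"].iloc[0],
        "Year": full.year,
        "Month": full.month,
        "Value": filled.to_numpy(),
    })

result = pd.concat(
    [reindex_monthly(g) for _, g in df.groupby("State")], ignore_index=True
)
print(result.tail())
```

Because the target index is passed explicitly, the same call both fills the gaps and extends the series past the last observed month.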
Answer 1 (score: 0)