Backfilling a pandas DataFrame misses the first month

Time: 2017-11-21 06:16:40

Tags: pandas datetime dataframe

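I have a pandas df of irrigation demand data with daily values from 1900 to 2099. I resampled the df to get monthly averages, then resampled again to a daily frequency and backfilled the monthly averages, so that every day of a month carries that month's average daily value.

My problem is that the first month is not backfilled: it has a value only on the last day of that month (1900-01-31).

Here is my code. Any suggestions on what I am doing wrong?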

I2 = pd.DataFrame(IrrigDemand, columns = ['Year', 'Month', 'Day', 'IrrigArea_1', 'IrrigArea_2','IrrigArea_3','IrrigArea_4','IrrigArea_5'],dtype=float)  

# set dates as index 
I2.set_index('Year')   

# make a column of dates in datetime format
dates = pd.to_datetime(I2[['Year', 'Month', 'Day']])       

# add the column of dates to df
I2['dates'] = pd.Series(dates, index=I2.index) 

# set dates as index of df
I2.set_index('dates')                                                    

# delete the three string columns replaced with datetime values
I2.drop(['Year', 'Month', 'Day'],inplace=True,axis=1)    

# calculate the average daily value for each month 
I2_monthly_average = I2.reset_index().set_index('dates').resample('m').mean()                           
I2_daily_average = I2_monthly_average.resample('d').bfill()  

1 Answer:

Answer 0: (score: 1)

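The problem is that resample('m') labels each month by its last day, so the first day of the first month is never added to the index and bfill has nothing to fill back to. You have to add that first day manually: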

# make a column of dates in datetime format and assign to index
I2.index = pd.to_datetime(I2[['Year', 'Month', 'Day']])       

# delete the three string columns replaced with datetime values
I2.drop(['Year', 'Month', 'Day'],inplace=True,axis=1)    

# calculate the average daily value for each month 
I2_monthly_average = I2.resample('m').mean()   

first_day = I2_monthly_average.index[0].replace(day = 1)
I2_monthly_average.loc[first_day] = I2_monthly_average.iloc[0]

I2_daily_average = I2_monthly_average.resample('d').bfill()                       
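
To see why the extra row is needed, the short sketch below reproduces the symptom on made-up data. The frame `df` and its column `a` are illustrative only, and `pd.offsets.MonthEnd()` is used in place of the `'m'` string alias on the assumption that it behaves the same way while avoiding alias deprecations in newer pandas releases.

```python
import pandas as pd

# Illustrative data: one value every 20 days, starting on 2017-04-03.
rng = pd.date_range('2017-04-03', periods=10, freq='20D')
df = pd.DataFrame({'a': range(10)}, index=rng)

# Month-end resampling labels every bin with the LAST day of the month.
monthly = df.resample(pd.offsets.MonthEnd()).mean()

# Upsampling to daily frequency starts at the first existing label,
# so no rows are created before the first month end.
daily = monthly.resample('D').bfill()

print(monthly.index[0])
print(daily.index[0])
```

Both printed timestamps are the end of the first month, which is why the days before it never appear unless a row for the first day is added by hand.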

Sample:

rng = pd.date_range('2017-04-03', periods=10, freq='20D')
I2 = pd.DataFrame({'a': range(10)}, index=rng)  
print (I2)
            a
2017-04-03  0
2017-04-23  1
2017-05-13  2
2017-06-02  3
2017-06-22  4
2017-07-12  5
2017-08-01  6
2017-08-21  7
2017-09-10  8
2017-09-30  9
I2_monthly_average = I2.resample('m').mean()
print (I2_monthly_average)
              a
2017-04-30  0.5
2017-05-31  2.0
2017-06-30  3.5
2017-07-31  5.0
2017-08-31  6.5
2017-09-30  8.5

first_day = I2_monthly_average.index[0].replace(day = 1)
I2_monthly_average.loc[first_day] = I2_monthly_average.iloc[0]
print (I2_monthly_average)
              a
2017-04-30  0.5
2017-05-31  2.0
2017-06-30  3.5
2017-07-31  5.0
2017-08-31  6.5
2017-09-30  8.5
2017-04-01  0.5 <- added first day

I2_daily_average = I2_monthly_average.resample('d').bfill()
print (I2_daily_average.head())
              a
2017-04-01  0.5
2017-04-02  0.5
2017-04-03  0.5
2017-04-04  0.5
2017-04-05  0.5
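
If the input already has one row per day, as in the question, a possible alternative (not part of the answer above) is to skip the two resample steps and broadcast each month's mean back onto its own days with `groupby(...).transform('mean')`. This is a minimal sketch under that assumption; the frame `daily` and the column `a` are hypothetical stand-ins for the irrigation data.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data standing in for the irrigation-demand frame.
rng = pd.date_range('2017-04-03', '2017-06-30', freq='D')
daily = pd.DataFrame({'a': np.arange(len(rng), dtype=float)}, index=rng)

# Group rows by calendar month and write each month's mean
# onto every day that belongs to that month.
month_key = daily.index.to_period('M')
daily_monthly_mean = daily.groupby(month_key).transform('mean')

print(daily_monthly_mean.head(3))
```

Because the result keeps the original daily index, the first month needs no special handling. The trade-off is that only dates present in the input are returned, so gaps in the daily data are not filled.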