Using resample to compute a 2-week average count

Time: 2017-06-27 22:57:42

Tags: python pandas resampling

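I'm trying to use df.resample to compute a 2-week average of incoming volume from a particular CSV file, so the plot for each 2-week span should be a flat line. The daily counts work fine so far. I believe I'm using a DatetimeIndex, and I'm trying to resample in 2-week intervals going back from the most recent date to the end of the dataset. When I try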

open_dt = pd.to_datetime(dsort['Date Opened']).dt.date
open_dt = open_dt.reset_index().sort_values('Date Opened').set_index('Date Opened').groupby('Date Opened').nunique()
roll_avg = open_dt.resample('2W').mean()

I get the following error:

Only valid with DatetimeIndex, TimeDeltaIndex or PeriodIndex, but got instance of 'Index' 

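I thought that resetting the index and setting it to the datetime field would fix this, but apparently it doesn't. I also tried initializing another variable that just pulls in the original file, but I ran into the same problem. Here is a working copy of my script, including the broken roll_avg: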

def data_process():  # sorts by domain and team
    data_merge = data_extract()
    domains = data_merge.groupby('PWx Domain')
    for domain in domains.groups.items():
        dsort = (data_merge.loc[domain[1]])
        dsort.to_csv('output\\'+str(domain[0])+'.csv')
        open_dt = pd.to_datetime(dsort['Date Opened']).dt.date
        open_dt = open_dt.reset_index().sort_values('Date Opened').set_index('Date Opened').groupby('Date Opened').nunique()
        d_avg = open_dt.mean().round(0).item()
        roll_avg = open_dt.resample('2W').mean()
        print(roll_avg)
        fig = plt.figure()
        fig.suptitle(domain[0]+' Avg='+str(d_avg), fontsize=14)
        ax = plt.plot(open_dt, color='b', marker='o', linestyle='-')
        ax = plt.plot(roll_avg, color='r', linestyle='--')
        fig.savefig('output\\'+domain[0]+'_Overall.png')
        plt.close()

Here is the head of the file being read in (data_merge):

   Client #             Solution     Solution Family  \
0     81983   Ambulatory EHR ASP  Physician Practice
1     17235   Ambulatory EHR ASP  Physician Practice
2     17235   Ambulatory EHR ASP  Physician Practice
3     17235  Practice Management  Physician Practice
4     17235  Practice Management  Physician Practice

                      Team       SR #      Date Opened PWx Domain
0    PWx Mill Response ASP  416700000  6/20/2017 19:27   CPHYB_PR
1              Core T1 PWx  416700000  6/20/2017 18:33        NaN
2              Core T1 PWx  416700000  6/20/2017 18:33   CPHYB_PR
3  Claim Generation T3 PWx  416680000  6/19/2017 15:09        NaN
4  Claim Generation T3 PWx  416680000  6/19/2017 15:09   CPHYB_PR

1 Answer:

Answer 0: (score: 1)

An index of dt.date objects is not recognized as a date-type index; it has dtype('O'). If you remove .dt.date from

open_dt = pd.to_datetime(dsort['Date Opened']).dt.date

then roll_avg should work.
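
One caveat: with .dt.date simply dropped, the timestamps keep their time of day, so a per-day groupby no longer groups by calendar day. Below is a minimal sketch of one way around that, assuming Series.dt.normalize() is an acceptable substitute (it truncates to midnight but keeps the datetime64 dtype). The sample rows and the use of 'SR #' as the counted column are made up for illustration and are not the asker's data; the last line is an optional assumption about how to get the flat per-span line mentioned in the question.

```python
import pandas as pd

# Hypothetical sample rows standing in for one domain's slice (dsort).
dsort = pd.DataFrame({
    "SR #": [1, 2, 3, 4, 5, 6],
    "Date Opened": [
        "6/1/2017 09:15", "6/1/2017 14:30", "6/2/2017 08:00",
        "6/12/2017 11:45", "6/19/2017 15:09", "6/20/2017 18:33",
    ],
})

opened = pd.to_datetime(dsort["Date Opened"], format="%m/%d/%Y %H:%M")

# .dt.date yields Python date objects, so the result has object dtype,
# which is why resample rejects an index built from it.
as_dates = opened.dt.date
print(as_dates.dtype)  # object

# .dt.normalize() sets the time to midnight but keeps datetime64 dtype.
days = opened.dt.normalize()

# Daily counts: number of rows per calendar day, on a DatetimeIndex.
daily = dsort.groupby(days)["SR #"].count()

# Mean of the daily counts within each 2-week bin.
roll_avg = daily.resample("2W").mean()

# Optional: spread each bin's mean back onto the daily dates so a plot
# shows flat segments. Bins are labelled by their right edge by default,
# so each day takes the value of the next label.
flat = roll_avg.reindex(daily.index, method="bfill")
```

Plotting daily and flat on the same axes would then give the daily counts with a stepwise 2-week average over them. Note that the mean here is taken over days that have at least one row; days with no rows are not counted as zero.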