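I'm trying to use df.resample to calculate the 2-week average of incoming volume from a particular CSV file, so the plot for each 2-week span should be a flat line. So far the daily counts work fine. I believe I'm using a DatetimeIndex, and I'm trying to resample in 2-week intervals going back from the most recent date to the end of the dataset. When I try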
open_dt = pd.to_datetime(dsort['Date Opened']).dt.date
open_dt = open_dt.reset_index().sort_values('Date Opened').set_index('Date Opened').groupby('Date Opened').nunique()
roll_avg = open_dt.resample('2W').mean()
I get the following error:
Only valid with DatetimeIndex, TimeDeltaIndex or PeriodIndex, but got instance of 'Index'
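I thought that resetting the index and setting it to the datetime field would fix this, but apparently it doesn't. I also tried initializing another variable that reads in only the original file, but I ran into the same problem. Here is a working copy of my script, including the broken roll_avg: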
def data_process():  # sorts by domain and team
    data_merge = data_extract()
    domains = data_merge.groupby('PWx Domain')
    for domain in domains.groups.items():
        dsort = (data_merge.loc[domain[1]])
        dsort.to_csv('output\\'+str(domain[0])+'.csv')
        open_dt = pd.to_datetime(dsort['Date Opened']).dt.date
        open_dt = open_dt.reset_index().sort_values('Date Opened').set_index('Date Opened').groupby('Date Opened').nunique()
        d_avg = open_dt.mean().round(0).item()
        roll_avg = open_dt.resample('2W').mean()
        print(roll_avg)
        fig = plt.figure()
        fig.suptitle(domain[0]+' Avg='+str(d_avg), fontsize=14)
        ax = plt.plot(open_dt, color='b', marker='o', linestyle='-')
        ax = plt.plot(roll_avg, color='r', linestyle='--')
        fig.savefig('output\\'+domain[0]+'_Overall.png')
        plt.close()
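The script plots roll_avg as one point per 2-week bin, and plt.plot joins those points with sloped segments, so even once resample works the red line is not flat across each span. One option, sketched here as an assumption rather than as part of the original script, is to broadcast each bin's mean back onto the daily index with groupby(pd.Grouper(freq="2W")).transform("mean"). The daily counts below are made-up sample values standing in for open_dt.

```python
import pandas as pd

# Hypothetical daily ticket counts on a DatetimeIndex (made-up values)
daily = pd.Series(
    [3, 5, 2, 4, 6, 1, 3],
    index=pd.to_datetime([
        "2017-06-01", "2017-06-05", "2017-06-09", "2017-06-14",
        "2017-06-20", "2017-06-26", "2017-07-03",
    ]),
)

# One mean per 2-week bin (this is what roll_avg holds once resample works)
two_week = daily.resample("2W").mean()

# Same 2-week bins, but each day receives the mean of the bin it falls in,
# so the value stays constant (a flat segment) across every 2-week span
flat = daily.groupby(pd.Grouper(freq="2W")).transform("mean")
print(flat)
```

When plotting flat, passing drawstyle='steps-post' to plt.plot draws the change between spans as a vertical step rather than a slope.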
Here is the head of the file being read in (data_merge):
   Client #             Solution     Solution Family  \
0     81983   Ambulatory EHR ASP  Physician Practice
1     17235   Ambulatory EHR ASP  Physician Practice
2     17235   Ambulatory EHR ASP  Physician Practice
3     17235  Practice Management  Physician Practice
4     17235  Practice Management  Physician Practice

                      Team       SR #      Date Opened PWx Domain
0    PWx Mill Response ASP  416700000  6/20/2017 19:27   CPHYB_PR
1              Core T1 PWx  416700000  6/20/2017 18:33        NaN
2              Core T1 PWx  416700000  6/20/2017 18:33   CPHYB_PR
3  Claim Generation T3 PWx  416680000  6/19/2017 15:09        NaN
4  Claim Generation T3 PWx  416680000  6/19/2017 15:09   CPHYB_PR
Answer:
An index of dt.date objects is not recognized as a datetime index type: it has dtype('O') (object).
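The failure can be reproduced in isolation. This is a minimal sketch, assuming a few made-up timestamps in the same format as the 'Date Opened' column: .dt.date turns each timestamp into a plain datetime.date object, so the per-day counts end up on an object-dtype Index, and resample rejects it until the index is converted back to datetime64.

```python
import pandas as pd

# Hypothetical timestamps in the same format as the 'Date Opened' column
opened = pd.Series(["6/19/2017 15:09", "6/20/2017 18:33", "6/20/2017 19:27"])

# .dt.date turns each Timestamp into a plain datetime.date object
as_dates = pd.to_datetime(opened, format="%m/%d/%Y %H:%M").dt.date

# Counting per day keeps those date objects as the index, so its dtype is object ('O')
counts = as_dates.groupby(as_dates).size()
print(counts.index.dtype)  # object

# resample only accepts a DatetimeIndex, TimedeltaIndex or PeriodIndex
try:
    counts.resample("2W").mean()
except TypeError as err:
    print("resample failed:", err)

# Converting the index back to datetime64 makes resample work again
counts.index = pd.to_datetime(counts.index)
print(counts.resample("2W").mean())
```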
If you remove .dt.date from this line, roll_avg should work:
open_dt = pd.to_datetime(dsort['Date Opened']).dt.date
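Dropping .dt.date alone also changes what gets counted: groupby('Date Opened') then groups by exact timestamp rather than by calendar day. One way to keep daily counts while staying on a DatetimeIndex is to floor each timestamp to midnight with .dt.normalize() instead. The sketch below assumes a hypothetical stand-in for dsort with only the 'Date Opened' column, and counts rows per day with size() in place of the original reset_index()/nunique() chain.

```python
import pandas as pd

# Hypothetical stand-in for dsort: only the 'Date Opened' column matters here
dsort = pd.DataFrame({
    "Date Opened": [
        "6/5/2017 09:00", "6/5/2017 14:30", "6/12/2017 10:15",
        "6/19/2017 15:09", "6/20/2017 18:33", "6/20/2017 19:27",
    ]
})

# normalize() floors each timestamp to midnight but keeps the datetime64 dtype,
# unlike .dt.date, which produces object-dtype datetime.date values
opened = pd.to_datetime(dsort["Date Opened"], format="%m/%d/%Y %H:%M").dt.normalize()

# Rows per calendar day; the result is indexed by a DatetimeIndex
daily = opened.groupby(opened).size()

# resample now works because the index is a DatetimeIndex
roll_avg = daily.resample("2W").mean()
print(roll_avg)
```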