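I have a raw dataset of projects with their start date, end date, and total value. I compute a daily value (dagwaarde) for each project by dividing its total value by its number of days.
I want to fill a dataset covering 1-1-2018 to 1-1-2020 that contains, for each day, the sum of the daily values of all projects.
Everything works except this line: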
df_date_range = pd.date_range(begin,einde)
It raises this error: ValueError: NaTType does not support time
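For context, here is a minimal sketch of how this kind of error can arise. It assumes that at least one project has a missing start or end date, which the data shown here does not confirm. `pd.to_datetime` turns missing values into `NaT`, and `pd.date_range` rejects a `NaT` endpoint with a `ValueError`. The exact message depends on the pandas version. The sample frame `projects` is made up for illustration.

```python
import pandas as pd

# Hypothetical sample: the second project has no end date.
projects = pd.DataFrame({
    "startdatum": ["2018-03-01", "2018-05-01"],
    "einddatum": ["2018-03-10", None],
})
projects["startdatum"] = pd.to_datetime(projects["startdatum"])
projects["einddatum"] = pd.to_datetime(projects["einddatum"])

# Rows where either date is missing (NaT) cannot be passed to pd.date_range.
missing = projects[projects[["startdatum", "einddatum"]].isna().any(axis=1)]
print(missing)

error_message = None
try:
    pd.date_range(projects.loc[1, "startdatum"], projects.loc[1, "einddatum"])
except ValueError as exc:
    error_message = str(exc)  # wording depends on the pandas version
print(error_message)
```

Running the same `isna()` check on the real `Pipedrive` frame would show whether any rows have a missing date.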
This is the code I use:
import numpy as np
import pandas as pd

#Original DF with data about projects: startdatum (start), einddatum (end), dagwaarde (day value).
#Day value is total value ('Value') / number of days
Pipedrive['einddatum'] = pd.to_datetime(Pipedrive['einddatum'])
Pipedrive['startdatum'] = pd.to_datetime(Pipedrive['startdatum'])
Pipedrive['Days'] = Pipedrive['einddatum'].sub(Pipedrive['startdatum'], axis=0)
Pipedrive.head()
Pipedrive['Days'] = Pipedrive['Days'] / np.timedelta64(1, 'D')
Pipedrive['dagwaarde'] = Pipedrive['Value'] / Pipedrive['Days']
#Create DF to work with
Pipedrive_IN = Pipedrive[["stage_order_nr","dagwaarde",'einddatum', 'startdatum', 'Days' ]]
#make a list of all begin and end dates you want to have filled
begin = '2018-01-01' # start date
einde = '2020-01-01' # end date
#make a DF with a datetime index
datetimeindex = pd.date_range(begin,einde)
df_dates = pd.DataFrame(datetimeindex, columns=['date'])
df_dates = df_dates.set_index('date')
df_dates = df_dates.fillna(0)
for index, value in Pipedrive_IN.iterrows():
    begin = value.startdatum # start date
    einde = value.einddatum # end date
    dagwaarde = value.dagwaarde # dagwaarde
    #make DF with datetime index
    df_date_range = pd.date_range(begin,einde)
    df_proj = pd.DataFrame(df_date_range, columns=['date'])
    df_proj['dagwaarde'] = dagwaarde
    df_proj = df_proj.set_index('date')
    df_proj=df_proj.dropna()
    print(df_proj.head())
    #add original DF to df_dates
    df_dates = df_dates.join(df_proj,lsuffix='', rsuffix=index)
    df_dates = df_dates.fillna(0)
print(df_dates.head(20))
#print result
df_dates['total']=df_dates.sum(axis=1)
print(df_dates.head(50))
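
One possible way around the error, sketched on the assumption that the `NaT` comes from projects with a missing start or end date: drop those rows before looping, skip projects whose duration is zero or negative (their day value would be undefined), and add each project's day value onto a single `total` series instead of joining one column per project. The helper name `daily_totals` and the sample data are hypothetical. As in the code above, the end date is included in each project's range.

```python
import pandas as pd


def daily_totals(projects, begin, einde):
    """Sum the day values of all projects for each day between begin and einde."""
    totals = pd.Series(0.0, index=pd.date_range(begin, einde), name="total")

    # Skip projects with a missing date; they would make pd.date_range fail.
    valid = projects.dropna(subset=["startdatum", "einddatum"])

    for row in valid.itertuples(index=False):
        days = (row.einddatum - row.startdatum).days
        if days <= 0:
            continue  # avoid dividing by zero or by a negative duration
        dagwaarde = row.Value / days
        # Days of this project that fall inside the requested window.
        project_days = pd.date_range(row.startdatum, row.einddatum)
        in_window = totals.index.intersection(project_days)
        totals.loc[in_window] += dagwaarde
    return totals


# Hypothetical sample data: the third project has no start date and is skipped.
projects = pd.DataFrame({
    "startdatum": pd.to_datetime(["2018-01-01", "2018-01-03", None]),
    "einddatum": pd.to_datetime(["2018-01-05", "2018-01-07", "2018-02-01"]),
    "Value": [400.0, 800.0, 100.0],
})
totals = daily_totals(projects, "2018-01-01", "2018-01-10")
print(totals)
```

Whether dropping incomplete projects is acceptable depends on the data; an alternative would be to fill the missing dates with a sensible default before computing the day value.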