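When I try to join (or merge/concat) two datasets, I run into trouble and get TypeError: Cannot compare type 'Timestamp' with type 'int'
Both datasets come from resampling the same initial dataset. The master_hrs df is produced by a resampling process that uses a change point detection Python package called ruptures (pip install ruptures to run the code). The daily_summary df just uses Pandas to resample daily mean and sum values. But I get an error when I try to combine the datasets. Would anyone have any tips to try?
The made-up data below produces the same error as my real dataset. I think the problem is that I am trying to compare a datetime to a numpy integer. Thanks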
import ruptures as rpt
import calendar
import numpy as np
import pandas as pd
np.random.seed(11)
rows,cols = 50000,2
data = np.random.rand(rows,cols)
tidx = pd.date_range('2019-01-01', periods=rows, freq='H')
df = pd.DataFrame(data, columns=['Temperature','Value'], index=tidx)
def changPointDf(df):
    arr = np.array(df.Value)
    #Define Binary Segmentation search method
    model = "l2"
    algo = rpt.Binseg(model=model).fit(arr)
    my_bkps = algo.predict(n_bkps=5)
    # getting the timestamps of the change points
    bkps_timestamps = df.iloc[[0] + my_bkps[:-1] +[-1]].index
    # computing the durations between change points
    durations = (bkps_timestamps[1:] - bkps_timestamps[:-1])
    #hours calc
    d = durations.seconds/60/60
    d_f = pd.DataFrame(d)
    df2 = d_f.T
    return df2
master_hrs = pd.DataFrame()
for idx, days in df.groupby(df.index.date):
    changPoint_df = changPointDf(days)
    values = changPoint_df.values.tolist()
    master_hrs=master_hrs.append(values)
master_hrs.columns = ['overnight_AM_hrs', 'moring_startup_hrs', 'moring_ramp_hrs', 'high_load_hrs', 'evening_shoulder_hrs']
daily_summary = pd.DataFrame()
daily_summary['Temperature'] = df['Temperature'].resample('D').mean()
daily_summary['Value'] = df['Value'].resample('D').sum()
final_df = daily_summary.join(master_hrs)
Answer 0: (score: 0)
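The problem is the indexes themselves: the index of master_hrs is int64, while the index of daily_summary is datetime. Before joining the two dataframes, include the following: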
master_hrs.index = pd.to_datetime(master_hrs.index)
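A quick way to confirm the mismatch is to inspect both index types before joining. The sketch below uses small made-up frames standing in for master_hrs and daily_summary (the values are hypothetical, and ruptures is not needed); only the index types matter here.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_integer_dtype

# Hypothetical stand-in for the resampled daily frame: datetime index.
daily_summary = pd.DataFrame(
    {"Value": [1.0, 2.0, 3.0]},
    index=pd.date_range("2019-01-01", periods=3, freq="D"),
)

# Hypothetical stand-in for master_hrs: no index given, so it is integer-based.
master_hrs = pd.DataFrame({"high_load_hrs": [5.0, 4.0, 6.0]})

# Print the dtype of each index to see the two different kinds side by side.
print(daily_summary.index.dtype)
print(master_hrs.index.dtype)

left_is_datetime = is_datetime64_any_dtype(daily_summary.index)
right_is_integer = is_integer_dtype(master_hrs.index)
print(left_is_datetime, right_is_integer)
```

If one index is datetime-like and the other is integer-based, the join has nothing comparable to align on.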
For clarity, here is my output for final_df:
Temperature Value ... high_load_hrs evening_shoulder_hrs
2019-01-01 0.417517 12.154527 ... NaN NaN
2019-01-02 0.521131 13.811842 ... NaN NaN
2019-01-03 0.583205 12.568966 ... NaN NaN
2019-01-04 0.448225 14.036136 ... NaN NaN
2019-01-05 0.542870 10.738192 ... NaN NaN
... ... ... ... ...
2024-09-10 0.470421 13.775528 ... NaN NaN
2024-09-11 0.384672 10.473930 ... NaN NaN
2024-09-12 0.527284 14.000231 ... NaN NaN
2024-09-13 0.555646 11.460867 ... NaN NaN
2024-09-14 0.426003 3.763975 ... NaN NaN
[2084 rows x 7 columns]
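Note that the hours columns in the output above are all NaN. pd.to_datetime reads integer index labels as nanoseconds since the Unix epoch, so every row of master_hrs ends up dated 1970-01-01 and matches no date in daily_summary. If master_hrs holds exactly one row per day, in the same order as daily_summary (an assumption worth checking with len()), one possible alternative is to reuse the daily index directly. A minimal sketch with made-up numbers:

```python
import pandas as pd

days = pd.date_range("2019-01-01", periods=3, freq="D")
daily_summary = pd.DataFrame({"Value": [12.1, 13.8, 12.5]}, index=days)

# Hypothetical per-day hours, one row per day, with a repeated integer
# index like the one built up row by row in the question.
master_hrs = pd.DataFrame({"high_load_hrs": [5.0, 4.0, 6.0]}, index=[0, 0, 0])

# Converting the integer labels gives the epoch date for every row,
# so a join on this index would find no matching days.
epoch_index = pd.to_datetime(master_hrs.index)
print(epoch_index)

# Assumption: rows are in day order and both frames have the same length.
assert len(master_hrs) == len(daily_summary)
aligned = master_hrs.copy()
aligned.index = daily_summary.index

final_df = daily_summary.join(aligned)
print(final_df)
```

Storing the group key (the date) as the index while building master_hrs would avoid relying on row order at all.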
Hope this gives you what you need.