I have a DataFrame like the following:
Time Price
09:15:18 27,725
09:15:49 27,721
09:16:19 27,696
09:16:32 27,699
09:17:49 27,728
09:18:19 27,742
09:19:19 27,834
09:20:19 27,890
09:20:49 27,890
09:21:49 27,936
09:22:19 27,910
09:23:19 27,921
09:23:49 27,924
09:24:19 27,927
...
Say Start_Time = 09:15:00 (fixed) and Sum_Interval = 10 minutes.
I want the sum of Price for every 10-minute interval.
Row1 = sum of Price from 9:15:00 to 9:24:59
Row2 = sum of Price from 9:25:00 to 9:34:59
Row3 = sum of Price from 9:35:00 to 9:44:59
...
The resampled result I want looks like this:
Result:
Time Price
09:15 389453
09:25 418261
09:35 568241
...
Answer 0: (score: 1)
Use:
import pandas as pd

#if necessary, convert to numeric
df['Price'] = df['Price'].str.replace(',','').astype(int)
#convert column to timedeltas
df['Time'] = pd.to_timedelta(df['Time'].astype(str))
Start_Time = '09:15:00'
Sum_Interval = '10min'
#create timedelta range that extends one interval past the maximum timedelta,
#so the last row always falls inside a bin
r = pd.timedelta_range(pd.Timedelta(Start_Time), df['Time'].max() + pd.Timedelta(Sum_Interval), freq=Sum_Interval)
#create bins by pd.cut, aggregate sum
#right=False makes each bin include its left edge: [09:15:00, 09:25:00), ...
df = df.groupby(pd.cut(df['Time'], bins=r, labels=r[:-1], right=False))['Price'].sum().reset_index()
print (df)
Time Price
0 09:15:00 389543
If you need the Time values as strings in the output:
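#run this instead of the groupby above, on the original (unaggregated) df
r = pd.timedelta_range(pd.Timedelta(Start_Time), df['Time'].max() + pd.Timedelta(Sum_Interval), freq=Sum_Interval)
#keep only HH:MM; slicing from the right also works when the strings start with '0 days '
lab = r[:-1].astype(str).str[-8:-3]
df = (df.groupby(pd.cut(df['Time'],bins=r,labels=lab,right=False))['Price']
        .sum()
        .reset_index(name='Price_Sum'))
print (df)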
Time Price_Sum
0 09:15 389543
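As a cross-check on the binning above, here is a minimal self-contained sketch that avoids pd.cut altogether: each row is assigned to a window by integer-dividing its offset from Start_Time by the interval. This is a different technique from the one in the answer, not a restatement of it. The rows are only the ones shown in the question; the names `sample` and `sum_by_interval` are my own, and dropping rows earlier than Start_Time is an assumption about what is wanted.

```python
import pandas as pd

# Only the rows shown in the question; the real data continues past these.
rows = [
    ("09:15:18", "27,725"), ("09:15:49", "27,721"), ("09:16:19", "27,696"),
    ("09:16:32", "27,699"), ("09:17:49", "27,728"), ("09:18:19", "27,742"),
    ("09:19:19", "27,834"), ("09:20:19", "27,890"), ("09:20:49", "27,890"),
    ("09:21:49", "27,936"), ("09:22:19", "27,910"), ("09:23:19", "27,921"),
    ("09:23:49", "27,924"), ("09:24:19", "27,927"),
]
sample = pd.DataFrame(rows, columns=["Time", "Price"])


def sum_by_interval(df, start_time="09:15:00", interval="10min"):
    """Sum Price over windows [start, start + interval), ... (hypothetical helper)."""
    price = df["Price"].str.replace(",", "", regex=False).astype(int)
    time = pd.to_timedelta(df["Time"])
    start = pd.Timedelta(start_time)
    step = pd.Timedelta(interval)

    # Window number per row: 0 for the first window, 1 for the second, ...
    bucket = (time - start) // step
    # Assumption: rows before start_time (negative window number) are dropped.
    in_range = bucket >= 0
    # Left edge of the window each row belongs to.
    left_edge = (start + bucket * step).rename("Time")

    return (price[in_range]
            .groupby(left_edge[in_range])
            .sum()
            .rename("Price_Sum")
            .reset_index())


print(sum_by_interval(sample))
```

For the 14 rows shown, this yields a single 09:15:00 window whose total is 389543, the same figure as in the output above.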
Answer 1: (score: 0)
You can use resample:
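import pandas as pd

df['Time'] = pd.to_timedelta(df['Time'])
df['Price'] = df['Price'].str.replace(',','').astype(int)
df = df.set_index('Time')
#resample chooses the bin edges itself, so check they start at 09:15:00 on your data
result = df.resample('10min').sum()
print (result)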
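One thing worth verifying with resample is where the windows start. I am not certain that resampling a TimedeltaIndex anchors the bins at Start_Time rather than at some other point, so it is worth inspecting the resulting index. A sketch that makes the anchor explicit, assuming pandas 1.1 or later (for the `origin` argument of `resample`) and assuming all rows belong to a single day, so that an arbitrary dummy date (my choice, not from the question) can turn the times into timestamps:

```python
import pandas as pd

# Only the rows shown in the question.
rows = [
    ("09:15:18", "27,725"), ("09:15:49", "27,721"), ("09:16:19", "27,696"),
    ("09:16:32", "27,699"), ("09:17:49", "27,728"), ("09:18:19", "27,742"),
    ("09:19:19", "27,834"), ("09:20:19", "27,890"), ("09:20:49", "27,890"),
    ("09:21:49", "27,936"), ("09:22:19", "27,910"), ("09:23:19", "27,921"),
    ("09:23:49", "27,924"), ("09:24:19", "27,927"),
]
df = pd.DataFrame(rows, columns=["Time", "Price"])
df["Price"] = df["Price"].str.replace(",", "", regex=False).astype(int)

# Assumption: a placeholder date, used only to get a DatetimeIndex.
dummy_date = "2000-01-01"
df["Time"] = pd.to_datetime(dummy_date + " " + df["Time"],
                            format="%Y-%m-%d %H:%M:%S")

# Anchor the 10-minute windows at Start_Time instead of relying on defaults.
origin = pd.Timestamp(dummy_date + " 09:15:00")
result = df.set_index("Time")["Price"].resample("10min", origin=origin).sum()

# Show only HH:MM labels, as in the desired result.
result.index = result.index.strftime("%H:%M")
print(result)
```

With the default `closed='left'` and `label='left'` for this frequency, each window covers 09:15:00 up to but not including 09:25:00 and is labelled by its start, which matches the windows described in the question.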