Question

I have a data frame like the one below:

Time         Price
09:15:18     27,725 
09:15:49     27,721 
09:16:19     27,696 
09:16:32     27,699 
09:17:49     27,728 
09:18:19     27,742 
09:19:19     27,834 
09:20:19     27,890 
09:20:49     27,890 
09:21:49     27,936 
09:22:19     27,910 
09:23:19     27,921 
09:23:49     27,924 
09:24:19     27,927 
...

Say Start_Time = 09:15:00 (fixed) and Sum_Interval = 10 minutes.

I want to find the sum of Price for every 10-minute interval.

Row1 = Sum of Price from 9:15:00 to 9:24:59
Row2 = Sum of Price from 9:25:00 to 9:34:59
Row3 = Sum of Price from 9:35:00 to 9:44:59
...

The resampled result I want looks like this:

Result:

Time    Price
09:15   389543
09:25   418261
09:35   568241
...

Answer 1

Use:

import pandas as pd

# if necessary, convert Price to numeric
df['Price'] = df['Price'].str.replace(',', '').astype(int)
# convert the Time column to timedeltas
df['Time'] = pd.to_timedelta(df['Time'].astype(str))

Start_Time = '09:15:00'
Sum_Interval = '10min'

# create the bin edges; extend the end by one interval so the last partial bin is kept
end = df['Time'].max() + pd.Timedelta(Sum_Interval)
r = pd.timedelta_range(pd.Timedelta(Start_Time), end, freq=Sum_Interval)

# create bins with pd.cut and aggregate with sum
# right=False makes the bins [09:15:00, 09:25:00), [09:25:00, 09:35:00), ...
df = df.groupby(pd.cut(df['Time'], bins=r, labels=r[:-1], right=False))['Price'].sum().reset_index()
print(df)
      Time   Price
0 09:15:00  389543

If you need the Time values as strings in the output:

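# start again from the converted df (before the aggregation above)
end = df['Time'].max() + pd.Timedelta(Sum_Interval)
r = pd.timedelta_range(pd.Timedelta(Start_Time), end, freq=Sum_Interval)
# keep only HH:MM; recent pandas renders the edges as '0 days 09:15:00'
lab = r[:-1].astype(str).str[-8:-3]

df = (df.groupby(pd.cut(df['Time'], bins=r, labels=lab, right=False))['Price']
        .sum()
        .reset_index(name='Price_Sum'))
print(df)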
    Time  Price_Sum
0  09:15     389543
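
For comparison, here is a minimal sketch of the same bucketing done with plain timedelta arithmetic instead of pd.cut. It is not part of the original answer. The names start, step, bucket and the Bucket column are made up for illustration, and the data is only the 14 rows visible in the question (the "..." rows are not included).

```python
import pandas as pd

# Rows copied from the question; the elided "..." rows are not included.
df = pd.DataFrame({
    "Time": ["09:15:18", "09:15:49", "09:16:19", "09:16:32", "09:17:49",
             "09:18:19", "09:19:19", "09:20:19", "09:20:49", "09:21:49",
             "09:22:19", "09:23:19", "09:23:49", "09:24:19"],
    "Price": ["27,725", "27,721", "27,696", "27,699", "27,728",
              "27,742", "27,834", "27,890", "27,890", "27,936",
              "27,910", "27,921", "27,924", "27,927"],
})
df["Price"] = df["Price"].str.replace(",", "").astype(int)
df["Time"] = pd.to_timedelta(df["Time"])

start = pd.Timedelta("09:15:00")  # fixed Start_Time from the question
step = pd.Timedelta("10min")      # Sum_Interval from the question

# Integer index of the interval each row falls into:
# 0 for [09:15:00, 09:25:00), 1 for [09:25:00, 09:35:00), ...
bucket = (df["Time"] - start) // step

# Left edge of each interval, used as the group label.
df["Bucket"] = start + bucket * step

result = df.groupby("Bucket")["Price"].sum().reset_index()
print(result)
```

With only the rows shown in the question this gives a single bucket starting at 09:15:00 with a Price sum of 389543. Rows earlier than Start_Time would get a negative bucket index, so filter them out first if that can happen in your data.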

Answer 2

You can use resample:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html#pandas.DataFrame.resample

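# convert Time to timedeltas and Price to integers
df['Time'] = pd.to_timedelta(df['Time'])
df['Price'] = df['Price'].str.replace(',', '').astype(int)
# resample needs a time-like index
df = df.set_index('Time')

# sum Price in 10-minute bins
df.resample('10Min').sum()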
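
Note that the call above never receives Start_Time, so it is worth checking where the bins actually begin on your pandas version. One way to pin them to 09:15:00, sketched below, is to turn the times into timestamps and pass origin to resample (available from pandas 1.1). This is an assumption-laden illustration, not the answerer's code: the date 2000-01-01 is an arbitrary placeholder, the first three rows come from the question, and the last two rows are hypothetical values added only so that a second interval shows up.

```python
import pandas as pd

df = pd.DataFrame({
    # First three rows are from the question; the last two are HYPOTHETICAL,
    # added only to populate a second 10-minute interval.
    "Time": ["09:15:18", "09:20:19", "09:24:19", "09:25:10", "09:31:00"],
    "Price": ["27,725", "27,890", "27,927", "28,000", "28,010"],
})
df["Price"] = df["Price"].str.replace(",", "").astype(int)

# Attach an arbitrary placeholder date so the times become real timestamps.
df["Time"] = pd.to_datetime("2000-01-01 " + df["Time"])

# origin anchors the bin edges at Start_Time: 09:15, 09:25, 09:35, ...
start = pd.Timestamp("2000-01-01 09:15:00")
out = df.set_index("Time").resample("10min", origin=start)["Price"].sum()

# Show only HH:MM in the index, as in the desired result.
out.index = out.index.strftime("%H:%M")
print(out)
```

The bins are closed on the left by default here, so 09:25:00 itself would land in the 09:25 row, which matches the 9:15:00 to 9:24:59 intervals asked for in the question.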

Calculating sums over fixed time intervals
