Question

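I have a large DataFrame with a full datetime index at one-minute frequency and two temperature columns (sorry, I don't know how to write the code for a DataFrame with a time index):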

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[210, 211], [212, 215], [212, 215], [214, 214]]),
                  columns=['t1', 't2'])
                        t1   t2   
2015-01-01 00:00:00     210  211       
2015-01-01 00:01:00     212  215       
2015-01-01 00:02:00     212  215
... 
2015-01-01 01:05:00     240  232
2015-01-01 01:06:00     206  209
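
For reference, a frame with a minute-level time index like the one shown above can be built with `pd.date_range`. This is only a sketch: the start date, the three-hour length, and the random values are assumptions used for illustration, not the real data.

```python
import numpy as np
import pandas as pd

# One row per minute; the start date and length are illustrative assumptions.
index = pd.date_range("2015-01-01 00:00:00", periods=3 * 60, freq="min")

# Random integer "temperatures" stand in for the real readings.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame(
    rng.integers(200, 250, size=(len(index), 2)),
    columns=["t1", "t2"],
    index=index,
)

print(df.head(3))
```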

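I need to create two new columns, t1_mean and t2_mean, containing:

t1_mean: the mean over the first 30 minutes of each hour-long window that starts at minute 6 (for example, from 2015-01-01 00:06:00 to 2015-01-01 00:35:00)
t2_mean: the mean over the last 30 minutes of each such window (for example, from 2015-01-01 00:36:00 to 2015-01-01 01:05:00). This value must be placed in the last row of the window that started at minute 6 (for example, 2015-01-01 01:05:00)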
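
To make the two windows concrete, the example ranges can be checked by hand with label slicing. This is a sketch on made-up data; it assumes t1_mean is taken from t1 and t2_mean from t2, which the text above does not state explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two hours of minute readings (values are made up).
idx = pd.date_range("2015-01-01 00:00", periods=120, freq="min")
df = pd.DataFrame(
    {"t1": np.arange(120.0), "t2": np.arange(120.0) + 100},
    index=idx,
)

# String slices on a DatetimeIndex include both endpoints,
# so each slice below covers exactly 30 one-minute rows.
first_30 = df.loc["2015-01-01 00:06":"2015-01-01 00:35", "t1"]
last_30 = df.loc["2015-01-01 00:36":"2015-01-01 01:05", "t2"]

print(len(first_30), first_30.mean())
print(len(last_30), last_30.mean())
```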

It should look like this:

                        t1   t2   t1_mean  t2_mean
2015-01-01 00:00:00     210  211  NaN      NaN
2015-01-01 00:01:00     212  215  NaN      NaN
2015-01-01 00:02:00     212  215  NaN      NaN
...
2015-01-01 01:05:00     240  232  220      228
2015-01-01 01:06:00     206  209  NaN      NaN
...
2015-01-01 02:05:00     245  234  221      235
...

How can I solve this task?

Thanks in advance for your replies.

Answer 1

Well, this code assumes you have a DataFrame df whose datetime index is named datetime_col, with two columns t1 and t2:

import pandas as pd

mean_1 = {}
mean_2 = {}

for hour in range(24):
    # If performance is an issue, this loop could be vectorized
    # (for example with NumPy arrays).
    # Zero-padded hour strings: i is the current hour, j the next one
    # (wrapping from 23 to 00).
    i = f'{hour:02d}'
    j = f'{(hour + 1) % 24:02d}'
    
    row_first = df.between_time(f'{i}:06:00',f'{i}:35:00').reset_index().resample('D', on='datetime_col').mean().reset_index()
    row_last = df.between_time(f'{i}:36:00',f'{j}:05:00').reset_index().resample('D', on='datetime_col').mean().reset_index()
    
    # Only proceed if there are rows in both time ranges
    if len(row_first) != 0 and len(row_last) != 0:
        # By default, pandas mean() returns a float with many decimal places;
        # apply round() or int() if you need fewer.
        if j == '00':
            mean_1[str((row_first.datetime_col[0].date() + pd.DateOffset(1)).date()) +  f' {j}:05:00'] = [row_first.t1[0]] # [round(row_first.t1[0],1)]
            mean_2[str((row_last.datetime_col[0].date() + pd.DateOffset(1)).date()) +  f' {j}:05:00'] = [row_last.t2[0]] # [round(row_last.t2[0],1)]
        else:
            mean_1[str(row_first.datetime_col[0].date()) +  f' {j}:05:00'] = [row_first.t1[0]]  # [round(row_first.t1[0],1)]
            mean_2[str(row_last.datetime_col[0].date()) +  f' {j}:05:00'] = [row_last.t2[0]]   # [round(row_last.t2[0],1)]
            

df_mean1 = pd.DataFrame.from_dict(mean_1, orient='index', columns=['mean_1']).reset_index().rename(columns={'index':'datetime_col'})
df_mean2 = pd.DataFrame.from_dict(mean_2, orient='index', columns=['mean_2']).reset_index().rename(columns={'index':'datetime_col'})

df_mean1['datetime_col'] = pd.to_datetime(df_mean1['datetime_col'])
df_mean2['datetime_col'] = pd.to_datetime(df_mean2['datetime_col'])

df = df.merge(df_mean1, on = 'datetime_col', how='left')
df = df.merge(df_mean2, on = 'datetime_col', how='left')
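
On a large frame, a loop-free variant may be worth considering. The sketch below is not the code above rewritten line by line; it swaps in a different technique: shift the timestamps back by 6 minutes so that every xx:06 to (xx+1):05 window lines up with one clock hour, then group by that hour. The sample data and the names `shifted`, `tmp`, `window`, `first_half`, and `end_offset` are assumptions for illustration, and so is the choice to keep only windows that contain all 30 rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three hours of minute readings.
idx = pd.date_range("2015-01-01 00:00", periods=180, freq="min", name="datetime_col")
df = pd.DataFrame({"t1": np.arange(180.0), "t2": np.arange(180.0) * 2}, index=idx)

# Shift timestamps back 6 minutes so each xx:06 .. (xx+1):05 window
# coincides with one clock hour of the shifted time.
shifted = df.index - pd.Timedelta(minutes=6)
tmp = df.assign(
    window=shifted.floor("h"),       # one id per hour-long window
    first_half=shifted.minute < 30,  # True for xx:06 .. xx:35 in original time
)

# Mean and row count per window, first half for t1 and second half for t2.
first = tmp[tmp["first_half"]].groupby("window")["t1"].agg(["mean", "count"])
last = tmp[~tmp["first_half"]].groupby("window")["t2"].agg(["mean", "count"])

# Assumption: only complete 30-row half-windows should produce a value.
full_first = first.loc[first["count"] == 30, "mean"]
full_last = last.loc[last["count"] == 30, "mean"]

# The last row of a window is 65 minutes after the shifted window start
# (59 minutes to the end of the shifted hour, plus the 6-minute shift).
end_offset = pd.Timedelta(minutes=65)
full_first.index = full_first.index + end_offset
full_last.index = full_last.index + end_offset

# Assignment aligns on the index; rows without a value become NaN.
df["t1_mean"] = full_first
df["t2_mean"] = full_last

print(df.loc["2015-01-01 01:04":"2015-01-01 01:06"])
```

Because the grouping key is a full timestamp rather than a time of day, this handles any number of days in one pass.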

Answer 2

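Processing steps:

Add minute and hour data derived from the datetime index.
Shift the time column by 6 rows.
Add an aggregation flag.
Compute the means.
Merge with the original DataFrame. P.S. There can be four means, so there will be four columns.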

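df1 = df.copy()
df1['minute'] = df.index.minute
# Label every row with the xx:05:00 timestamp of its clock hour
df1['hour'] = df.index.strftime('%Y-%m-%d %H:05:00')
# Shift the labels down 6 rows so that rows xx:06 .. (xx+1):05 share one label
df1['hour'] = df1['hour'].shift(6)
# flg = 0 for minutes 06-35 (first half), 1 for all other minutes (second half)
df1['flg'] = df1['minute'].apply(lambda x: 0 if 6 <= x <= 35 else 1 )
df1 = df1.groupby(['hour','flg'])[['t1','t2']].mean()
# Move flg into the columns, giving t1_0, t1_1, t2_0, t2_1
df1 = df1.unstack(level=1)
df1.columns = [f'{a}_{b}' for a,b in df1.columns]
df1.reset_index(col_level=1,inplace=True)
df1['hour'] = pd.to_datetime(df1['hour'])
# Merge the per-window means back onto the original rows by timestamp
df.reset_index(inplace=True)
new_df = df.merge(df1, left_on=df['index'], right_on=df1['hour'], how='outer')
new_df.drop(['key_0','hour'], inplace=True ,axis=1)
new_df.head(10)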
                 index   t1   t2   t1_0   t1_1        t2_0        t2_1
0  2015-01-01 00:00:00  220  212    NaN    NaN         NaN         NaN
1  2015-01-01 00:01:00  244  223    NaN    NaN         NaN         NaN
2  2015-01-01 00:02:00  246  241    NaN    NaN         NaN         NaN
3  2015-01-01 00:03:00  242  241    NaN    NaN         NaN         NaN
4  2015-01-01 00:04:00  233  247    NaN    NaN         NaN         NaN
5  2015-01-01 00:05:00  239  208  222.9  224.4  227.733333  223.266667
6  2015-01-01 00:06:00  212  249    NaN    NaN         NaN         NaN
7  2015-01-01 00:07:00  201  237    NaN    NaN         NaN         NaN
8  2015-01-01 00:08:00  238  217    NaN    NaN         NaN         NaN
9  2015-01-01 00:09:00  218  244    NaN    NaN         NaN         NaN
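
In the output above, the means appear on the 00:05:00 row, that is, just before the window they summarize. If they are needed on the last row of each window instead (for example 01:05:00, as the question describes), one option is a time-based rolling mean. The following is a sketch under assumptions: the sample data is made up, the names `roll`, `t1_at_end`, and `closes_window` are illustrative, and the output columns use the question's t1_mean and t2_mean names rather than the four columns produced above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three hours of minute readings.
idx = pd.date_range("2015-01-01 00:00", periods=180, freq="min")
df = pd.DataFrame({"t1": np.arange(180.0), "t2": np.arange(180.0) * 2}, index=idx)

# Mean over the 30 minutes ending at each row (right-closed window);
# NaN wherever fewer than 30 rows fall inside the window.
roll = df[["t1", "t2"]].rolling("30min", min_periods=30).mean()

# The t1 half-window ends at xx:35, so move its result 30 minutes forward
# to line it up with the (xx+1):05 row that closes the whole window.
t1_at_end = roll["t1"].shift(freq="30min").reindex(df.index)

# Keep values only on the xx:05 rows; every other row becomes NaN.
closes_window = df.index.minute == 5
df["t1_mean"] = t1_at_end.where(closes_window)
df["t2_mean"] = roll["t2"].where(closes_window)

print(df.loc["2015-01-01 01:04":"2015-01-01 01:06"])
```

Shifting by time (`freq="30min"`) rather than by a row count keeps the alignment correct even if some minutes are missing from the index.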

Pandas: using the mean of another column as a new column
