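I have a large data frame with a full datetime index and two temperature columns, with one row per minute (sorry, I don't know how to write the code for a data frame with a time index):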
df = pd.DataFrame(np.array([[210, 211], [212, 215], [212, 215], [214, 214]]),
                  columns=['t1', 't2'])
                      t1   t2
2015-01-01 00:00:00  210  211
2015-01-01 00:01:00  212  215
2015-01-01 00:02:00  212  215
...
2015-01-01 01:05:00  240  232
2015-01-01 01:06:00  206  209
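For reference, a frame of this shape can be built with `pd.date_range`. This is only a minimal sketch: the two-day span and the random integer temperatures are assumptions made for illustration, not the real data.

```python
import numpy as np
import pandas as pd

# One row per minute for two days; the start date mirrors the sample above.
# Pass name="datetime_col" to date_range if later code expects a named index.
index = pd.date_range("2015-01-01 00:00:00", periods=2 * 24 * 60, freq="min")

# Placeholder temperatures (random integers), not real measurements.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(200, 250, size=(len(index), 2)),
    index=index,
    columns=["t1", "t2"],
)
print(df.head(3))
```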
I have to create two new columns, t1_mean and t2_mean.
It should look like this:
                      t1   t2  t1_mean  t2_mean
2015-01-01 00:00:00  210  211      NaN      NaN
2015-01-01 00:01:00  212  215      NaN      NaN
2015-01-01 00:02:00  212  215      NaN      NaN
...
2015-01-01 01:05:00  240  232      220      228
2015-01-01 01:06:00  206  209      NaN      NaN
...
2015-01-01 02:05:00  245  234      221      235
...
How can I solve this task?
Thanks in advance for your replies.
Answer 0 (score: 1)
Well, this code assumes you have a dataframe df whose datetime index is named datetime_col and which has two columns, t1 and t2:
import pandas as pd

mean_1 = {}
mean_2 = {}
for i in range(0, 24):
    # If you have performance issues, you can enhance these conditions with numpy arrays
    j = i + 1
    if i < 10:
        i = '0' + str(i)
    if j < 10:
        j = '0' + str(j)
    if j == 24:
        j = '00'
    # Daily means of the rows from hh:06 to hh:35 and from hh:36 to (hh+1):05
    row_first = df.between_time(f'{i}:06:00', f'{i}:35:00').reset_index().resample('D', on='datetime_col').mean().reset_index()
    row_last = df.between_time(f'{i}:36:00', f'{j}:05:00').reset_index().resample('D', on='datetime_col').mean().reset_index()
    # This just confirms that you have rows in those time ranges
    if len(row_first) != 0 and len(row_last) != 0:
        # By default, pandas mean returns a float with many decimal places;
        # you can apply round() or int() if you need fewer.
        # Note: only the first daily row (position 0) is read below.
        if j == '00':
            mean_1[str((row_first.datetime_col[0].date() + pd.DateOffset(1)).date()) + f' {j}:05:00'] = [row_first.t1[0]]  # or [round(row_first.t1[0], 1)]
            mean_2[str((row_last.datetime_col[0].date() + pd.DateOffset(1)).date()) + f' {j}:05:00'] = [row_last.t2[0]]  # or [round(row_last.t2[0], 1)]
        else:
            mean_1[str(row_first.datetime_col[0].date()) + f' {j}:05:00'] = [row_first.t1[0]]  # or [round(row_first.t1[0], 1)]
            mean_2[str(row_last.datetime_col[0].date()) + f' {j}:05:00'] = [row_last.t2[0]]  # or [round(row_last.t2[0], 1)]

# Turn both dictionaries into frames keyed by the hh:05 timestamp
df_mean1 = pd.DataFrame.from_dict(mean_1, orient='index', columns=['mean_1']).reset_index().rename(columns={'index': 'datetime_col'})
df_mean2 = pd.DataFrame.from_dict(mean_2, orient='index', columns=['mean_2']).reset_index().rename(columns={'index': 'datetime_col'})
df_mean1['datetime_col'] = pd.to_datetime(df_mean1['datetime_col'])
df_mean2['datetime_col'] = pd.to_datetime(df_mean2['datetime_col'])
# Left merges keep every original row; rows without a mean get NaN
df = df.merge(df_mean1, on='datetime_col', how='left')
df = df.merge(df_mean2, on='datetime_col', how='left')
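Because the loop above reads only the first daily row of each resampled result, multi-day data may need a different approach. The following is a rough vectorized sketch, not the answer's own method. It assumes the windows used in the loop (t1 averaged over minutes 06-35, t2 over minutes 36-05, both written to the hh:05 row that closes the window) and a unique DatetimeIndex. The function name add_window_means is made up for this example.

```python
import numpy as np
import pandas as pd


def add_window_means(df):
    """Return a copy of df with t1_mean and t2_mean filled at the hh:05 rows."""
    # Map every row to the hh:05 timestamp that closes its window:
    # rows from H:06 through (H+1):05 all get the label (H+1):05.
    label = (df.index - pd.Timedelta(minutes=6)).floor("h") + pd.Timedelta(minutes=65)
    first_half = (df.index.minute >= 6) & (df.index.minute <= 35)

    # t1 is averaged over minutes 06-35, t2 over the remaining minutes (36-05).
    t1_mean = df["t1"][first_half].groupby(label[first_half]).mean()
    t2_mean = df["t2"][~first_half].groupby(label[~first_half]).mean()

    out = df.copy()
    # Assigning a Series aligns on the index, so only matching hh:05 rows are filled.
    out["t1_mean"] = t1_mean
    out["t2_mean"] = t2_mean
    return out


# Small demo: three hours of minute data whose values equal the row number.
idx = pd.date_range("2015-01-01", periods=180, freq="min")
demo = pd.DataFrame({"t1": np.arange(180.0), "t2": np.arange(180.0)}, index=idx)
res = add_window_means(demo)
print(res.loc["2015-01-01 01:04":"2015-01-01 01:06"])
```

Note that incomplete windows at the edges of the data still produce a mean here (for example, the first 00:05 row gets a t2_mean from only six rows). Filter on the group size if that is not wanted.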
Answer 1 (score: 1)
Processing flow:
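df1 = df.copy()
df1['minute'] = df.index.minute
# Label every row with the hh:05 timestamp of its own hour
df1['hour'] = df.index.strftime('%Y-%m-%d %H:05:00')
# Shift the labels down by 6 rows so that the rows from hh:06 to (hh+1):05
# share the label hh:05 (this relies on exactly one row per minute, no gaps)
df1['hour'] = df1['hour'].shift(6)
# flg is 0 for minutes 06-35 and 1 for the remaining minutes (36-59 and 00-05)
df1['flg'] = df1['minute'].apply(lambda x: 0 if 6 <= x <= 35 else 1)
df1 = df1.groupby(['hour', 'flg'])[['t1', 't2']].mean()
# Move flg into the columns and flatten the names to t1_0, t1_1, t2_0, t2_1
df1 = df1.unstack(level=1)
df1.columns = [f'{a}_{b}' for a, b in df1.columns]
df1.reset_index(col_level=1, inplace=True)
df1['hour'] = pd.to_datetime(df1['hour'])
# The unnamed datetime index becomes a column called 'index'
df.reset_index(inplace=True)
new_df = df.merge(df1, left_on=df['index'], right_on=df1['hour'], how='outer')
new_df.drop(['key_0', 'hour'], inplace=True, axis=1)
new_df.head(10)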
                index   t1   t2   t1_0   t1_1        t2_0        t2_1
0 2015-01-01 00:00:00  220  212    NaN    NaN         NaN         NaN
1 2015-01-01 00:01:00  244  223    NaN    NaN         NaN         NaN
2 2015-01-01 00:02:00  246  241    NaN    NaN         NaN         NaN
3 2015-01-01 00:03:00  242  241    NaN    NaN         NaN         NaN
4 2015-01-01 00:04:00  233  247    NaN    NaN         NaN         NaN
5 2015-01-01 00:05:00  239  208  222.9  224.4  227.733333  223.266667
6 2015-01-01 00:06:00  212  249    NaN    NaN         NaN         NaN
7 2015-01-01 00:07:00  201  237    NaN    NaN         NaN         NaN
8 2015-01-01 00:08:00  238  217    NaN    NaN         NaN         NaN
9 2015-01-01 00:09:00  218  244    NaN    NaN         NaN         NaN
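The output above carries four mean columns, while the question asks for two. Assuming, as the other answer does, that t1_mean should cover minutes 06-35 (flg 0) and t2_mean the remaining minutes (flg 1), the extra columns can be dropped and the rest renamed. The sketch below uses a three-row stand-in for new_df copied from rows 4 to 6 of the output, so it runs on its own.

```python
import numpy as np
import pandas as pd

# Stand-in for new_df: rows 4-6 of the output shown above.
new_df = pd.DataFrame({
    "index": pd.date_range("2015-01-01 00:04:00", periods=3, freq="min"),
    "t1": [233, 239, 212],
    "t2": [247, 208, 249],
    "t1_0": [np.nan, 222.9, np.nan],
    "t1_1": [np.nan, 224.4, np.nan],
    "t2_0": [np.nan, 227.733333, np.nan],
    "t2_1": [np.nan, 223.266667, np.nan],
})

# Keep t1 over flg 0 and t2 over flg 1, then restore the datetime index.
result = (
    new_df.drop(columns=["t1_1", "t2_0"])
    .rename(columns={"t1_0": "t1_mean", "t2_1": "t2_mean"})
    .set_index("index")
)
print(result)
```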