I have a DataFrame that contains data every 30 minutes.
combined_df =
datetime data1 data2
2019-01-01 08:00:00 10 20
2019-01-01 08:01:00 30 40
.
.
2019-01-01 08:30:00 100 200
2019-01-01 08:31:00 300 400
.
.
Now I want to group the data by closely matching timestamps. In the case above, I want to get the following output:
session_df =
datetime data1 data2 data1 data2
2019-01-01 08:00:00 10 20 30 40 . .
2019-01-01 08:30:00 100 200 300 400 . .
.
.
How can I achieve this?
Answer 0 (score: 0)
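The question isn't very clear, and I'd suggest rewording it, but I think you are trying to look at combined_df in 30-minute blocks, merge all the data1 and data2 values of each block into a single row in alternating order, and use the start time of each block as that row's datetime value, so that session_df gets one new row for every 30 minutes.
This may work for you; I tested it on some dummy data similar to yours.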
import pandas as pd

# df is the question's combined_df
df['datetime'] = pd.to_datetime(df['datetime'])  # convert to datetime; not necessary if already in the proper format
df.set_index('datetime', inplace=True)  # not necessary, but I like to keep my dates in the index
final_datetimes = df.index[(df.index.minute == 0) | (df.index.minute == 30)]  # all datetimes on a 30-minute boundary (minute 0 or minute 30)
num_cols = 2 * len(df[(df.index >= final_datetimes[0]) & (df.index < final_datetimes[1])])  # number of columns needed for the new df (assumes every block has as many rows as the first)
col_names = ['data' + str(num) for num in range(num_cols)]  # generate names for them (can't have duplicate column names in df)
df2 = pd.DataFrame(index=final_datetimes, columns=col_names)  # new df with the datetime intervals and the correct number of columns
for row in df2.iterrows():  # iterate through each row
    iloc = df2.index.get_loc(row[0])  # get index location (row[0] is the index value of that row)
    # rows that fall in this block; raises IndexError on the last row, where df2.index[iloc + 1] does not exist
    in_block = (df.index >= df2.index[iloc]) & (df.index < df2.index[iloc + 1])
    data1_list = df[in_block]['data1'].values.tolist()  # all data1 values in this range
    data2_list = df[in_block]['data2'].values.tolist()  # all data2 values in this range
    final_list = [None] * len(data1_list + data2_list)  # empty list of the correct size to hold all data1 and data2 values
    final_list[::2], final_list[1::2] = data1_list, data2_list  # fill the list with data1 and data2 values in alternating order
    df2.iloc[iloc] = final_list  # assign the list to all columns in the row
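Note: you will have to add an if statement (or modify the code slightly) to handle the last step, because there is no next datetime to bound the data being merged.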
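One way to cover the last block without a special case is to label every row with the start of its 30-minute block and group on that label, so no "next" boundary is needed. The following is only a minimal sketch of that idea, not the code tested above: `sessions_by_block` and its `freq` parameter are hypothetical names, and it assumes a `datetime` column with every other column holding values. Blocks with fewer rows than the longest one are padded with NaN.

```python
import pandas as pd


def sessions_by_block(df, freq="30min"):
    """Flatten each `freq` block of rows into one wide row (hypothetical helper)."""
    df = df.copy()
    df["datetime"] = pd.to_datetime(df["datetime"])
    # Label each row with the start of its block, e.g. 08:01 -> 08:00.
    block_start = df["datetime"].dt.floor(freq)
    value_cols = [c for c in df.columns if c != "datetime"]

    rows = {}
    for start, block in df.groupby(block_start):
        # ravel() reads row by row, which yields data1, data2, data1, data2, ...
        rows[start] = block[value_cols].to_numpy().ravel().tolist()

    # Shorter blocks are padded with NaN so that all rows have the same width.
    wide = pd.DataFrame(list(rows.values()), index=list(rows.keys()))
    # Unique names, as in the answer above (duplicate column names are avoided).
    wide.columns = ["data" + str(i) for i in range(wide.shape[1])]
    wide.index.name = "datetime"
    return wide


# Dummy data shaped like the question's combined_df.
sample = pd.DataFrame({
    "datetime": ["2019-01-01 08:00:00", "2019-01-01 08:01:00",
                 "2019-01-01 08:30:00", "2019-01-01 08:31:00"],
    "data1": [10, 30, 100, 300],
    "data2": [20, 40, 200, 400],
})
session_df = sessions_by_block(sample)
print(session_df)
```

Unlike the loop above, this does not require a row to exist exactly at minute 0 or minute 30; a block starts wherever `floor` puts it.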