Identifying and grouping data by nearest timestamp in Python

Time: 2019-07-19 17:35:55

Tags: python dataframe

I have a dataframe with data every 30 minutes.

combined_df =
         datetime         data1 data2 
    2019-01-01 08:00:00     10     20 
    2019-01-01 08:01:00     30     40
   .
   . 
    2019-01-01 08:30:00     100     200
    2019-01-01 08:31:00     300    400
    .
    .

Now I want to group the data by closely matching timestamps. In the case above, I would like to get the following output:

session_df = 
         datetime         data1 data2   data1   data2 
    2019-01-01 08:00:00     10     20    30       40   .  . 
    2019-01-01 08:30:00    100     200   300      400  . . 
    .
    .

How can I achieve this?

1 Answer:

Answer 0: (score: 0)

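The question is not entirely clear and I would suggest rewording it, but I think you are trying to look at combined_df in 30-minute blocks, merge all the data1 and data2 values within each block into a single row in alternating order, and use the start time of each block as that row's datetime value, so that session_df gets one new row per 30 minutes.

This may work for you; I tested it on some dummy data similar to yours.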
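
The dummy data itself is not shown in the answer. As a rough sketch only, data shaped like the question's combined_df could be built as follows; the helper name `make_dummy_df`, the 90-minute span, and the generated values are all hypothetical and chosen just to mimic the first rows of the question.

```python
import pandas as pd

def make_dummy_df(start="2019-01-01 08:00:00", minutes=90):
    """Build hypothetical per-minute data shaped like the question's combined_df."""
    stamps = pd.date_range(start, periods=minutes, freq="min")
    return pd.DataFrame({
        "datetime": stamps,
        # Made-up values: data1 = 10, 30, 50, ... and data2 = 20, 40, 60, ...
        "data1": range(10, 10 + 20 * minutes, 20),
        "data2": range(20, 20 + 20 * minutes, 20),
    })

df = make_dummy_df()
```

With the defaults this gives three 30-minute boundaries (08:00, 08:30, 09:00), which is enough to exercise both full blocks and a final block.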

import pandas as pd

df['datetime'] = pd.to_datetime(df['datetime'])  # convert to datetime; not necessary if already in the proper format
df.set_index('datetime', inplace=True)  # not necessary, but I like to keep my dates in the index

final_datetimes = df.index[(df.index.minute == 0) | (df.index.minute == 30)]  # all datetimes on a 30-minute boundary (minute 0 or minute 30)
num_cols = 2 * len(df[(df.index >= final_datetimes[0]) & (df.index < final_datetimes[1])])  # number of columns needed for the new df, taken from the size of the first block
col_names = ['data' + str(num) for num in range(num_cols)]  # generate names for them (avoids duplicate column names in the df)

df2 = pd.DataFrame(index=final_datetimes, columns=col_names)  # new df with the 30-minute datetimes and the correct number of columns

for row in df2.iterrows():  # iterate through each row
    iloc = df2.index.get_loc(row[0])  # get index location (row[0] is the index value of that row)
    in_block = (df.index >= df2.index[iloc]) & (df.index < df2.index[iloc + 1])  # rows in this block; iloc + 1 raises IndexError on the last row
    data1_list = df[in_block]['data1'].values.tolist()  # get all data1 values in this range
    data2_list = df[in_block]['data2'].values.tolist()  # get all data2 values in this range
    final_list = [None] * len(data1_list + data2_list)  # create empty list of the correct size to store all data1 and data2 values
    final_list[::2], final_list[1::2] = data1_list, data2_list  # populate list with data1 and data2 values in alternating order
    df2.iloc[iloc] = final_list  # assign list to all columns in the row (the block must have as many values as there are columns)

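Note: you will have to add an if statement (or modify the code slightly) to handle the last step, because the final block has no next datetime to serve as the upper bound for the data to merge.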
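
One way to sidestep the last-block problem is to label every row with the start of its 30-minute block and group on that label, so no "next boundary" lookup is needed. The following is a minimal sketch of that different technique (flooring timestamps with `dt.floor` and then using `groupby`), not the code above; the function name `to_sessions` and the sample values are hypothetical, and it assumes `datetime` is a regular column already parsed as datetimes.

```python
import pandas as pd

# Hypothetical sample data: three per-minute rows in each 30-minute block
combined_df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2019-01-01 08:00:00", "2019-01-01 08:01:00", "2019-01-01 08:02:00",
        "2019-01-01 08:30:00", "2019-01-01 08:31:00", "2019-01-01 08:32:00",
    ]),
    "data1": [10, 30, 50, 100, 300, 500],
    "data2": [20, 40, 60, 200, 400, 600],
})

def to_sessions(df, freq="30min"):
    """Collapse each `freq` block into one row, interleaving data1 and data2."""
    block_start = df["datetime"].dt.floor(freq)  # start of the block each row falls in
    rows = {}
    for start, block in df.groupby(block_start):
        # to_numpy() has shape (n_rows, 2); ravel() flattens it row by row,
        # giving data1, data2, data1, data2, ...
        rows[start] = block[["data1", "data2"]].to_numpy().ravel().tolist()
    out = pd.DataFrame.from_dict(rows, orient="index")
    out.columns = [f"data{i}" for i in range(out.shape[1])]  # unique column names
    out.index.name = "datetime"
    return out

session_df = to_sessions(combined_df)
print(session_df)
```

Because every block is keyed by its own floored start time, the final block is handled the same way as the others. The column names follow the `data0`, `data1`, ... scheme used above rather than the repeated `data1 data2` headers in the question, since duplicate column names are awkward to work with.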