Split a DataFrame by a time column - pandas

Posted: 2017-08-14 15:14:28

Tags: python pandas

I want to select the right portion of a dataset, as illustrated by the following example:

Input df:

id_B,ts_B,value
id1,2017-04-27 01:35:30,0
id1,2017-04-27 01:35:40,0
id1,2017-04-27 01:35:50,1
id1,2017-04-27 01:36:00,4
id1,2017-04-27 01:36:10,5
id1,2017-04-27 01:36:20,100
id1,2017-04-27 01:36:30,155
id1,2017-04-27 01:36:40,235
id1,2017-04-27 01:36:50,0
id1,2017-04-27 01:36:60,0
id1,2017-04-27 01:37:00,2353
id1,2017-04-27 01:37:10,221
id1,2017-04-27 01:37:20,2432
id1,2017-04-27 01:37:30,2654
id1,2017-04-27 01:37:40,12
id1,2017-04-27 01:37:50,5
id1,2017-04-27 01:38:00,5
id1,2017-04-27 01:38:10,23
id1,2017-04-27 01:38:20,5
id1,2017-04-27 01:38:30,2
id1,2017-04-27 01:38:40,2
id1,2017-04-27 01:38:50,1
id1,2017-04-27 01:39:00,0
id1,2017-04-27 01:39:10,0
id1,2017-04-27 01:39:20,0
id1,2017-04-27 01:39:30,0
id1,2017-04-27 01:39:40,0
id1,2017-04-27 01:39:50,0
id1,2017-04-27 01:40:00,0
id1,2017-04-27 01:40:10,1
id1,2017-04-27 01:40:20,5
id1,2017-04-27 01:40:30,221
id1,2017-04-27 01:40:40,2432
id1,2017-04-27 01:40:50,2654 
id1,2017-04-27 01:40:60,12
id1,2017-04-27 01:41:00,5
id1,2017-04-27 01:41:10,5
id1,2017-04-27 01:41:20,23
id1,2017-04-27 01:41:30,5
id1,2017-04-27 01:41:40,2
id1,2017-04-27 01:41:50,1

Given:
  segment_number = 1
  duration = 3 minutes

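I want to select the first segment of the dataframe: starting at the first row where df.value is non-zero, up to the last row that still falls within the 3-minute duration.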
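A minimal sketch of this rule, assuming `ts_B` has already been parsed with `pd.to_datetime` and the rows are sorted by time. The helper name `first_segment` is hypothetical, and the sample rows are a small subset of the input above (the invalid `01:36:60` timestamp is left out).

```python
import pandas as pd


def first_segment(df: pd.DataFrame, duration: pd.Timedelta) -> pd.DataFrame:
    """Rows from the first non-zero value up to start time + duration (inclusive)."""
    nonzero = df["value"].to_numpy() != 0
    if not nonzero.any():
        return df.iloc[0:0]  # no non-zero value: empty segment
    start_pos = int(nonzero.argmax())  # position of the first non-zero value
    tail = df.iloc[start_pos:]
    end_time = tail["ts_B"].iloc[0] + duration
    # Keep every row up to end_time, including zeros inside the window
    return tail[tail["ts_B"] <= end_time]


df = pd.DataFrame({
    "id_B": ["id1"] * 6,
    "ts_B": pd.to_datetime([
        "2017-04-27 01:35:30", "2017-04-27 01:35:40", "2017-04-27 01:35:50",
        "2017-04-27 01:36:00", "2017-04-27 01:38:50", "2017-04-27 01:39:00",
    ]),
    "value": [0, 0, 1, 4, 1, 0],
})
seg = first_segment(df, pd.Timedelta(minutes=3))
print(seg)
```

The window is measured from the first non-zero row, so rows from 01:35:50 through 01:38:50 are kept and 01:39:00 is dropped.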

Output:

id1,2017-04-27 01:35:50,1
id1,2017-04-27 01:36:00,4
id1,2017-04-27 01:36:10,5
id1,2017-04-27 01:36:20,100
id1,2017-04-27 01:36:30,155
id1,2017-04-27 01:36:40,235
id1,2017-04-27 01:36:50,0
id1,2017-04-27 01:36:60,0
id1,2017-04-27 01:37:00,2353
id1,2017-04-27 01:37:10,221
id1,2017-04-27 01:37:20,2432
id1,2017-04-27 01:37:30,2654
id1,2017-04-27 01:37:40,12
id1,2017-04-27 01:37:50,5
id1,2017-04-27 01:38:00,5
id1,2017-04-27 01:38:10,23
id1,2017-04-27 01:38:20,5
id1,2017-04-27 01:38:30,2
id1,2017-04-27 01:38:40,2
id1,2017-04-27 01:38:50,1

Given:
  segment_number = 2
  duration = 1.40 minutes (1 minute 40 seconds)

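I want to select the second segment of the dataframe: starting at the first non-zero df.value after the first segment ends, up to the last row that still falls within the 1.40-minute (1 min 40 s) duration.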

Output:

id1,2017-04-27 01:40:10,1
id1,2017-04-27 01:40:20,5
id1,2017-04-27 01:40:30,221
id1,2017-04-27 01:40:40,2432
id1,2017-04-27 01:40:50,2654 
id1,2017-04-27 01:40:60,12
id1,2017-04-27 01:41:00,5
id1,2017-04-27 01:41:10,5
id1,2017-04-27 01:41:20,23
id1,2017-04-27 01:41:30,5
id1,2017-04-27 01:41:40,2
id1,2017-04-27 01:41:50,1
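One way to reach the second (or any later) segment is to apply the same rule repeatedly, each time to the rows left over after the previous segment. A sketch, assuming rows are sorted by `ts_B` and one duration per segment is supplied in order; `select_segment` and the sample rows (a subset of the input above) are hypothetical.

```python
import pandas as pd


def select_segment(df, segment_number, durations):
    """Return segment `segment_number` (1-based) as a DataFrame.

    Segment k starts at the first non-zero value after segment k-1 ends
    and keeps every row up to start + durations[k-1] (inclusive).
    """
    rest = df
    seg = df.iloc[0:0]  # empty result for segment_number < 1
    for k in range(segment_number):
        nonzero = rest["value"].to_numpy() != 0
        if not nonzero.any():
            return df.iloc[0:0]  # fewer segments than requested
        tail = rest.iloc[int(nonzero.argmax()):]  # drop leading zeros
        end_time = tail["ts_B"].iloc[0] + durations[k]
        in_window = (tail["ts_B"] <= end_time).to_numpy()
        seg = tail[in_window]
        rest = tail[~in_window]  # rows after this segment (df is sorted)
    return seg


times = ["01:35:40", "01:35:50", "01:38:50", "01:39:00",
         "01:40:00", "01:40:10", "01:41:50"]
df = pd.DataFrame({
    "id_B": ["id1"] * len(times),
    "ts_B": pd.to_datetime(["2017-04-27 " + t for t in times]),
    "value": [0, 1, 1, 0, 0, 1, 1],
})
durations = [pd.Timedelta(minutes=3), pd.Timedelta(minutes=1, seconds=40)]
seg2 = select_segment(df, 2, durations)
print(seg2)
```

Carrying the leftover rows forward plays the role of the `last_end_point` bookkeeping, without relying on integer index arithmetic.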

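So far, I have used `pd.to_datetime` and `set_index` to index df by ts_B, and a variable `last_end_point` to track the index where the previous segment ended, but I am not getting the correct output.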
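For comparison, a time-indexed version of the same selection might look like the sketch below. It assumes `ts_B` is parsed and sorted, and relies on label-based slicing of a sorted `DatetimeIndex`, where `.loc[start:end]` includes both endpoints. The sample rows are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "ts_B": pd.to_datetime(["2017-04-27 01:35:40", "2017-04-27 01:35:50",
                            "2017-04-27 01:38:50", "2017-04-27 01:39:00"]),
    "value": [0, 1, 1, 0],
}).set_index("ts_B")

# Timestamp of the first non-zero value
start = df.index[df["value"].to_numpy() != 0][0]
end = start + pd.Timedelta(minutes=3)
# Label slice on a sorted DatetimeIndex: both ends are inclusive
segment = df.loc[start:end]
# Timestamp where the previous segment ended
last_end_point = segment.index[-1]
print(segment)
```

With a time index, `last_end_point` can be a timestamp rather than an integer position; the search for the next segment can then start from `df.loc[last_end_point:].iloc[1:]`.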

Any help would be appreciated.

1 answer:

Answer 0 (score: 0)

Here is the answer I worked out:

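import datetime

import pandas as pd

# skipinitialspace=True strips the space after the comma in the header ("id_B, ts_B,value")
df = pd.read_csv("filename.csv", skipinitialspace=True)
# errors="coerce" turns invalid timestamps such as "01:36:60" into NaT instead of raising
df['ts_B'] = pd.to_datetime(df['ts_B'], errors="coerce")


def find_the_energenies_segment(key_mapped, duration, energenie_df, threshold):
    """Return the rows from the first value > threshold up to start time + duration.

    duration is an "HH:MM:SS" string; key_mapped is currently unused.
    """
    non_zero_indexs = energenie_df[energenie_df["value"] > threshold].index

    first_index = non_zero_indexs[0] if len(non_zero_indexs) > 0 else None

    # Compare with None explicitly: a valid first_index of 0 is falsy
    if first_index is None:
        return {"sub_df": None,
                "start_index": None,
                "end_index": None,
                "duration": duration}

    start_time = energenie_df.loc[first_index].ts_B
    hours, minutes, seconds = duration.split(":")
    end_time = start_time + datetime.timedelta(hours=int(hours), minutes=int(minutes), seconds=int(seconds))

    # The segment ends just before the first row later than end_time;
    # if there is no such row, it runs to the end of the frame.
    # (Assumes the default RangeIndex, so "index - 1" is the previous row.)
    after_end = energenie_df[energenie_df["ts_B"] > end_time].index
    last_index = after_end[0] - 1 if len(after_end) > 0 else energenie_df.index[-1]

    return {"sub_df": energenie_df.loc[first_index:last_index],
            "start_index": first_index,
            "end_index": last_index,
            "duration": duration}


out = find_the_energenies_segment("id1", "00:03:00", df, 0)
print(out)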
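The manual `duration.split(":")` parsing could also be done with `pd.to_timedelta`, which accepts `"HH:MM:SS"` strings. The sketch below is an alternative, not part of the original answer. It also checks both examples from the question, reading "1.40 minutes" as 1 minute 40 seconds.

```python
import pandas as pd

three_min = pd.to_timedelta("00:03:00")
one_forty = pd.to_timedelta("00:01:40")

# Segment 1 from the question: 01:35:50 + 3 min -> 01:38:50
end_1 = pd.Timestamp("2017-04-27 01:35:50") + three_min
# Segment 2 from the question: 01:40:10 + 1 min 40 s -> 01:41:50
end_2 = pd.Timestamp("2017-04-27 01:40:10") + one_forty
print(end_1, end_2)
```

Both end times match the last rows of the expected outputs in the question, which supports the 1 min 40 s reading of the second duration.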