Question

I want to select the right part of the dataset, as explained by the following example:

Input df:

id_B, ts_B,value
id1,2017-04-27 01:35:30,0
id1,2017-04-27 01:35:40,0
id1,2017-04-27 01:35:50,1
id1,2017-04-27 01:36:00,4
id1,2017-04-27 01:36:10,5
id1,2017-04-27 01:36:20,100
id1,2017-04-27 01:36:30,155
id1,2017-04-27 01:36:40,235
id1,2017-04-27 01:36:50,0
id1,2017-04-27 01:36:60,0
id1,2017-04-27 01:37:00,2353
id1,2017-04-27 01:37:10,221
id1,2017-04-27 01:37:20,2432
id1,2017-04-27 01:37:30,2654
id1,2017-04-27 01:37:40,12
id1,2017-04-27 01:37:50,5
id1,2017-04-27 01:38:00,5
id1,2017-04-27 01:38:10,23
id1,2017-04-27 01:38:20,5
id1,2017-04-27 01:38:30,2
id1,2017-04-27 01:38:40,2
id1,2017-04-27 01:38:50,1
id1,2017-04-27 01:39:00,0
id1,2017-04-27 01:39:10,0
id1,2017-04-27 01:39:20,0
id1,2017-04-27 01:39:30,0
id1,2017-04-27 01:39:40,0
id1,2017-04-27 01:39:50,0
id1,2017-04-27 01:40:00,0
id1,2017-04-27 01:40:10,1
id1,2017-04-27 01:40:20,5
id1,2017-04-27 01:40:30,221
id1,2017-04-27 01:40:40,2432
id1,2017-04-27 01:40:50,2654 
id1,2017-04-27 01:40:60,12
id1,2017-04-27 01:41:00,5
id1,2017-04-27 01:41:10,5
id1,2017-04-27 01:41:20,23
id1,2017-04-27 01:41:30,5
id1,2017-04-27 01:41:40,2
id1,2017-04-27 01:41:50,1

Given: segment_number = 1
duration = 3 minutes

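I want to select the first segment of the DataFrame, starting at the first non-zero df.value and ending at the last value within the 3-minute duration.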

Output:

id1,2017-04-27 01:35:50,1
id1,2017-04-27 01:36:00,4
id1,2017-04-27 01:36:10,5
id1,2017-04-27 01:36:20,100
id1,2017-04-27 01:36:30,155
id1,2017-04-27 01:36:40,235
id1,2017-04-27 01:36:50,0
id1,2017-04-27 01:36:60,0
id1,2017-04-27 01:37:00,2353
id1,2017-04-27 01:37:10,221
id1,2017-04-27 01:37:20,2432
id1,2017-04-27 01:37:30,2654
id1,2017-04-27 01:37:40,12
id1,2017-04-27 01:37:50,5
id1,2017-04-27 01:38:00,5
id1,2017-04-27 01:38:10,23
id1,2017-04-27 01:38:20,5
id1,2017-04-27 01:38:30,2
id1,2017-04-27 01:38:40,2
id1,2017-04-27 01:38:50,1
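A minimal sketch of the segment-1 rule, not the asker's code, assuming `ts_B` is already parsed to datetimes: find the first non-zero `value`, add the duration, and keep the rows whose timestamp falls in that closed interval. The sample is a hypothetical rebuild of the question's data on a regular 10-second grid with the invalid `01:36:60` and `01:40:60` rows left out, so it returns 19 rows instead of the 20 listed above; `first_segment` is a hypothetical name.

```python
import pandas as pd

# Hypothetical sample: the question's values on a 10-second grid,
# without the invalid "01:36:60" and "01:40:60" rows.
values = [0, 0, 1, 4, 5, 100, 155, 235, 0,
          2353, 221, 2432, 2654, 12, 5,
          5, 23, 5, 2, 2, 1,
          0, 0, 0, 0, 0, 0,
          0, 1, 5, 221, 2432, 2654,
          5, 5, 23, 5, 2, 1]
df = pd.DataFrame({
    "id_B": "id1",
    "ts_B": pd.date_range("2017-04-27 01:35:30", periods=len(values),
                          freq=pd.Timedelta(seconds=10)),
    "value": values,
})

def first_segment(df, duration):
    """Rows from the first non-zero value up to start + duration (inclusive)."""
    nonzero = df.loc[df["value"] != 0, "ts_B"]
    if nonzero.empty:
        return df.iloc[0:0]  # no non-zero value: empty frame, same columns
    start = nonzero.iloc[0]
    end = start + duration
    return df[(df["ts_B"] >= start) & (df["ts_B"] <= end)]

seg1 = first_segment(df, pd.Timedelta(minutes=3))
print(seg1["ts_B"].iloc[0], seg1["ts_B"].iloc[-1])
# 2017-04-27 01:35:50 2017-04-27 01:38:50
```

Using `<=` on the end time keeps the 01:38:50 row, which matches the expected output ending exactly three minutes after 01:35:50.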

Given: segment_number = 2
duration = 1.40 minutes (1 minute 40 seconds, which matches the expected output)

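I want to select the second segment of the DataFrame, starting at the first non-zero df.value and ending at the last value within the 1.40-minute duration.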

Output:

id1,2017-04-27 01:40:10,1
id1,2017-04-27 01:40:20,5
id1,2017-04-27 01:40:30,221
id1,2017-04-27 01:40:40,2432
id1,2017-04-27 01:40:50,2654 
id1,2017-04-27 01:40:60,12
id1,2017-04-27 01:41:00,5
id1,2017-04-27 01:41:10,5
id1,2017-04-27 01:41:20,23
id1,2017-04-27 01:41:30,5
id1,2017-04-27 01:41:40,2
id1,2017-04-27 01:41:50,1
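The expected output ends at 01:41:50, exactly 1 minute 40 seconds after the first non-zero value at 01:40:10, so "1.40 minutes" appears to mean minutes.seconds rather than a decimal 1.4 minutes (84 seconds, which would stop at 01:41:34). A small sketch of a hypothetical helper, `parse_min_sec`, that reads the notation that way; note that under this assumption "1.4" would mean 1 minute 4 seconds.

```python
import pandas as pd

def parse_min_sec(text):
    """Hypothetical helper: read "M.SS" (e.g. "1.40") as M minutes and SS seconds."""
    minutes, _, seconds = text.partition(".")
    # A bare "3" has no seconds part, so fall back to 0.
    return pd.Timedelta(minutes=int(minutes), seconds=int(seconds or 0))

duration = parse_min_sec("1.40")
start = pd.Timestamp("2017-04-27 01:40:10")  # first non-zero value of segment 2
print(duration, start + duration)
# 0 days 00:01:40 2017-04-27 01:41:50
```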

So far, I have used `pd.to_datetime` and `set_index` to index df by ts_B, and a variable "last_end_point" to track the index of the previous segment, but I am not getting the correct output.
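One way to make the `set_index` plus `last_end_point` idea work, sketched under assumptions the question does not state: segment k starts at the first non-zero value strictly after the end of segment k-1, and the caller supplies one duration per segment (here 3 minutes and 1 minute 40 seconds). With `ts_B` as a sorted `DatetimeIndex`, `df.loc[start:end]` includes both endpoints. `select_segment` is a hypothetical name, and the sample again omits the invalid `:60` rows.

```python
import pandas as pd

values = [0, 0, 1, 4, 5, 100, 155, 235, 0,
          2353, 221, 2432, 2654, 12, 5,
          5, 23, 5, 2, 2, 1,
          0, 0, 0, 0, 0, 0,
          0, 1, 5, 221, 2432, 2654,
          5, 5, 23, 5, 2, 1]
df = pd.DataFrame({
    "id_B": "id1",
    "ts_B": pd.date_range("2017-04-27 01:35:30", periods=len(values),
                          freq=pd.Timedelta(seconds=10)),
    "value": values,
}).set_index("ts_B")  # DatetimeIndex, as in the question

def select_segment(df, segment_number, durations):
    """Hypothetical helper: return segment `segment_number` (1-based).

    Assumes df is sorted by its DatetimeIndex and durations[k] is the
    duration of segment k + 1.
    """
    last_end_point = None    # end time of the previous segment
    segment = df.iloc[0:0]   # empty frame with the same columns
    for k in range(segment_number):
        # Search only rows strictly after the previous segment's end.
        rest = df if last_end_point is None else df[df.index > last_end_point]
        nonzero = rest.loc[rest["value"] != 0].index
        if len(nonzero) == 0:
            return df.iloc[0:0]  # fewer segments than requested
        start = nonzero[0]
        last_end_point = start + durations[k]
        # Label slicing on a sorted DatetimeIndex includes both endpoints.
        segment = df.loc[start:last_end_point]
    return segment

durations = [pd.Timedelta(minutes=3), pd.Timedelta(minutes=1, seconds=40)]
seg2 = select_segment(df, 2, durations)
print(seg2.index[0], seg2.index[-1], len(seg2))
# 2017-04-27 01:40:10 2017-04-27 01:41:50 11
```

Because `last_end_point` holds a timestamp rather than a row position, the search for the next segment does not depend on how the rows are numbered.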

Any help would be appreciated.

Answer 1

Here is the solution I came up with:

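import pandas as pd
import datetime

# skipinitialspace=True strips the space in the header "id_B, ts_B,value",
# so the column is named "ts_B" rather than " ts_B".
df = pd.read_csv("filename.csv", skipinitialspace=True)
# The sample data contains invalid times such as "01:36:60";
# errors="coerce" turns them into NaT instead of raising.
df['ts_B'] = pd.to_datetime(df['ts_B'], errors="coerce")

def find_the_energenies_segment(key_mapped, duration, energenie_df, threshold):
    # Index labels of all rows whose value exceeds the threshold.
    non_zero_indexs = energenie_df[energenie_df["value"] > threshold].index

    first_index = non_zero_indexs[0] if len(non_zero_indexs) > 0 else None

    # Compare with None explicitly: `not first_index` is also True
    # when the first non-zero row has the index label 0.
    if first_index is None:
        return {"sub_df": None,
                "start_index": None,
                "end_index": None,
                "duration": duration}

    # duration is an "HH:MM:SS" string, e.g. "00:03:00".
    start_time = energenie_df.loc[first_index].ts_B
    hours, minutes, seconds = duration.split(":")
    end_time = start_time + datetime.timedelta(hours=int(hours), minutes=int(minutes), seconds=int(seconds))

    # Last row at or before end_time (assumes rows are sorted by ts_B).
    # Unlike taking the first row after end_time and subtracting 1, this
    # also works when the segment runs to the end of the data.
    last_index = energenie_df[energenie_df["ts_B"] <= end_time].index[-1]

    return {"sub_df": energenie_df.loc[first_index:last_index],
            "start_index": first_index,
            "end_index": last_index,
            "duration": duration}


out = find_the_energenies_segment("id1", "00:03:00", df, 0)
print(out)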
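The function above returns only the first segment. Assuming segments follow one another as in the question, the same search can be rerun on the rows after `end_index` to get segment 2. The sketch below is a hypothetical compact variant (`find_segment` is not from the answer) that also lets `pd.Timedelta` parse the "HH:MM:SS" string directly instead of splitting it; the sample data omits the invalid `:60` rows.

```python
import pandas as pd

values = [0, 0, 1, 4, 5, 100, 155, 235, 0,
          2353, 221, 2432, 2654, 12, 5,
          5, 23, 5, 2, 2, 1,
          0, 0, 0, 0, 0, 0,
          0, 1, 5, 221, 2432, 2654,
          5, 5, 23, 5, 2, 1]
df = pd.DataFrame({
    "id_B": "id1",
    "ts_B": pd.date_range("2017-04-27 01:35:30", periods=len(values),
                          freq=pd.Timedelta(seconds=10)),
    "value": values,
})

def find_segment(df, duration, threshold=0):
    """Hypothetical variant: (first_index, last_index) of one segment, or (None, None)."""
    above = df[df["value"] > threshold].index
    if len(above) == 0:
        return None, None
    first_index = above[0]
    # pd.Timedelta parses "HH:MM:SS" strings such as "00:03:00".
    end_time = df.loc[first_index, "ts_B"] + pd.Timedelta(duration)
    last_index = df[df["ts_B"] <= end_time].index[-1]
    return first_index, last_index

first1, last1 = find_segment(df, "00:03:00")
# Segment 2: search only the rows after segment 1 ends.
first2, last2 = find_segment(df.loc[last1 + 1:], "00:01:40")
print(df.loc[first2, "ts_B"], df.loc[last2, "ts_B"])
# 2017-04-27 01:40:10 2017-04-27 01:41:50
```

Slicing with `df.loc[last1 + 1:]` keeps the original index labels, so the returned indices can be used on the full frame directly.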

Split a DataFrame by time column - pandas
