I want to select the right portion of a dataset, as illustrated by the following example.
Input df:
id_B,ts_B,value
id1,2017-04-27 01:35:30,0
id1,2017-04-27 01:35:40,0
id1,2017-04-27 01:35:50,1
id1,2017-04-27 01:36:00,4
id1,2017-04-27 01:36:10,5
id1,2017-04-27 01:36:20,100
id1,2017-04-27 01:36:30,155
id1,2017-04-27 01:36:40,235
id1,2017-04-27 01:36:50,0
id1,2017-04-27 01:36:60,0
id1,2017-04-27 01:37:00,2353
id1,2017-04-27 01:37:10,221
id1,2017-04-27 01:37:20,2432
id1,2017-04-27 01:37:30,2654
id1,2017-04-27 01:37:40,12
id1,2017-04-27 01:37:50,5
id1,2017-04-27 01:38:00,5
id1,2017-04-27 01:38:10,23
id1,2017-04-27 01:38:20,5
id1,2017-04-27 01:38:30,2
id1,2017-04-27 01:38:40,2
id1,2017-04-27 01:38:50,1
id1,2017-04-27 01:39:00,0
id1,2017-04-27 01:39:10,0
id1,2017-04-27 01:39:20,0
id1,2017-04-27 01:39:30,0
id1,2017-04-27 01:39:40,0
id1,2017-04-27 01:39:50,0
id1,2017-04-27 01:40:00,0
id1,2017-04-27 01:40:10,1
id1,2017-04-27 01:40:20,5
id1,2017-04-27 01:40:30,221
id1,2017-04-27 01:40:40,2432
id1,2017-04-27 01:40:50,2654
id1,2017-04-27 01:40:60,12
id1,2017-04-27 01:41:00,5
id1,2017-04-27 01:41:10,5
id1,2017-04-27 01:41:20,23
id1,2017-04-27 01:41:30,5
id1,2017-04-27 01:41:40,2
id1,2017-04-27 01:41:50,1
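Given the following:
segment_number = 1
duration = 3 minutes
I want to select the first segment of the dataframe, starting at the first non-zero df.value and ending at the last value that falls within the 3-minute duration.
Output: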
id1,2017-04-27 01:35:50,1
id1,2017-04-27 01:36:00,4
id1,2017-04-27 01:36:10,5
id1,2017-04-27 01:36:20,100
id1,2017-04-27 01:36:30,155
id1,2017-04-27 01:36:40,235
id1,2017-04-27 01:36:50,0
id1,2017-04-27 01:36:60,0
id1,2017-04-27 01:37:00,2353
id1,2017-04-27 01:37:10,221
id1,2017-04-27 01:37:20,2432
id1,2017-04-27 01:37:30,2654
id1,2017-04-27 01:37:40,12
id1,2017-04-27 01:37:50,5
id1,2017-04-27 01:38:00,5
id1,2017-04-27 01:38:10,23
id1,2017-04-27 01:38:20,5
id1,2017-04-27 01:38:30,2
id1,2017-04-27 01:38:40,2
id1,2017-04-27 01:38:50,1
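Given the following:
segment_number = 2
duration = 1.40 minutes
I want to select the second segment of the dataframe, starting at the first non-zero df.value and ending at the last value that falls within the 1.40-minute duration.
Output: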
id1,2017-04-27 01:40:10,1
id1,2017-04-27 01:40:20,5
id1,2017-04-27 01:40:30,221
id1,2017-04-27 01:40:40,2432
id1,2017-04-27 01:40:50,2654
id1,2017-04-27 01:40:60,12
id1,2017-04-27 01:41:00,5
id1,2017-04-27 01:41:10,5
id1,2017-04-27 01:41:20,23
id1,2017-04-27 01:41:30,5
id1,2017-04-27 01:41:40,2
id1,2017-04-27 01:41:50,1
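So far I have used `pd.to_datetime` and `set_index` to index df on ts_B, and a variable `last_end_point` to keep track of the index where the previous segment ended, but I am not getting the correct output.
Any help would be much appreciated.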
Answer 0 (score: 0)
This is the answer I worked out:
import datetime

import pandas as pd

# skipinitialspace strips the blank after a comma in the header row
df = pd.read_csv("filename.csv", skipinitialspace=True)
df['ts_B'] = pd.to_datetime(df['ts_B'])


def find_the_energenies_segment(key_mapped, duration, energenie_df, threshold):
    # Row labels of all readings above the threshold
    non_zero_indexs = energenie_df[energenie_df["value"] > threshold].index
    first_index = non_zero_indexs[0] if len(non_zero_indexs) > 0 else None
    # Compare with None explicitly: a first label of 0 is valid but falsy
    if first_index is None:
        return {"sub_df": None,
                "start_index": None,
                "end_index": None,
                "duration": duration}
    start_time = energenie_df.loc[first_index].ts_B
    # duration is an "HH:MM:SS" string
    hours, minutes, seconds = duration.split(":")
    end_time = start_time + datetime.timedelta(
        hours=int(hours), minutes=int(minutes), seconds=int(seconds))
    # Last row at or before end_time (assumes rows are sorted by ts_B);
    # this also works when no row lies beyond end_time
    last_index = energenie_df[energenie_df["ts_B"] <= end_time].index[-1]
    return {"sub_df": energenie_df.loc[first_index:last_index],
            "start_index": first_index,
            "end_index": last_index,
            "duration": duration}


out = find_the_energenies_segment("id1", "00:03:00", df, 0)
print(out)
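The function above only returns the first segment. One possible way to get the later segments as well is to restart the search after the point where the previous segment ended, which is the role `last_end_point` plays in the question. The sketch below is only an illustration under a few assumptions: `select_segments` is a hypothetical helper name, the durations are given as "HH:MM:SS" strings, rows are sorted by `ts_B`, and the small sample frame is made up rather than taken from the question's data.

```python
import pandas as pd


def select_segments(df, durations, threshold=0):
    """Return one sub-frame per duration. Each segment starts at the first
    row whose value exceeds `threshold` after the previous segment ended."""
    segments = []
    last_end_point = None  # timestamp at which the previous segment ended
    for duration in durations:
        # Only look at rows after the previous segment
        remaining = df if last_end_point is None else df[df["ts_B"] > last_end_point]
        active = remaining[remaining["value"] > threshold]
        if active.empty:
            break  # no further non-zero value, so no further segment
        start_time = active["ts_B"].iloc[0]
        end_time = start_time + pd.Timedelta(duration)
        in_window = (remaining["ts_B"] >= start_time) & (remaining["ts_B"] <= end_time)
        segments.append(remaining[in_window])
        last_end_point = end_time
    return segments


# Made-up sample: one reading every 10 seconds
sample = pd.DataFrame({
    "id_B": "id1",
    "ts_B": pd.date_range("2017-04-27 01:35:30", periods=12, freq="10s"),
    "value": [0, 0, 1, 4, 5, 0, 0, 0, 7, 9, 3, 0],
})

segments = select_segments(sample, ["00:00:20", "00:00:10"])
for number, segment in enumerate(segments, start=1):
    print(number, segment["value"].tolist())
# 1 [1, 4, 5]
# 2 [7, 9]
```

With the question's data this would be called with `["00:03:00", "00:01:40"]`. Note that if the readings right after a segment are still non-zero, the next segment starts immediately; whether that is the desired behaviour depends on how segments are meant to be separated.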