How do I loop over time series data in Python?

Time: 2019-08-24 14:58:44

Tags: python-3.x

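I am new to Python. I have time series data recorded over one year (365 days), and the interval between two measurements is not consistent: sometimes it is 3 s, 4 s, or 7 s. Below is a sample of the data up to April 30, 2011.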

find the data here

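So the code I want would accumulate the time until the total, counted from the start time, reaches one minute. While accumulating, the row that is about to complete the minute sometimes pushes the total past one minute by about 2 seconds (for example, start time = 23:40:40 and the row that would complete the minute is at 23:41:42). In that case, ignore that row and average all the data within the minute (in other words, average the data collected in one minute or less). The start time of the next minute is then the ignored row. I am not sure if I am being clear enough.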

1 answer:

Answer 0: (score: 0)

Assuming you can iterate over the rows yourself, and that the rows are sorted in ascending order of time:

import datetime as dt
first_time = dt.datetime(Y,M,D, h,m,s) # extract from first row
delta = dt.timedelta(minutes=1)
next_time = first_time + delta  # first time that does not go in current minute
sums = {}

def key_format(dt_ob: dt.datetime) -> str:
    return f'{dt_ob.year:04}{dt_ob.month:02}{dt_ob.day:02}{dt_ob.hour:02}{dt_ob.minute:02}'
key = key_format(first_time) # format is YYYYMMDDhhmm; seconds are omitted because they vary within a minute

for i in rows: # iterate however you can; comment on this answer with where the data is stored if you can't
    current_time = dt.datetime(...) # extract time from i
    if current_time >= next_time: # determine if time is outside the current minute
        key = key_format(next_time) # change key to the next minute
        next_time += delta # move the boundary to the following minute
    sums[key] = sums.get(key, 0) + seconds_to_add_up_in_this_row
    # sums.get(key, 0) returns sums[key], or 0 if key is not in sums
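
The loop above advances in fixed one-minute steps from the first row and sums a quantity per minute. The question asks for something slightly different: each window restarts at the row that overflowed the previous one, and the values are averaged. The following is a minimal sketch of that variant, not a definitive implementation. It assumes the rows have already been parsed into `(timestamp, value)` pairs sorted by time, and that a row falling exactly one minute after the window start opens a new window, matching the `>=` test above. The function name `average_by_minute` and the sample readings are hypothetical and used only for illustration.

```python
import datetime as dt
from statistics import mean


def average_by_minute(rows, window=dt.timedelta(minutes=1)):
    """Average values in windows that restart at the first overflowing row.

    rows: iterable of (timestamp, value) pairs, sorted by timestamp (assumed).
    Returns a list of (window_start, mean_value) pairs.
    """
    results = []
    start = None   # timestamp of the first row in the current window
    values = []    # values collected in the current window
    for timestamp, value in rows:
        if start is None:
            start = timestamp
        elif timestamp - start >= window:
            # This row does not fit in the current window: close the window,
            # then start the next window at this row.
            results.append((start, mean(values)))
            start = timestamp
            values = []
        values.append(value)
    if values:
        # Close the last, possibly shorter, window.
        results.append((start, mean(values)))
    return results


# Made-up sample readings mirroring the example in the question.
sample = [
    (dt.datetime(2011, 4, 30, 23, 40, 40), 10.0),
    (dt.datetime(2011, 4, 30, 23, 40, 43), 20.0),
    (dt.datetime(2011, 4, 30, 23, 41, 38), 30.0),
    (dt.datetime(2011, 4, 30, 23, 41, 42), 40.0),  # 62 s after the start: opens a new window
    (dt.datetime(2011, 4, 30, 23, 41, 46), 60.0),
]
result = average_by_minute(sample)
for window_start, average in result:
    print(window_start, average)
```

With this sample, the first window starts at 23:40:40 and averages the first three readings, and the second window starts at the 23:41:42 row. Because each window is anchored to a row rather than to the clock, the window boundaries drift over the year, which is what the question describes.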