Python Pandas - Non-continuous time series?

Asked: 2018-12-15 08:34:53

Tags: python python-3.x pandas time-series

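Is there a way in Pandas to create a time series made up of selected clock times on every day of a period? For example: take every day of 2018 as the dates, and 4 different clock times, e.g. [09:00, 10:35, 14:00, 15:50].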

So what I want is a time series like this:

2018-01-01 09:00
2018-01-01 10:35
2018-01-01 14:00
2018-01-01 15:50
2018-01-02 09:00
2018-01-02 10:35
2018-01-02 14:00
2018-01-02 15:50
2018-01-03 09:00
...

TIA, T

3 answers:

Answer 0 (score: 3)

You can do the following, which is much faster than using a loop:

1.) Create the list of dates:

In [33]: import pandas as pd
In [34]: start_dt = '2018-01-01'
# For whole year, use periods=365
In [45]: days_list = pd.date_range(pd.to_datetime(start_dt), periods=3)
In [59]: days_list = [i.date() for i in days_list] # Keeping only date part

2.) Create the list of times:

In [38]: timelist = ['09:00', '10:35', '14:00', '15:50']

3.) Expand the list by repeating each element of days_list 4 times, once for each clock time:

In [60]: import numpy as np
In [61]: days_list = np.repeat(days_list, 4)

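4.) Expand timelist by multiplying it by the number of unique dates in days_list, so that it has the same length as days_list:

Since we used periods=3 when creating days_list, timelist is expanded by the same factor: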

In [64]: timelist = timelist * 3

5.) Create the DataFrame:

In [65]: df = pd.DataFrame()
In [66]: df['Date'] = days_list
In [68]: df['time'] = timelist

Final output:

In [78]: df
Out[78]: 
          Date   time
0   2018-01-01  09:00
1   2018-01-01  10:35
2   2018-01-01  14:00
3   2018-01-01  15:50
4   2018-01-02  09:00
5   2018-01-02  10:35
6   2018-01-02  14:00
7   2018-01-02  15:50
8   2018-01-03  09:00
9   2018-01-03  10:35
10  2018-01-03  14:00
11  2018-01-03  15:50
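
The Date and time columns above are still separate values rather than timestamps. As a minimal sketch (not part of the original answer), they could be joined as text and parsed with pd.to_datetime to obtain a real datetime column and a time-indexed Series. The names days, times, DateTime and ts are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

days = pd.date_range("2018-01-01", periods=3)  # use periods=365 for the whole year
times = ["09:00", "10:35", "14:00", "15:50"]

# Same layout as the DataFrame above: each date repeated once per clock time
df = pd.DataFrame({
    "Date": np.repeat(days.date, len(times)),
    "time": times * len(days),
})

# Join date and time as text, then parse the result into timestamps
df["DateTime"] = pd.to_datetime(df["Date"].astype(str) + " " + df["time"])

# A Series indexed by the combined timestamps (the values are placeholders)
ts = pd.Series(range(len(df)), index=pd.DatetimeIndex(df["DateTime"]))
```

With the parsed column in place, ts can be sliced by date or time like any other datetime-indexed Series.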

Answer 1 (score: 1)

Try combining pandas date_range() with pandas where().

import pandas as pd
import numpy as np
import datetime

# Define times
times = ['00:00','09:00', '10:35', '14:00', '15:50']

# Define dates
start_date = '01.01.2018'
end_date = '31.12.2018'

# Create a list in minute resolution between start and end date
diff = pd.date_range(start_date,end_date,freq='T')

# Keep only the elements which match with the defined times in the list
filtered_hours = diff.where([mins in times for mins in diff.strftime('%H:%M')]).dropna()

print(filtered_hours)


DatetimeIndex(['2018-01-01 00:00:00', '2018-01-01 09:00:00',
               '2018-01-01 10:35:00', '2018-01-01 14:00:00',
               '2018-01-01 15:50:00', '2018-01-02 00:00:00',
               '2018-01-02 09:00:00', '2018-01-02 10:35:00',
               '2018-01-02 14:00:00', '2018-01-02 15:50:00',
               ...
               '2018-12-29 09:00:00', '2018-12-29 10:35:00',
               '2018-12-29 14:00:00', '2018-12-29 15:50:00',
               '2018-12-30 00:00:00', '2018-12-30 09:00:00',
               '2018-12-30 10:35:00', '2018-12-30 14:00:00',
               '2018-12-30 15:50:00', '2018-12-31 00:00:00'],
              dtype='datetime64[ns]', length=1821, freq='T')

Put it into a Series:

# Put it into a Series

val = np.arange(0,len(filtered_hours))

Ser = pd.Series(val,index=filtered_hours)

print(Ser)



2018-01-01 00:00:00       0
2018-01-01 09:00:00       1
2018-01-01 10:35:00       2
2018-01-01 14:00:00       3
2018-01-01 15:50:00       4
2018-01-02 00:00:00       5
2018-01-02 09:00:00       6
2018-01-02 10:35:00       7
2018-01-02 14:00:00       8
2018-01-02 15:50:00       9
2018-01-03 00:00:00      10
2018-01-03 09:00:00      11
2018-01-03 10:35:00      12
2018-01-03 14:00:00      13
2018-01-03 15:50:00      14
2018-01-04 00:00:00      15
2018-01-04 09:00:00      16
2018-01-04 10:35:00      17
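
Filtering a minute-resolution range builds roughly half a million timestamps only to discard most of them. A possible alternative, sketched here under the assumption that only the four clock times from the question are wanted (so no 00:00 entry), is to add each time offset to each day by broadcasting. The names days, offsets, stamps and index are illustrative and not from the original answer. Note also that recent pandas releases deprecate the 'T' frequency alias in favour of 'min'.

```python
import pandas as pd

times = ["09:00", "10:35", "14:00", "15:50"]
days = pd.date_range("2018-01-01", "2018-12-31", freq="D")

# Turn each "HH:MM" string into an offset from midnight ("HH:MM:00")
offsets = pd.to_timedelta([t + ":00" for t in times])

# Broadcast days (rows) against offsets (columns), then flatten row by row
# so that all times of one day come before the next day
stamps = (days.values[:, None] + offsets.values[None, :]).ravel()
index = pd.DatetimeIndex(stamps)
```

Unlike the filtered range above, which ends at 2018-12-31 00:00, this index also contains all four times on the last day.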

Answer 2 (score: 0)

timelist = ['09:00', '10:35', '14:00', '15:50']
dayslist = []
for day in range(1, 366):
    for time in timelist:
        dayslist.append(str(day) + ' ' + time)

print(dayslist)

You should be able to import time and iterate over it, replacing day with the date format you need. You can then feed the list into a Pandas DataFrame.

import pandas as pd
df = pd.DataFrame({'DateTime': dayslist})
print(df)
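
The loop above labels days with plain numbers 1 to 365. One way to carry out the suggested replacement with real dates, as a rough sketch that assumes the standard-library datetime module and a start date of 2018-01-01 (the names start and offset are illustrative), is:

```python
from datetime import date, timedelta

import pandas as pd

timelist = ["09:00", "10:35", "14:00", "15:50"]
start = date(2018, 1, 1)  # assumed first day of the period

dayslist = []
for offset in range(365):  # 2018 is not a leap year
    day = start + timedelta(days=offset)
    for time in timelist:
        # e.g. "2018-01-01 09:00"
        dayslist.append(f"{day.isoformat()} {time}")

# Parse the strings so the column holds timestamps rather than text
df = pd.DataFrame({"DateTime": pd.to_datetime(dayslist)})
```

This keeps the nested-loop structure of the answer, so it will be slower than the vectorized approaches for long periods.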