Question

I am trying to create a pandas DataFrame with two columns:

id  time
1   2017/06/07 11:30:22
2   2017/06/07 12:03:32
...  ...
800 2017/07/07 02:10:28

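That is, the id column should contain 800 values in total,

and the time column should be filled with random timestamps (in the same format) from within the last 30 hours.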

I tried:

import pandas as pd
N = 800
df = pd.DataFrame({ 'id' : range(1, N + 1 ,1)})

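This gives me 800 consecutive values for the id column, but I am confused about how to add a time column and fill it with random times from the last 30 hours (counting back from the current time).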

Answer 1

You can use date_range to create a DatetimeIndex and then sample from it with numpy.random.choice:

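import numpy as np
import pandas as pd

# pd.datetime was removed in pandas 1.0; pd.Timestamp.now() is the replacement
last30 = pd.Timestamp.now().floor('s') - pd.Timedelta(hours=30)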
print (last30)
2017-06-08 03:28:58

# one entry per second over 30 hours; recent pandas prefers the alias 's' over the deprecated 'S'
dates = pd.date_range(last30, periods = 30 * 60 * 60, freq='s')
print (dates)
DatetimeIndex(['2017-06-08 03:28:58', '2017-06-08 03:28:59',
               '2017-06-08 03:29:00', '2017-06-08 03:29:01',
               '2017-06-08 03:29:02', '2017-06-08 03:29:03',
               '2017-06-08 03:29:04', '2017-06-08 03:29:05',
               '2017-06-08 03:29:06', '2017-06-08 03:29:07',
               ...
               '2017-06-09 09:28:48', '2017-06-09 09:28:49',
               '2017-06-09 09:28:50', '2017-06-09 09:28:51',
               '2017-06-09 09:28:52', '2017-06-09 09:28:53',
               '2017-06-09 09:28:54', '2017-06-09 09:28:55',
               '2017-06-09 09:28:56', '2017-06-09 09:28:57'],
              dtype='datetime64[ns]', length=108000, freq='S')

N = 30
df = pd.DataFrame({ 'id' : range(1, N + 1, 1),
                   'time': np.random.choice(dates, size=N)})
print (df)
    id                time
0    1 2017-06-09 03:08:09
1    2 2017-06-08 21:30:41
2    3 2017-06-08 06:45:23
3    4 2017-06-08 05:30:37
4    5 2017-06-09 05:59:04
5    6 2017-06-08 16:08:51
6    7 2017-06-08 13:34:37
7    8 2017-06-09 02:51:59
8    9 2017-06-08 23:34:46
9   10 2017-06-08 18:01:22
10  11 2017-06-09 06:40:02
11  12 2017-06-08 07:58:49
12  13 2017-06-08 17:34:46
13  14 2017-06-08 20:07:05
14  15 2017-06-08 18:04:57
15  16 2017-06-08 07:28:35
16  17 2017-06-08 05:36:14
17  18 2017-06-08 08:05:19
18  19 2017-06-08 11:59:51
19  20 2017-06-08 11:53:28
20  21 2017-06-08 05:48:51
21  22 2017-06-08 11:06:42
22  23 2017-06-08 14:42:22
23  24 2017-06-08 03:31:41
24  25 2017-06-08 07:21:26
25  26 2017-06-08 22:08:06
26  27 2017-06-09 03:46:35
27  28 2017-06-08 16:24:45
28  29 2017-06-08 22:29:15
29  30 2017-06-09 03:22:23
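
The question asks for 800 rows and for times written like 2017/06/07 11:30:22, while the output above keeps pandas' default datetime display. The following is a minimal sketch of one way to combine the two, not part of the original answer. The helper name random_recent_times and its seed parameter are made up for illustration, and it assumes the time column may be stored as formatted strings rather than as datetimes.

```python
import numpy as np
import pandas as pd


def random_recent_times(n, hours=30, seed=None):
    """Build a frame with ids 1..n and random times from the last `hours` hours.

    Hypothetical helper for illustration; `seed` only makes the draw repeatable.
    """
    rng = np.random.default_rng(seed)
    end = pd.Timestamp.now().floor('s')
    start = end - pd.Timedelta(hours=hours)

    # One candidate timestamp per second in [start, end)
    dates = pd.date_range(start, periods=hours * 60 * 60, freq='s')

    # Draw n positions with replacement and look up the matching timestamps
    picked = dates[rng.integers(0, len(dates), size=n)]

    return pd.DataFrame({
        'id': range(1, n + 1),
        # Assumption: the requested format is stored as plain strings
        'time': picked.strftime('%Y/%m/%d %H:%M:%S'),
    })


df = random_recent_times(800)
print(df.head())
```

If the column needs to stay a real datetime for later arithmetic, drop the strftime call and format only when displaying or exporting.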

Answer 2

A quick and dirty solution using NumPy, datetime and pandas that gives ordered dates at random intervals:

    import datetime

    import numpy as np
    import pandas as pd

    # Width of the window you want to draw from
    delta = datetime.timedelta(hours=30)

    # NumPy array of random offsets, in whole seconds, inside the window
    times = np.random.randint(0, int(delta.total_seconds()), size=800)

    # Transform the sorted integers into datetime.timedelta instances
    seconds = []
    for t in np.sort(times):
        d = datetime.timedelta(seconds=int(t))
        seconds.append(d)

    # Start of the window. Subtracting delta keeps every date in the past
    # 30 hours; adding offsets to "now" would produce future dates instead.
    # start = datetime.datetime(2017, 6, 9, 0, 0)  # if you want a specific date
    start = datetime.datetime.today() - delta

    # Build a list with the dates
    frame = []
    for s in seconds:
        frame.append(start + s)

    # Transform that list into a DataFrame
    df = pd.DataFrame()
    df['Date'] = frame
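
The two loops in Answer 2 can also be expressed without explicit iteration. The sketch below is one possible vectorized equivalent, not the answerer's code: it assumes the window should end at the current time, and it adds the id column from the question, which Answer 2 leaves out.

```python
import numpy as np
import pandas as pd

N = 800
window_seconds = 30 * 60 * 60  # 30 hours expressed in seconds

# Sorted random offsets (whole seconds) inside the 30-hour window
offsets = np.sort(np.random.randint(0, window_seconds, size=N))

# Assumption: the window starts 30 hours before now and ends now
start = pd.Timestamp.now().floor('s') - pd.Timedelta(hours=30)

# Adding a TimedeltaIndex to a Timestamp yields a DatetimeIndex
df = pd.DataFrame({
    'id': range(1, N + 1),
    'Date': start + pd.to_timedelta(offsets, unit='s'),
})
print(df.head())
```

Because the offsets are sorted before they are added, the Date column comes out in ascending order, matching the behaviour of the loop version.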
