I am trying to create a pandas DataFrame with two columns:
id time
1 2017/06/07 11:30:22
2 2017/06/07 12:03:32
... ...
800 2017/07/07 02:10:28
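That is, the total number of ids should be 800,
and the time column should be filled with random datetimes (in the same format) from within the past 30 hours.
I tried: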
import pandas as pd
N = 800
df = pd.DataFrame({ 'id' : range(1, N + 1 ,1)})
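This gives me 800 consecutive values for the id column, but I am confused about how to add a time column and fill it with random times from the past 30 hours (counting back from the current time).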
Answer 0 (score: 2)
You can use date_range to create a DatetimeIndex, and then use numpy.random.choice:
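import numpy as np
import pandas as pd

# pd.datetime is no longer available in recent pandas releases; pd.Timestamp.now() does the same job
last30 = pd.Timestamp.now().replace(microsecond=0) - pd.Timedelta(hours=30)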
print (last30)
2017-06-08 03:28:58
# One entry per second over 30 hours; recent pandas spells the seconds alias 's' rather than 'S'
dates = pd.date_range(last30, periods=30 * 60 * 60, freq='s')
print (dates)
DatetimeIndex(['2017-06-08 03:28:58', '2017-06-08 03:28:59',
'2017-06-08 03:29:00', '2017-06-08 03:29:01',
'2017-06-08 03:29:02', '2017-06-08 03:29:03',
'2017-06-08 03:29:04', '2017-06-08 03:29:05',
'2017-06-08 03:29:06', '2017-06-08 03:29:07',
...
'2017-06-09 09:28:48', '2017-06-09 09:28:49',
'2017-06-09 09:28:50', '2017-06-09 09:28:51',
'2017-06-09 09:28:52', '2017-06-09 09:28:53',
'2017-06-09 09:28:54', '2017-06-09 09:28:55',
'2017-06-09 09:28:56', '2017-06-09 09:28:57'],
dtype='datetime64[ns]', length=108000, freq='S')
N = 30
df = pd.DataFrame({'id': range(1, N + 1),
                   'time': np.random.choice(dates, size=N)})
print (df)
id time
0 1 2017-06-09 03:08:09
1 2 2017-06-08 21:30:41
2 3 2017-06-08 06:45:23
3 4 2017-06-08 05:30:37
4 5 2017-06-09 05:59:04
5 6 2017-06-08 16:08:51
6 7 2017-06-08 13:34:37
7 8 2017-06-09 02:51:59
8 9 2017-06-08 23:34:46
9 10 2017-06-08 18:01:22
10 11 2017-06-09 06:40:02
11 12 2017-06-08 07:58:49
12 13 2017-06-08 17:34:46
13 14 2017-06-08 20:07:05
14 15 2017-06-08 18:04:57
15 16 2017-06-08 07:28:35
16 17 2017-06-08 05:36:14
17 18 2017-06-08 08:05:19
18 19 2017-06-08 11:59:51
19 20 2017-06-08 11:53:28
20 21 2017-06-08 05:48:51
21 22 2017-06-08 11:06:42
22 23 2017-06-08 14:42:22
23 24 2017-06-08 03:31:41
24 25 2017-06-08 07:21:26
25 26 2017-06-08 22:08:06
26 27 2017-06-09 03:46:35
27 28 2017-06-08 16:24:45
28 29 2017-06-08 22:29:15
29 30 2017-06-09 03:22:23
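To get the 800 rows and the `2017/06/07 11:30:22` layout shown in the question, the same idea can be extended with `Series.dt.strftime`. This is only a sketch: the seeded generator `rng` and the extra `time_str` column are my own additions for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

N = 800  # number of rows requested in the question

# Every whole second in the 30 hours leading up to now
end = pd.Timestamp.now().replace(microsecond=0)
start = end - pd.Timedelta(hours=30)
dates = pd.date_range(start, periods=30 * 60 * 60, freq='s')

# Seeded only so the example is reproducible (an assumption, not in the answer)
rng = np.random.default_rng(0)
picked = dates[rng.integers(0, len(dates), size=N)]

df = pd.DataFrame({'id': range(1, N + 1), 'time': picked})

# Optional: render the datetimes as text in the question's YYYY/MM/DD HH:MM:SS layout
df['time_str'] = df['time'].dt.strftime('%Y/%m/%d %H:%M:%S')
print(df.head())
```

Keeping `time` as a real datetime column and adding the string version separately means sorting and date arithmetic still work on `time`. Note that sampling here is with replacement, so the same second can appear more than once.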
Answer 1 (score: 0)
A quick and dirty solution using NumPy, datetime, and pandas, giving ordered dates at random intervals:
import datetime

import numpy as np
import pandas as pd

# Width of the window you want to draw from
delta = datetime.timedelta(hours=30)

# NumPy array of random offsets into the window, in whole seconds
times = np.random.randint(0, int(delta.total_seconds()), size=800)

# Transform the sorted integers into datetime.timedelta instances
seconds = []
for t in np.sort(times):
    seconds.append(datetime.timedelta(seconds=int(t)))

# Create the start of the window
# start = datetime.datetime(2017, 6, 9, 0, 0)  # if you want a specific date
# Start 30 hours before now so that every date falls within the past 30 hours
start = datetime.datetime.today() - delta

# Build a list with the dates
frame = []
for s in seconds:
    frame.append(start + s)

# Transform that list into a DataFrame
df = pd.DataFrame()
df['Date'] = frame
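The two loops above can probably be collapsed into vectorized calls. A minimal sketch, assuming the goal is 800 sorted timestamps inside the past 30 hours; the name `window_seconds` and the added `id` column are mine, and `pd.to_timedelta` with `unit='s'` converts all the integer offsets in one step:

```python
import numpy as np
import pandas as pd

N = 800
window_seconds = 30 * 60 * 60  # 30 hours expressed in seconds

# Sorted random offsets (whole seconds) into the 30-hour window
offsets = np.sort(np.random.randint(0, window_seconds, size=N))

# Window start: 30 hours before now
start = pd.Timestamp.now().replace(microsecond=0) - pd.Timedelta(hours=30)

# Adding a TimedeltaIndex to a Timestamp gives a DatetimeIndex
dates = start + pd.to_timedelta(offsets, unit='s')

df = pd.DataFrame({'id': range(1, N + 1), 'Date': dates})
print(df.head())
```

Because the offsets are sorted before they are added, the `Date` column comes out in chronological order, as in the loop version.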