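My Postgres database stores dates in the format 2019-05-22 18:01:38.425533+00.
I need this date for my regression model, so I tried to convert it with df['created'] = pd.to_datetime(df.created).
Is this the right way to handle my data? When I plot the data, the plotted values run from 0 to 200, which does not look right.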
import pandas as pd
import matplotlib.pyplot as plt

# Load data
def load_event_data():
    df = pd.read_csv('event_data.csv')
    # Parse the timestamp strings into datetime values
    df['created'] = pd.to_datetime(df.created)
    return df

event_data = load_event_data()
print("The defined index is", event_data.index.name)

# Visualize data
plt.figure(figsize=(15, 6))
plt.plot(event_data.index, event_data.tickets_sold_sum)
plt.xlabel("Date")
plt.ylabel("Rentals")
Here is some sample data: https://docs.google.com/spreadsheets/d/1cJAcamytX4zmQBpbQZYIi-HK5T0JlAJ5Dx3b1D6adxQ/edit?usp=sharing
Answer 0 (score: 1)
Here is what I tried:
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> df = pd.read_csv("d.csv")
>>> df
created event_id tickets_sold tickets_sold_sum
0 2019-05-22 18:01:38.425533+00 1 90 90
1 2019-05-22 18:02:17.867726+00 1 40 130
2 2019-05-22 18:02:32.44182+00 1 13 143
3 2019-05-22 18:03:07.093599+00 1 0 143
4 2019-05-22 18:03:22.857492+00 1 10 153
5 2019-05-22 18:04:07.453356+00 1 14 167
6 2019-05-22 18:04:24.382271+00 1 14 181
7 2019-05-22 18:04:34.670751+00 1 7 188
8 2019-05-22 18:05:04.781586+00 1 10 198
9 2019-05-22 18:05:28.475102+00 1 2 200
10 2019-05-22 18:05:41.469483+00 1 0 200
11 2019-05-22 18:06:04.184309+00 1 19 219
12 2019-05-22 18:06:07.344332+00 1 18 237
13 2019-05-22 18:06:21.596053+00 1 9 246
14 2019-05-22 18:06:29.980078+00 1 20 266
15 2019-05-22 18:06:36.33118+00 1 11 277
16 2019-05-22 18:06:46.557717+00 1 15 292
17 2019-05-22 18:06:50.681479+00 1 10 302
18 2019-05-22 18:07:07.288164+00 1 17 319
19 2019-05-22 18:07:12.296925+00 1 11 330
20 2019-05-22 18:07:42.836565+00 1 5 335
21 2019-05-22 18:07:56.903366+00 1 17 352
22 2019-05-22 18:09:03.798696+00 1 13 365
23 2019-05-22 18:09:20.485152+00 1 9 374
24 2019-05-22 18:10:22.913068+00 1 14 388
25 2019-05-22 18:10:30.922313+00 1 9 397
26 2019-05-22 18:11:36.149465+00 1 12 409
27 2019-05-22 18:11:45.23962+00 1 13 422
28 2019-05-22 18:11:48.826544+00 1 4 426
>>> df.set_index("created",inplace=True)
>>> df
event_id tickets_sold tickets_sold_sum
created
2019-05-22 18:01:38.425533+00 1 90 90
2019-05-22 18:02:17.867726+00 1 40 130
2019-05-22 18:02:32.44182+00 1 13 143
2019-05-22 18:03:07.093599+00 1 0 143
2019-05-22 18:03:22.857492+00 1 10 153
2019-05-22 18:04:07.453356+00 1 14 167
2019-05-22 18:04:24.382271+00 1 14 181
2019-05-22 18:04:34.670751+00 1 7 188
2019-05-22 18:05:04.781586+00 1 10 198
2019-05-22 18:05:28.475102+00 1 2 200
2019-05-22 18:05:41.469483+00 1 0 200
2019-05-22 18:06:04.184309+00 1 19 219
2019-05-22 18:06:07.344332+00 1 18 237
2019-05-22 18:06:21.596053+00 1 9 246
2019-05-22 18:06:29.980078+00 1 20 266
2019-05-22 18:06:36.33118+00 1 11 277
2019-05-22 18:06:46.557717+00 1 15 292
2019-05-22 18:06:50.681479+00 1 10 302
2019-05-22 18:07:07.288164+00 1 17 319
2019-05-22 18:07:12.296925+00 1 11 330
2019-05-22 18:07:42.836565+00 1 5 335
2019-05-22 18:07:56.903366+00 1 17 352
2019-05-22 18:09:03.798696+00 1 13 365
2019-05-22 18:09:20.485152+00 1 9 374
2019-05-22 18:10:22.913068+00 1 14 388
2019-05-22 18:10:30.922313+00 1 9 397
2019-05-22 18:11:36.149465+00 1 12 409
2019-05-22 18:11:45.23962+00 1 13 422
2019-05-22 18:11:48.826544+00 1 4 426
>>> plt.figure(figsize=(15, 6))
<Figure size 1500x600 with 0 Axes>
>>> plt.plot(df.index[:10], df.tickets_sold_sum[:10])
[<matplotlib.lines.Line2D object at 0x0000022C7FBF5898>]
>>> plt.xlabel("Date")
Text(0.5,0,'Date')
>>> plt.ylabel("Rentals")
Text(0,0.5,'Rentals')
>>> plt.show()
Answer 1 (score: 0)
The problem is that you are plotting the index values rather than the created column, so use:
plt.plot(event_data.created, event_data.tickets_sold_sum)
Or plot with pandas:
event_data.plot(x='created', y='tickets_sold_sum')
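To illustrate the point about the index, here is a minimal sketch. The three rows are a hypothetical stand-in for event_data.csv with simplified timestamps, not the real file. It shows that converting a column with pd.to_datetime leaves the default integer index in place, which is why plotting against event_data.index gives row numbers instead of dates.

```python
import pandas as pd

# Hypothetical stand-in for event_data.csv (simplified timestamps).
df = pd.DataFrame({
    "created": pd.to_datetime(
        [
            "2019-05-22 18:01:38+00:00",
            "2019-05-22 18:02:17+00:00",
            "2019-05-22 18:02:32+00:00",
        ],
        utc=True,
    ),
    "tickets_sold_sum": [90, 130, 143],
})

# Converting a column does not touch the index: it is still the default RangeIndex.
print(type(df.index).__name__)  # RangeIndex
print(list(df.index))           # [0, 1, 2]
print(df.index.name)            # None

# Making the column the index gives a DatetimeIndex that can be plotted against.
indexed = df.set_index("created")
print(type(indexed.index).__name__)  # DatetimeIndex
```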
If you need to work with a DatetimeIndex, create it first, for example with the index_col and parse_dates parameters:
def load_event_data():
    df = pd.read_csv('created.csv', index_col='created', parse_dates=['created'])
    return df

event_data = load_event_data()
print(event_data.index)
DatetimeIndex(['2019-05-22 18:01:38.425533+00:00',
'2019-05-22 18:02:17.867726+00:00',
'2019-05-22 18:02:32.441820+00:00',
'2019-05-22 18:03:07.093599+00:00',
'2019-05-22 18:03:22.857492+00:00',
'2019-05-22 18:04:07.453356+00:00',
'2019-05-22 18:04:24.382271+00:00',
'2019-05-22 18:04:34.670751+00:00',
'2019-05-22 18:05:04.781586+00:00',
'2019-05-22 18:05:28.475102+00:00',
'2019-05-22 18:05:41.469483+00:00',
'2019-05-22 18:06:04.184309+00:00',
'2019-05-22 18:06:07.344332+00:00',
'2019-05-22 18:06:21.596053+00:00',
'2019-05-22 18:06:29.980078+00:00',
'2019-05-22 18:06:36.331180+00:00',
'2019-05-22 18:06:46.557717+00:00',
'2019-05-22 18:06:50.681479+00:00',
'2019-05-22 18:07:07.288164+00:00',
'2019-05-22 18:07:12.296925+00:00',
'2019-05-22 18:07:42.836565+00:00',
'2019-05-22 18:07:56.903366+00:00',
'2019-05-22 18:09:03.798696+00:00',
'2019-05-22 18:09:20.485152+00:00',
'2019-05-22 18:10:22.913068+00:00',
'2019-05-22 18:10:30.922313+00:00',
'2019-05-22 18:11:36.149465+00:00',
'2019-05-22 18:11:45.239620+00:00',
'2019-05-22 18:11:48.826544+00:00'],
dtype='datetime64[ns, UTC]', name='created', freq=None)
print("The defined index is", event_data.index.name)
# Visualize data
plt.figure(figsize=(15, 6))
plt.plot(event_data.index, event_data.tickets_sold_sum)
plt.xlabel("Date")
plt.ylabel("Rentals")
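Since the question mentions a regression model, one possible way to turn a DatetimeIndex into a numeric feature is to use the seconds elapsed since the first timestamp. This is only a sketch under assumptions: the small frame below is a hypothetical stand-in for event_data, and the column name elapsed_s is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for event_data with a timezone-aware DatetimeIndex.
idx = pd.to_datetime(
    [
        "2019-05-22 18:01:38+00:00",
        "2019-05-22 18:02:17+00:00",
        "2019-05-22 18:02:32+00:00",
    ],
    utc=True,
)
event_data = pd.DataFrame({"tickets_sold_sum": [90, 130, 143]}, index=idx)

# Seconds since the first event; elapsed_s is an illustrative column name.
event_data["elapsed_s"] = (event_data.index - event_data.index[0]).total_seconds()

print(event_data["elapsed_s"].tolist())  # [0.0, 39.0, 54.0]
```

A plain float column like this can be passed to most regression libraries, while the DatetimeIndex stays available for plotting.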
Answer 2 (score: 0)
Could you pull the data from Postgres twice, for example with SELECT TO_CHAR(date, 'YYYYMMDD');, once to train the model and once to plot it in the desired time format?
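If a second query is not convenient, a comparable YYYYMMDD string can probably be derived on the pandas side from an already parsed column. The sketch below assumes a parsed, timezone-aware created column; the two sample timestamps are hypothetical.

```python
import pandas as pd

# Hypothetical parsed timestamps standing in for the created column.
created = pd.Series(
    pd.to_datetime(
        ["2019-05-22 18:01:38+00:00", "2019-05-22 18:02:17+00:00"],
        utc=True,
    )
)

# strftime's %Y%m%d corresponds to the 'YYYYMMDD' pattern used with TO_CHAR.
as_text = created.dt.strftime("%Y%m%d")

print(as_text.tolist())  # ['20190522', '20190522']
```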