I have a time-series DataFrame that looks like this:
A =
date,uuid,diesel,e5,e10
2018-01-31 00:01:06+01,c03c846e-64ec-437f-9a52-9eda8088c4b2,1.239,1.419,1.399
2018-01-31 00:03:06+01,6dc575da-3c85-430c-a17a-6efdae0dcf5a,1.249,1.419,1.399
where date is the index (optionally parsed as datetime).
The dataset is very large (more than 100,000,000 rows) and contains about 15,000 unique uuids.
I want to plot the prices (diesel, e10, e5) over time for every uuid (= gas station), or only for a random sample of them (e.g. 10 or 100).
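For the random-sample case, one possible sketch draws the station ids from the unique uuid values and then selects all of their rows with a single isin mask. It assumes the column is named uuid as above; sample_stations, seed and the toy frame are hypothetical names and data for illustration only.

```python
import numpy as np
import pandas as pd


def sample_stations(df: pd.DataFrame, n: int, seed: int = 0) -> pd.DataFrame:
    """Return only the rows of n randomly chosen stations (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # unique() has one entry per station, so sampling from it is cheap
    chosen = rng.choice(df["uuid"].unique(), size=n, replace=False)
    # isin() builds one boolean mask covering all chosen stations at once
    return df[df["uuid"].isin(chosen)]


# Hypothetical toy data: 5 stations with 3 price rows each.
toy = pd.DataFrame(
    {
        "uuid": np.repeat([f"station-{i}" for i in range(5)], 3),
        "diesel": np.linspace(1.1, 1.3, 15),
    },
    index=pd.date_range("2018-01-01", periods=15, freq="D", name="date"),
)
subset = sample_stations(toy, n=2)
print(subset["uuid"].nunique())  # 2
```

The sampled frame can then be passed to whatever plotting code is used for the full dataset.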
Currently I am doing this with a loop, but since loops in pandas are very slow, I wonder whether there is a faster, perhaps vectorized, technique:
import matplotlib.pyplot as plt

fig, (ax1, ax2, ax3) = plt.subplots(3, 1, sharex=True)
cap = 10  # maximum number of stations to plot
# iterate over the unique station ids, not over every row
for count, uuid in enumerate(dataframe.uuid.unique()):
    station = dataframe.loc[dataframe.uuid == uuid]  # filter once per station
    x = station.index
    ax1.plot(x, station.diesel)  # diesel
    ax2.plot(x, station.e10)     # e10
    ax3.plot(x, station.e5)      # e5
    if count >= cap - 1:
        break
plt.show()
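A likely cost in the loop above is that dataframe.uuid == uuid scans all rows again for every station. A minimal sketch of the same plot using a single groupby pass instead, assuming the date index and the diesel/e10/e5 columns shown above (plot_prices and the toy data are hypothetical):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_prices(df: pd.DataFrame, cap: int = 10):
    """Plot diesel/e10/e5 for at most `cap` stations (hypothetical helper)."""
    fig, (ax1, ax2, ax3) = plt.subplots(3, 1, sharex=True)
    # groupby splits the frame once; no boolean mask is rebuilt per station
    for count, (uuid, station) in enumerate(df.groupby("uuid", sort=False)):
        if count >= cap:
            break
        ax1.plot(station.index, station["diesel"])
        ax2.plot(station.index, station["e10"])
        ax3.plot(station.index, station["e5"])
    ax1.set_ylabel("diesel")
    ax2.set_ylabel("e10")
    ax3.set_ylabel("e5")
    return fig


# Hypothetical toy data: 4 stations with 5 daily prices each.
toy = pd.DataFrame(
    {
        "uuid": np.repeat([f"station-{i}" for i in range(4)], 5),
        "diesel": np.linspace(1.10, 1.30, 20),
        "e10": np.linspace(1.30, 1.40, 20),
        "e5": np.linspace(1.35, 1.45, 20),
    },
    index=pd.date_range("2018-01-01", periods=20, freq="D", name="date"),
)
fig = plot_prices(toy, cap=3)
```

This still draws one line per station in a Python loop, but the split into stations happens once instead of once per uuid. With many stations, drawing the lines may remain the dominant cost, so combining this with sampling is probably still advisable.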
Edit:
After grouping correctly by uuid and date, the dataset looks promising for what I want to do: dataframe.groupby(['uuid','date']).sum()[['diesel','e10','e5']]
                                                          diesel     e10      e5
station_uuid                         date
00006210-0037-4444-8888-acdc00006210 2018-01-01 06:33:06   1.189   1.369   1.389
                                     2018-01-01 06:39:05   1.189   1.329   1.349
                                     2018-01-01 09:39:07   1.189   1.319   1.339
...
How can I now plot the price change over time for all uuids, or for a selected number of them?
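With the data indexed by (uuid, date) as in the edit, one option is to pivot each fuel column so that every station becomes its own column, then let pandas draw all of them in a single plot call. This is a sketch under several assumptions: the index level is named uuid (it would be station_uuid if that is the real column name), prices are treated as step functions so forward-filling between reports is acceptable, and the number of stations is small enough for a date x uuid table to fit in memory (for all ~15,000 stations it probably will not, so sampling first is advisable). plot_wide and the toy frame are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_wide(grouped: pd.DataFrame, fuel: str, ax) -> pd.DataFrame:
    """Pivot one fuel column to a date x uuid table and plot every station."""
    # move the 'uuid' index level into the columns: one column per station
    wide = grouped[fuel].unstack(level="uuid")
    # stations report at different times, so most cells are NaN; carrying the
    # last reported price forward keeps each station's line continuous
    wide = wide.ffill()
    # a single DataFrame.plot call draws one line per column
    wide.plot(ax=ax, legend=False)
    ax.set_ylabel(fuel)
    return wide


# Hypothetical toy data: two stations reporting at alternating times.
dates = pd.to_datetime(["2018-01-01 06:00", "2018-01-01 07:00",
                        "2018-01-01 08:00", "2018-01-01 09:00"])
dataframe = pd.DataFrame(
    {"uuid": ["a", "b", "a", "b"],
     "diesel": [1.10, 1.20, 1.15, 1.25],
     "e10": [1.30, 1.32, 1.31, 1.33],
     "e5": [1.35, 1.37, 1.36, 1.38]},
    index=pd.Index(dates, name="date"),
)
grouped = dataframe.groupby(["uuid", "date"]).sum()[["diesel", "e10", "e5"]]
fig, ax = plt.subplots()
wide = plot_wide(grouped, "diesel", ax)
```

Leading NaNs (before a station's first report) are left as they are, so each line simply starts at that station's first price.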
Answer 0 (score: 0)
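import matplotlib.pyplot as plt

# create the three subplots once and reuse their axes for every station
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, sharex=True)
grouped_dfs = dataframe.groupby('uuid')
# 'date' is the index, so each station's Series is plotted against it;
# passing ax= draws all stations on one subplot instead of one figure each
grouped_dfs['diesel'].plot(ax=ax1, color='blue')
grouped_dfs['e10'].plot(ax=ax2, color='red')
grouped_dfs['e5'].plot(ax=ax3, color='yellow')
plt.show()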
Without any data to play with, I have not been able to test this solution.