Question

The data provided is hourly.

So far, I have selected only three parameters in the dataframe.

Input:

    df.info()

Output:

    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 78888 entries, 2006-01-01 00:00:00 to 2014-12-31 23:00:00
    Freq: H
    Data columns (total 3 columns):
     #   Column           Non-Null Count  Dtype  
    ---  ------           --------------  -----  
     0   Temperature (C)  78888 non-null  float64
     1   Humidity         78888 non-null  float64
     2   Visibility (km)  78888 non-null  float64
    dtypes: float64(3)
    memory usage: 2.4 MB

The plot of one of the parameters is as follows.

    df['Temperature (C)'].plot(figsize=(30,8))

First, I thought of changing the dataframe from hourly data to monthly data, so that training would be easier.

    df = df.resample('MS').mean()

Like this, the monthly temperature variation is:

    df['Temperature (C)'].plot(figsize=(30,8))
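To make the effect of resampling concrete, here is a minimal sketch on a synthetic hourly frame that spans the same date range as the `df.info()` output above. The random temperature values and the names `hourly`, `daily`, and `monthly` are illustrative assumptions, not the real data; only the row counts matter here.

```python
import numpy as np
import pandas as pd

# Synthetic hourly index covering the same span as the df.info() output.
index = pd.date_range("2006-01-01 00:00", "2014-12-31 23:00",
                      freq=pd.Timedelta(hours=1))

# Placeholder values (assumption): random numbers standing in for temperature.
rng = np.random.default_rng(0)
hourly = pd.DataFrame({"Temperature (C)": rng.normal(10.0, 8.0, size=len(index))},
                      index=index)

monthly = hourly.resample("MS").mean()  # one row per calendar month
daily = hourly.resample("D").mean()     # one row per calendar day

print(len(hourly), len(daily), len(monthly))
```

The monthly frame has far fewer rows than the daily one, so every quantity expressed in rows (test size, window length) scales with the chosen frequency.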

Test and train:

    rows_per_month = 1
    test_months = 18  # number of months we want to predict in the future
    
    test_indices = test_months * rows_per_month  # number of rows held out for testing
    
    # train and test split: hold out the last `test_indices` rows
    # (df_final is not defined in the question; presumably the resampled dataframe)
    train = df_final.iloc[:-test_indices]
    test = df_final.iloc[-test_indices:]

I scaled the data using MinMaxScaler from scikit-learn.
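The scaling code is not shown in the question, so the following is only a sketch of what min-max scaling computes with the default (0, 1) range, written in plain NumPy to stay self-contained. It assumes the minimum and maximum are taken from the training rows only, which the question does not state; the small `train` and `test` arrays and the helper `min_max_scale` are hypothetical.

```python
import numpy as np

# Hypothetical stand-ins for the train/test arrays
# (rows = time steps, columns = features).
train = np.array([[0.0, 0.2, 10.0],
                  [5.0, 0.6, 12.0],
                  [10.0, 1.0, 16.0]])
test = np.array([[12.0, 0.4, 13.0]])

# "Fit": per-column min and max, taken from the training rows only (assumption).
col_min = train.min(axis=0)
col_max = train.max(axis=0)

def min_max_scale(x):
    """Map each column so that the training min goes to 0 and the training max to 1."""
    return (x - col_min) / (col_max - col_min)

scaled_train = min_max_scale(train)
scaled_test = min_max_scale(test)

print(scaled_train.min(axis=0), scaled_train.max(axis=0))
print(scaled_test)
```

Test values that fall outside the training range end up outside [0, 1], which is expected when the scaler is fitted on the training rows alone.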

Generator parameters:

    length =  12*rows_per_month #Length of output sequences (in number of timesteps)
    batch_size = 1 #Number of timeseries sample in batch
    generator = tf.keras.preprocessing.sequence.TimeseriesGenerator(scaled_train,scaled_train,length=length,batch_size=batch_size)
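As a rough illustration of what the generator yields with these settings, the sketch below builds the same kind of sliding windows in NumPy: each input is `length` consecutive rows and the target is the row right after the window. The helper `make_windows` and the toy array are hypothetical, and the sketch reflects my understanding of the generator's default behaviour (stride 1, sampling rate 1) rather than its actual implementation.

```python
import numpy as np

def make_windows(data, length):
    """Return (inputs, targets): each input holds `length` consecutive rows,
    and its target is the single row that immediately follows the window."""
    inputs = np.stack([data[i:i + length] for i in range(len(data) - length)])
    targets = data[length:]
    return inputs, targets

# Toy series: 20 time steps, 3 features (values are arbitrary).
toy = np.arange(60, dtype=float).reshape(20, 3)

X, y = make_windows(toy, length=12)
print(X.shape, y.shape)
```

A series of `n` rows gives `n - length` windows, so a longer window both lengthens every training sequence and reduces the number of samples.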

Model for `df` (monthly):

    import tensorflow as tf
    
    # define model
    model = tf.keras.Sequential()
    
    model.add(tf.keras.layers.LSTM(50, input_shape=(length, scaled_train.shape[1]), return_sequences=True))
    model.add(tf.keras.layers.LSTM(50))
    
    # NOTE: leave the LSTM activation at its default; a custom activation
    # prevents the fast cuDNN kernel from being used on GPU.
    model.add(tf.keras.layers.Dense(scaled_train.shape[1]))
    
    model.compile(optimizer='adam', loss='mse')

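The model was trained for 24 epochs and did a fairly good job of predicting the three parameters.

Predictions from the model (when `df` is monthly):

This is a fairly good prediction.

The problem arises when I increase the density of the data and make it daily instead of monthly.

I took the original data and did the following: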

    df = df.resample('D').mean()

Temperature variation by day:

    df['Temperature (C)'].plot(figsize=(30,8))

Test and train:

The only thing that changes here is `rows_per_month = 30`; everything else stays the same.

Generator parameters:

Same as above.
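Since only `rows_per_month` changes, it may help to spell out what that does to the split and the window length. The sketch below is plain arithmetic on the formulas shown earlier; the helper `window_summary` is hypothetical, and the row counts 108 and 3287 are assumptions derived from the date range in the `df.info()` output (nine full years with no missing days).

```python
def window_summary(n_rows, rows_per_month, test_months=18):
    """Apply the question's formulas for the test size and window length."""
    test_rows = test_months * rows_per_month   # test_indices in the question
    train_rows = n_rows - test_rows
    length = 12 * rows_per_month               # window length fed to the LSTM
    return {
        "train_rows": train_rows,
        "test_rows": test_rows,
        "length": length,
        "windows": train_rows - length,        # samples the generator can form
    }

monthly = window_summary(n_rows=108, rows_per_month=1)
daily = window_summary(n_rows=3287, rows_per_month=30)

print(monthly)
print(daily)
```

Under these assumptions each daily sample is a 360-step sequence, thirty times longer than the 12-step monthly windows, while the network itself is unchanged. Note also that 12 × 30 = 360 rows is slightly shorter than one calendar year of daily data.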

Model for `df` (daily):

The same model (as I used when `df` was monthly):

    model = tf.keras.Sequential()
    
    model.add(tf.keras.layers.LSTM(50, input_shape=(length, scaled_train.shape[1]), return_sequences=True))
    model.add(tf.keras.layers.LSTM(50))
    # model.add(tf.keras.layers.LSTM(50))  # add this layer if df is in 'days'
    
    # NOTE: leave the LSTM activation at its default; a custom activation
    # prevents the fast cuDNN kernel from being used on GPU.
    model.add(tf.keras.layers.Dense(scaled_train.shape[1]))
    
    model.compile(optimizer='adam', loss='mse')

The model was trained for 24 epochs, but it did not predict correctly.

Loss:

Predictions from the model (when `df` is daily):

I tried adding one more layer of 50 LSTM units.

    model = tf.keras.Sequential()
    
    model.add(tf.keras.layers.LSTM(50, input_shape=(length, scaled_train.shape[1]), return_sequences=True))
    model.add(tf.keras.layers.LSTM(50, return_sequences=True))
    model.add(tf.keras.layers.LSTM(50))  # add this layer if df is in 'days'
    
    # NOTE: leave the LSTM activation at its default; a custom activation
    # prevents the fast cuDNN kernel from being used on GPU.
    model.add(tf.keras.layers.Dense(scaled_train.shape[1]))
    
    model.compile(optimizer='adam', loss='mse')

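But the results were similar.

I also tried training the model for more epochs (about 100), but the predictions did not improve.

I think I am missing a key point: the periodicity of the data stays the same and only the density of points changes, so why does this affect the model's accuracy?

How can I get decent accuracy from a model that predicts the daily parameters? And what about the hourly data?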

Trouble creating an LSTM model for multivariate forecasting

0 answers:
