Changing a CNN classifier model into a CNN regression model

Asked: 2020-04-08 10:05:43

Tags: python tensorflow machine-learning keras conv-neural-network

I am trying to change a CNN classification model into a CNN regression model. The classification model takes news statements as input; the second variable is the change in the index (0 for a negative return on the release day, 1 for a positive change). Now I want to change the model from classification to regression so that I can use the actual returns instead of the binary classification.

So my input to the neural network looks like this:

                                                   document    VIX 1d
1999-05-18  Release Date: May 18, 1999\n\nFor immediate re... -0.010526
1999-06-30  Release Date: June 30, 1999\n\nFor immediate r... -0.082645
1999-08-24  Release Date: August 24, 1999\n\nFor immediate... -0.043144

(The documents are tokenized before they go into the NN; this is just so you have an example.)

So far I have changed the following parameters:
- the loss function is now mean squared error (before: binary cross-entropy),
- the activation of the last layer is now linear (before: sigmoid),
- the metric to measure is now mse (before: acc).
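In isolation, these changes boil down to the following (a minimal toy sketch with a dummy dense model, just to show the head and compile differences; the shapes and layer sizes are only illustrative):

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inp = Input(shape=(10,))
hidden = Dense(8, activation='relu')(inp)

# before (classification): sigmoid output + binary cross-entropy
# out = Dense(1, activation='sigmoid')(hidden)
# model = Model(inp, out)
# model.compile(loss='binary_crossentropy', optimizer='adadelta', metrics=['acc'])

# now (regression): linear output + mean squared error
out = Dense(1, activation='linear')(hidden)
model = Model(inp, out)
model.compile(loss='mean_squared_error', optimizer='adadelta', metrics=['mse'])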

Below you can see my code:

all_words = [word for tokens in X for word in tokens]
all_sentence_lengths = [len(tokens) for tokens in X]
ALL_VOCAB = sorted(list(set(all_words)))
print("%s words total, with a vocabulary size of %s" % (len(all_words), len(ALL_VOCAB)))
print("Max sentence length is %s" % max(all_sentence_lengths))


####################### CHANGE THE PARAMETERS HERE #####################################
EMBEDDING_DIM = 300          # how big is each word vector
MAX_VOCAB_SIZE = 1893        # how many unique words to use (i.e. num rows in the embedding matrix)
MAX_SEQUENCE_LENGTH = 1086   # max number of words in a document to use


tokenizer = Tokenizer(num_words=MAX_VOCAB_SIZE, lower=True, char_level=False)
tokenizer.fit_on_texts(change_df["document"].tolist())
training_sequences = tokenizer.texts_to_sequences(X_train.tolist())

train_word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(train_word_index))

train_embedding_weights = np.zeros((len(train_word_index)+1, EMBEDDING_DIM))
for word,index in train_word_index.items():
    train_embedding_weights[index,:] = w2v_model[word] if word in w2v_model else np.random.rand(EMBEDDING_DIM)
print(train_embedding_weights.shape)


######################## TRAIN AND TEST SET #################################
train_cnn_data = pad_sequences(training_sequences, maxlen=MAX_SEQUENCE_LENGTH)
test_sequences = tokenizer.texts_to_sequences(X_test.tolist())
test_cnn_data = pad_sequences(test_sequences, maxlen=MAX_SEQUENCE_LENGTH)

def ConvNet(embeddings, max_sequence_length, num_words, embedding_dim, trainable=False, extra_conv=True):
    embedding_layer = Embedding(num_words,
                                embedding_dim,
                                weights=[embeddings],
                                input_length=max_sequence_length,
                                trainable=trainable)

    sequence_input = Input(shape=(max_sequence_length,), dtype='int32')
    embedded_sequences = embedding_layer(sequence_input)

    # Yoon Kim model (https://arxiv.org/abs/1408.5882)
    convs = []
    filter_sizes = [3, 4, 5]

    for filter_size in filter_sizes:
        l_conv = Conv1D(filters=128, kernel_size=filter_size, activation='relu')(embedded_sequences)
        l_pool = MaxPooling1D(pool_size=3)(l_conv)
        convs.append(l_pool)

    l_merge = concatenate([convs[0], convs[1], convs[2]], axis=1)

    # simpler alternative branch: a single 1D convolution + max pooling
    conv = Conv1D(filters=128, kernel_size=3, activation='relu')(embedded_sequences)
    pool = MaxPooling1D(pool_size=3)(conv)

    if extra_conv == True:
        # use the merged multi-filter-size (Yoon Kim style) branch
        x = Dropout(0.5)(l_merge)
    else:
        # use the single-convolution branch
        x = Dropout(0.5)(pool)
    x = Flatten()(x)
    x = Dense(128, activation='relu')(x)
    preds = Dense(1, activation='linear')(x)

    model = Model(sequence_input, preds)
    model.compile(loss='mean_squared_error',
                  optimizer='adadelta',
                  metrics=['mse'])
    model.summary()
    return model

x_train = train_cnn_data
y_tr = y_train
x_test = test_cnn_data

model = ConvNet(train_embedding_weights, MAX_SEQUENCE_LENGTH, len(train_word_index)+1, EMBEDDING_DIM, False)

#define callbacks
early_stopping = EarlyStopping(monitor='val_loss', min_delta=0.01, patience=4, verbose=1)
callbacks_list = [early_stopping]

hist = model.fit(x_train, y_tr, epochs=5, batch_size=33, validation_split=0.1, shuffle=True, callbacks=callbacks_list)

y_tes=model.predict(x_test, batch_size=33, verbose=1)
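To judge whether the results are actually poor, I also compare the model's test error against a naive baseline that always predicts the mean training return (a quick sanity-check sketch; I assume here that y_test holds the actual returns for the test set):

from sklearn.metrics import mean_squared_error
import numpy as np

# RMSE of the CNN's predictions on the test set
rmse_model = np.sqrt(mean_squared_error(y_test, y_tes.ravel()))

# RMSE of a naive baseline that always predicts the mean training return
baseline = np.full(len(y_test), np.mean(y_tr))
rmse_baseline = np.sqrt(mean_squared_error(y_test, baseline))

print("model RMSE:    %.4f" % rmse_model)
print("baseline RMSE: %.4f" % rmse_baseline)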

Does anyone know what else I should change? The code runs, but I think the results are poor. Running the code as it is, I get the following:

Epoch 5/5

 33/118 [=======>......................] - ETA: 15s - loss: 0.0039 - mse: 0.0039
 66/118 [===============>..............] - ETA: 9s - loss: 0.0031 - mse: 0.0031 
 99/118 [========================>.....] - ETA: 3s - loss: 0.0034 - mse: 0.0034
118/118 [==============================] - 22s 189ms/step - loss: 0.0035 - mse: 0.0035 - val_loss: 0.0060 - val_mse: 0.0060

Or at least a source where I could read up on this? I have only found some classification CNNs online, but no real examples of an NLP CNN with regression.

Many thanks

Lucas

2 Answers:

Answer 0 (score: 0)

Here is a good example. Copy/paste the code and load the datasets; it should answer all of your questions.

# Classification with Tensorflow 2.0
import pandas as pd
import numpy as np
import tensorflow as tf

import matplotlib.pyplot as plt
# %matplotlib inline

import seaborn as sns
sns.set(style="darkgrid")

cols = ['price', 'maint', 'doors', 'persons', 'lug_capacity', 'safety', 'output']
cars = pd.read_csv(r'C:\\your_path\\cars_dataset.csv', names=cols, header=None)

cars.head()


price = pd.get_dummies(cars.price, prefix='price')
maint = pd.get_dummies(cars.maint, prefix='maint')

doors = pd.get_dummies(cars.doors, prefix='doors')
persons = pd.get_dummies(cars.persons, prefix='persons')

lug_capacity = pd.get_dummies(cars.lug_capacity, prefix='lug_capacity')
safety = pd.get_dummies(cars.safety, prefix='safety')

labels = pd.get_dummies(cars.output, prefix='condition')

# To create our feature set, we can merge the first six columns horizontally:

X = pd.concat([price, maint, doors, persons, lug_capacity, safety] , axis=1)

# Let's see how our label column looks now:

labels.head()


y = labels.values

# The final step before we can train our TensorFlow 2.0 classification model is to divide the dataset into training and test sets:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Model Training

# To train the model, let's import the TensorFlow 2.0 classes. Execute the following script:

from tensorflow.keras.layers import Input, Dense, Activation,Dropout
from tensorflow.keras.models import Model


# The next step is to create our classification model:
input_layer = Input(shape=(X.shape[1],))
dense_layer_1 = Dense(15, activation='relu')(input_layer)
dense_layer_2 = Dense(10, activation='relu')(dense_layer_1)
output = Dense(y.shape[1], activation='softmax')(dense_layer_2)

model = Model(inputs=input_layer, outputs=output)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])


# The following script shows the model summary:

print(model.summary())


# Result:

# Model: "model"
# Layer (type)                 Output Shape              Param #   


# Finally, to train the model execute the following script:
history = model.fit(X_train, y_train, batch_size=8, epochs=50, verbose=1, validation_split=0.2)


# Result:

# Train on 7625 samples, validate on 1907 samples
# Epoch 1/50
# - 4s 492us/sample - loss: 3.0998 - acc: 0.2658 - val_loss: 12.4542 - val_acc: 0.0834


# Let's finally evaluate the performance of our classification model on the test set:

score = model.evaluate(X_test, y_test, verbose=1)

print("Test Score:", score[0])
print("Test Accuracy:", score[1])


# Result: 



# Regression with TensorFlow 2.0

petrol_cons = pd.read_csv(r'C:\\your_path\\gas_consumption.csv')

# Let's print the first five rows of the dataset via the head() function:

petrol_cons.head()


X = petrol_cons.iloc[:, 0:4].values
y = petrol_cons.iloc[:, 4].values

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)


# Model Training

# The next step is to train our model. This process is quite similar to training the classification model. The only changes are the loss function and the number of nodes in the output dense layer. Since we are now predicting a single continuous value, the output layer has only 1 node.

input_layer = Input(shape=(X.shape[1],))
dense_layer_1 = Dense(100, activation='relu')(input_layer)
dense_layer_2 = Dense(50, activation='relu')(dense_layer_1)
dense_layer_3 = Dense(25, activation='relu')(dense_layer_2)
output = Dense(1)(dense_layer_3)

model = Model(inputs=input_layer, outputs=output)
model.compile(loss="mean_squared_error" , optimizer="adam", metrics=["mean_squared_error"])


# Finally, we can train the model with the following script:

history = model.fit(X_train, y_train, batch_size=2, epochs=100, verbose=1, validation_split=0.2)


# Result:

# Train on 30 samples, validate on 8 samples
# Epoch 1/100


# To evaluate the performance of a regression model on test set, one of the most commonly used metrics is root mean squared error. We can find mean squared error between the predicted and actual values via the mean_squared_error class of the sklearn.metrics module. We can then take square root of the resultant mean squared error. Look at the following script:

from sklearn.metrics import mean_squared_error
from math import sqrt

pred_train = model.predict(X_train)
print(np.sqrt(mean_squared_error(y_train,pred_train)))


# Result:

# 57.398156439652396


pred = model.predict(X_test)
print(np.sqrt(mean_squared_error(y_test,pred)))


# Result:

# 86.61012708343948


# https://stackabuse.com/tensorflow-2-0-solving-classification-and-regression-problems/
# datasets:
# https://www.kaggle.com/elikplim/car-evaluation-data-set


# for OLS analysis
import statsmodels.api as sm

model = sm.OLS(y, X)
results = model.fit()
print(results.summary())

# Results:
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.987
Model:                            OLS   Adj. R-squared (uncentered):              0.986
Method:                 Least Squares   F-statistic:                              867.8
Date:                Thu, 09 Apr 2020   Prob (F-statistic):                    3.17e-41
Time:                        13:13:11   Log-Likelihood:                         -269.00
No. Observations:                  48   AIC:                                      546.0
Df Residuals:                      44   BIC:                                      553.5
Df Model:                           4                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1           -14.2390      8.414     -1.692      0.098     -31.196       2.718
x2            -0.0594      0.017     -3.404      0.001      -0.095      -0.024
x3             0.0012      0.003      0.404      0.688      -0.005       0.007
x4          1630.8913    130.969     12.452      0.000    1366.941    1894.842
==============================================================================
Omnibus:                        9.750   Durbin-Watson:                   2.226
Prob(Omnibus):                  0.008   Jarque-Bera (JB):                9.310
Skew:                           0.880   Prob(JB):                      0.00952
Kurtosis:                       4.247   Cond. No.                     1.00e+05
==============================================================================

Data sources:

https://www.kaggle.com/elikplim/car-evaluation-data-set

https://drive.google.com/file/d/1mVmGNx6cbfvRHC_DvF12ZL3wGLSHD9f_/view

Answer 1 (score: 0)

Maybe two more questions:

1. The root mean squared error values in the regression example are quite high (57.39 and 86.61), while I get (for my dataset) 0.0851 (train) and 0.1169 (test). So it seems my values are fine, right? The lower the RMSE, the better? My statistics classes were a long time ago... :D
2. Would you happen to know (or maybe have an example of) how I could add another variable to the regression with the neural network? In my case I have the text data and the returns I want to predict, and I would also like to include some macroeconomic (control) variables; see the sketch after this list for what I mean.

Thanks!
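To make question 2 concrete, here is a rough sketch of what I have in mind: the existing CNN text branch plus a second numeric input for the macro variables, joined with the Keras functional API. The names, shapes and layer sizes here are made up for illustration and are not taken from my model above:

import numpy as np
from tensorflow.keras.layers import Input, Embedding, Conv1D, MaxPooling1D, Flatten, Dense, concatenate
from tensorflow.keras.models import Model

MAX_SEQUENCE_LENGTH = 1086   # as in my model above
VOCAB_SIZE = 1894            # len(train_word_index) + 1
EMBEDDING_DIM = 300
N_MACRO_FEATURES = 3         # e.g. interest rate, CPI, unemployment (illustrative)

# text branch (simplified version of the CNN above)
text_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32', name='text')
x = Embedding(VOCAB_SIZE, EMBEDDING_DIM)(text_input)
x = Conv1D(filters=128, kernel_size=3, activation='relu')(x)
x = MaxPooling1D(pool_size=3)(x)
x = Flatten()(x)

# numeric branch for the macroeconomic (control) variables
macro_input = Input(shape=(N_MACRO_FEATURES,), name='macro')

# merge both branches and predict the return
merged = concatenate([x, macro_input])
merged = Dense(64, activation='relu')(merged)
preds = Dense(1, activation='linear')(merged)

model = Model(inputs=[text_input, macro_input], outputs=preds)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mse'])
model.summary()

# training would then take both inputs, e.g. (train_macro_features is hypothetical):
# model.fit([train_cnn_data, train_macro_features], y_train, epochs=5, batch_size=33)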